45th Annual Conference
of the
International Military Testing Association

Pensacola, Florida
3-6 November 2003

www.InternationalMTA.org

TABLE OF CONTENTS

IMTA 2003 STEERING COMMITTEE ix

STEERING COMMITTEE MEETING MINUTES x

IMTA BY-LAWS (AS AMENDED BY STEERING COMMITTEE) xii

PAPERS

A01. Maliko-Abraham, H., and Lofaro, R.J. USABILITY TESTING: LESSONS LEARNED AND METHODOLOGY 1
A02. Elliott-Mabey, N.L. WHY WE STILL NEED TO STUDY UNDEFINED CONCEPTS 6
A03. Annen, H., and Kamer, B. DO WE ASSESS WHAT WE WANT TO ASSESS? THE APPRAISAL DIMENSIONS AT THE ASSESSMENT CENTER FOR PROFESSIONAL OFFICERS (ACABO) 13
A04. Gutknecht, S.P. PERSONALITY AS PREDICTOR OF JOB ATTITUDES AND INTENTION TO QUIT 22
A05. Snooks, S., and Luster, L. USING TASK MODULE DATA TO VALIDATE AIR FORCE SPECIALTY KNOWLEDGE TESTS 30
A06. Brugger, C. THE SCOPE OF PSYCHOLOGICAL TEST SYSTEMS WITHIN THE AUSTRIAN ARMED FORCES 44
A07. Sumer, H.C.; Bilgic, R.; Sumer, N.; and Erol, T. JOB-SPECIFIC PERSONALITY ATTRIBUTES AS PREDICTORS OF PSYCHOLOGICAL WELL-BEING 49
A08. Farmer, W.L.; Bearden, R.M.; Eller, E.D.; Michael, P.G.; Johnson, R.S.; Chen, H.; Nayak, A.; Hindelang, R.L.; Whittam, K.; Watson, S.E.; and Alderton, D.L. JOIN: JOB AND OCCUPATIONAL INTEREST IN THE NAVY 62
A09. Temme, L.A.; Still, D.L.; Kolen, J.; and Acromite, M. OZ: A HUMAN-CENTERED COMPUTING COCKPIT DISPLAY 70
A10. Cian, C.; Carriot, J.; and Raphela, C. PERCEPTUAL DRIFT RELATED TO SPATIAL DISORIENTATION GENERATED BY MILITARY SYSTEMS: POTENTIAL BENEFITS OF SELECTION AND TRAINING 91
A11. Krouse, S.L., and Irvine, J.H. PERCEPTUAL DYSLEXIA: ITS EFFECT ON THE MILITARY CADRE AND BENEFITS OF TREATMENT 96
A12. Cowan, J.D. NEUROFEEDBACK TRAINING FOR TWO DIMENSIONS OF ATTENTION: CONCENTRATION AND ALERTNESS 103
A13. Schaab, B.B., and Dressel, J.D. WHAT TODAY’S SOLDIERS TELL US ABOUT TRAINING FOR THE FUTURE 111
A14. Burns, J.J.; Giebenrath, J.; and Hession, P. OBJECTIVE-BASED TRAINING: METHODS, TOOLS, AND TECHNOLOGIES 116
A15. Helm, W.R., and Reid, J.D. RACE AND GENDER AS FACTORS IN FLIGHT TRAINING SUCCESS 123
A16. Phillips, H.L.; Arnold, R.D.; and Fatolitis, P. VALIDATION OF AN UNMANNED AERIAL VEHICLE OPERATOR SELECTION SYSTEM 129
A17. Sabol, M.A.; Schaab, B.B.; Dressel, J.D.; and Rittman, A.L. SUCCESS AT COLLABORATION AS A FUNCTION OF KNOWLEDGE DEPTH 140
A20. Janega, J.B., and Olmsted, M.G. U.S. NAVY SAILOR RETENTION: A PROPOSED MODEL OF CONTINUATION BEHAVIOR 150
A21. Morath, R.; Cronin, B.; and Heil, M. DOES MILITARY PERSONNEL JOB PERFORMANCE IN A DIGITIZED FUTURE FORCE REQUIRE CHANGES IN THE ASVAB: A COMPARISON OF A DYNAMIC/INTERACTIVE COMPUTERIZED TEST BATTERY WITH THE ASVAB IN PREDICTING TRAINING AND JOB PERFORMANCE AMONG AIRMEN AND SAILORS 156
A22. Lappin, B.M.; Klein, R.M.; Howell, L.M.; and Lipari, R.N. COMPARISONS OF SATISFACTION AND RETENTION MEASURES FROM 1999-2003 158
A23. Richardson, J. BRITISH ARMY LEAVERS SURVEY: AN INVESTIGATION OF RETENTION FACTORS 167
A24. Mitchell, D.; Keller-Glaze, H.; Gramlich, A.; and Fallesen, J. PREDICTORS OF U.S. ARMY CAPTAIN RETENTION DECISIONS 171
A25. Huffman, A.H.; Youngcourt, S.S.; and Castro, C.A. THE IMPORTANCE OF A FAMILY-FRIENDLY WORK ENVIRONMENT FOR INCREASING EMPLOYEE PERFORMANCE AND RETENTION 177
A26. Harris, R.N.; Mottern, J.A.; White, M.A.; and Alderton, D.L. TRACKING U.S. NAVY RESERVE CAREER DECISIONS 199
A27. Bowles, S.V. DUTIES AND FUNCTIONS OF A RECRUITING COMMAND PSYCHOLOGIST 205
B02a. Lancaster, A.R.; Lipari, R.N.; Howell, L.M.; and Klein, R.M. THE 2002 WORKPLACE AND GENDER RELATIONS SURVEY 208
B02b. Ormerod, A.J., and Wright, C.V. WORKPLACE REPRISALS: A MODEL OF RETALIATION FOLLOWING UNPROFESSIONAL GENDER-RELATED BEHAVIOR 219
B02c. Lawson, A.K., and Fitzgerald, L.F. UNDERSTANDING RESPONSES TO SEXUAL HARASSMENT IN THE U.S. MILITARY 237
B04. Ford, K.A. USING STAKEHOLDER ANALYSIS (SA) AND THE STAKEHOLDER INFORMATION SYSTEM (SIS) IN HUMAN RESOURCE ANALYSIS 252
B05. O’Connell, B.J.; Beaubien, J.M.; Keeney, M.J.; and Stetz, T.A. DESIGNING A NEW HR SYSTEM FOR NIMA 271
B07a. Peck, J.F. PERSONNEL SECURITY INVESTIGATIONS: IMPROVING THE QUALITY OF SUBJECT AND WORKPLACE INTERVIEWS 278
B07b. Crawford, K.S., and Wood, S. STRATEGIES FOR INCREASED REPORTING OF SECURITY-RELEVANT BEHAVIOR 283
B07c. Fischer, L.F. CHARACTERIZING INFORMATION SYSTEMS INSIDER OFFENDERS 289
B07d. Kramer, L.A.; Heuer, R.J., Jr.; and Crawford, K.S. TEN TECHNOLOGICAL, SOCIAL, AND ECONOMIC TRENDS THAT ARE INCREASING U.S. VULNERABILITY TO INSIDER ESPIONAGE 297
B07e. Wiskoff, M.F. DEVELOPMENT OF A WINDOWS BASED COMPUTER-ADMINISTERED PERSONNEL SECURITY SCREENING QUESTIONNAIRE 300
B09. Filjak, T.; Cippico, I.; Debač, N.; Tišlarić, G.; and Zebec, K. OCCUPATIONAL ANALYSIS APPLIED FOR THE PURPOSE OF DEFINING OF SELECTION CRITERIA FOR NEW MILITARY OCCUPATIONAL SPECIALTIES IN THE ARMED FORCES OF THE REPUBLIC OF CROATIA 305
B10a. Lee, W.C., and Drasgow, F. USING DECISION TREE METHODOLOGY TO PREDICT ATTRITION WITH THE AIM 310
B10b. Chernyshenko, O.S.; Stark, S.E.; and Drasgow, F. PREDICTING ATTRITION OF ARMY RECRUITS USING OPTIMAL APPROPRIATENESS MEASUREMENT 317
B10c. Stark, S.E.; Chernyshenko, O.S.; and Drasgow, F. A NEW APPROACH TO CONSTRUCTING AND SCORING FAKE-RESISTANT PERSONALITY MEASURES 323
B11. Janega, J.B., and Olmsted, M.G. U.S. NAVY SAILOR RETENTION: A PROPOSED MODEL OF CONTINUATION BEHAVIOR 330
B12. Nederhof, F.V.F. PSYCHOMETRIC PROPERTIES OF THE DUTCH SOCIAL SKILLS INVENTORY 336
B13. Hendriks, B.; van de Ven, C.; and Stam, D. DEPLOYABILITY OF TEAMS: THE DUTCH MORALE QUESTIONNAIRE, AN INSTRUMENT FOR MEASURING MORALE DURING MILITARY OPERATIONS 344
B14. Cotton, A.J., and Gorney, E. MEASURES OF WELLBEING IN THE AUSTRALIAN DEFENCE FORCE 358
B15. Lescreve, F.J. WHY ONE SHOULD WAIT BEFORE ALLOCATING APPLICANTS 336
B16. Schreurs, B. FROM ATTRACTION TO REJECTION: A QUALITATIVE RESEARCH ON APPLICANT WITHDRAWAL 381
B17. Borman, W.C.; White, L.A.; Bowles, S.; Horgen, K.E.; Kubisiak, U.C.; and Penney, L.M. U.S. ARMY RECRUITER SELECTION RESEARCH: AN UPDATE 398
B18. Mylle, J. MODELLING COMMUNICATION IN NEGOTIATION IN PSO CONTEXT 404
C01. Schultz, K.; Sapp, R.; and Willers, L. ELECTRONIC ADVANCEMENT EXAMS – TRANSITIONING FROM PAPER-BASED TO ELECTRONIC FORMAT 412
C02. Pfenninger, D.T.; Klion, R.E.; and Wenzel, M.U. INTEGRATED WEB ASSESSMENT SOLUTIONS 418
C03. Oropeza, T.; Hawthorne, J.; Seilhymer, J.; Barrow, D.; and Balog, J. STREAMLINING OF THE NAVY ENLISTED ADVANCEMENT NOTIFICATION SYSTEM 431
C04. Edgar, E.; Zarola, A.; Dukalskis, L.; and Weston, K. THE ROLE OF PSYCHOLOGY IN INTERNET SELECTION 438
C05. O’Connell, B.J.; Caster, C.H.; and Marsh-Ayers, N. WORKING FOR THE UNITED STATES INTELLIGENCE COMMUNITY: DEVELOPING WWW.INTELLIGENCE.GOV 448
C06b. Farmer, W.L.; Bearden, R.M.; Borman, W.C.; Hedge, J.W.; Houston, J.S.; Ferstl, K.L.; and Schneider, R.J. ENCAPS – USING NON-COGNITIVE MEASURES FOR NAVY SELECTION AND CLASSIFICATION 455
C06c. Twomey, A., and O'Keefe, D. PILOT SELECTION IN THE AUSTRALIAN DEFENCE FORCE: AUSBAT VALIDATION 461
C07a. Styer, J.S. DEVELOPMENT AND VALIDATION OF A REVISED ASVAB CEP INTEREST INVENTORY 468
C07b. Watson, S.E. JOB AND OCCUPATIONAL INTEREST IN THE NAVY 474
C07c. Farmer, W.L., and Alderton, D.L. VOCATIONAL INTEREST MEASUREMENT IN THE NAVY - JOIN 481
C07d. Hanson, M.A.; Paullin, C.J.; Bruskiewicz, K.T.; and White, L.A. THE ARMY VOCATIONAL INTEREST CAREER EXAMINATION 485
C07e. Putka, D.J.; Iddekinge, C.H.; and Sager, C.E. DEVELOPING MEASURES OF OCCUPATIONAL INTERESTS AND VALUES FOR SELECTION 491
C08a. Boerstler, R.E., and Kammrath, J.L. OCCUPATIONAL SURVEY SUPPORT OF AIR AND SPACE EXPEDITIONARY FORCE (AEF) REQUIREMENTS 499
C08. Jones, P.L.; Strange, J.; and Osburn, H. OCCUPATIONAL ANALYTICS 505
C09a. Heffner, T.S.; Tremble, T.; Campbell, R.; and Sager, C. ANTICIPATING THE FUTURE FOR FIRST-TOUR SOLDIERS 507
C09b. Sager, C.E., and Russell, T.L. FUTURE-ORIENTED JOB ANALYSIS FOR FIRST-TOUR SOLDIERS 514
C09c. Keenan, P.A.; Katkowski, D.A.; Collins, M.M.; Moriarty, K.O.; and Schantz, L.B. PERFORMANCE CRITERIA FOR THE SELECT21 PROJECT 522
C09d. McCloy, R.A.; Putka, D.J.; Van Iddekinge, C.H.; and Kilcullen, R.N. DEVELOPING OPERATIONAL PERSONALITY ASSESSMENTS: STRATEGIES FOR FORCED-CHOICE AND BIODATA-BASED MEASURES 531
C09e. Waugh, G.W., and Russell, T.L. SCORING BOTH JUDGMENT AND PERSONALITY IN A SITUATIONAL JUDGMENT TEST 540
C09f. Iddekinge, C.H.; Putka, D.J.; and Sager, C.E. ASSESSING PERSON-ENVIRONMENT (P-E) FIT WITH THE FUTURE ARMY 549
C10. Heffner, T.S.; Campbell, R.; Knapp, D.J.; and Greenston, P. COMPETENCY TESTING FOR THE U.S. ARMY NONCOMMISSIONED OFFICER (NCO) CORPS 556
C11. Lane, M.E.; Mottern, J.A.; White, M.A.; Brown, M.E.; and Boyce, E.M. 1ST WATCH: ASSESSMENT OF COPING STRATEGIES EMPLOYED BY NEW SAILORS 561
C12. Brown, M.E.; Mottern, J.A.; White, M.A.; Lane, M.E.; and Boyce, E.M. 1st WATCH: THE NAVY FIT SCALE 567
C13a. Steinberg, A.G., and Nourizadeh, S. USING RESULTS FROM ATTITUDE AND OPINION SURVEYS 573
C13b. Nourizadeh, S., and Steinberg, A.G. USING SURVEY AND INTERVIEW DATA: AN EXAMPLE 575
C13c. Rosenfeld, P.; Newell, C.E.; and Braddock, L. UTILIZING SURVEY RESULTS OF THE NAVY EQUAL OPPORTUNITY/SEXUAL HARASSMENT SURVEY 581
D01. Waldköetter, R., and Arlington, A.T. THE U.S. ARMY'S PERSONNEL REPLACEMENT SYSTEM 587
D02. Mylle, J. TEAM EFFECTIVENESS AND BOUNDARY MANAGEMENT: THE FOUR ROLES PRECONIZED BY ANCONA REVISITED 592
D03. Cotton, A.J., and Gorney, E. MENTAL HEALTH LITERACY IN THE AUSTRALIAN DEFENCE FORCE 599
D04. Dursun, S., and Morrow, R. DEFENCE ETHICS SURVEY: THE IMPACT OF SITUATIONAL MORAL INTENSITY ON ETHICAL DECISION MAKING 608
D05. Thompson, B.R. ADAPTING OCCUPATIONAL ANALYSIS METHODOLOGIES TO ACHIEVE OPTIMAL OCCUPATIONAL STRUCTURES 615
D07. Smith, G.A. WHOM AMONG US? PRELIMINARY RESEARCH ON POSITION AND PERSONNEL SELECTION CRITERIA FOR MALE UAV SENSOR OPERATORS 620
D09. Lim, B.C., and Ployhart, R.E. TRANSFORMATIONAL LEADERSHIP: RELATIONS TO THE FIVE FACTOR MODEL AND TEAM PERFORMANCE IN TYPICAL AND MAXIMUM CONTEXTS 631
D10. Cronin, B.; Morath, R.; and Smith, J. ARMY LEADERSHIP COMPETENCIES: OLD WINE IN NEW BOTTLES? 654
D11. Holtzman, A.K.; Baker, D.P.; Calderón, R.F.; Smith-Jentsch, K.; and Radtke, P. DEVELOPING APPROPRIATE METRICS FOR PROCESS AND OUTCOME MEASURES 661
D12. Douglas, I. SOFTWARE SUPPORT OF HUMAN PERFORMANCE ANALYSIS 671
D13. Beaubien, J.M.; Baker, D.P.; and Holtzman, A.K. HOW MILITARY RESEARCH CAN IMPROVE TEAM TRAINING EFFECTIVENESS IN OTHER HIGH-RISK INDUSTRIES 679
D14. Costar, D.M.; Baker, D.P.; Holtzman, A.; Smith-Jentsch, K.A.; and Radtke, P. DEVELOPING MEASURES OF HUMAN PERFORMANCE: AN APPROACH AND INITIAL REACTIONS 688
D15. Makgati, C.K.M. PSYCHOLOGICAL IMPLICATIONS OF DEPLOYMENTS FOR THE MEMBERS OF THE SOUTH AFRICAN NATIONAL DEFENCE FORCE (S.A.N.D.F.) 694
D16. Cotton, A.J. THE PSYCHOLOGICAL IMPACT OF DEPLOYMENTS 702
D18. Brown, K.J. THE LEADERS CALIBRATION SCALE 710
D19. Horey, J.D., and Fallesen, J.J. LEADERSHIP COMPETENCIES: ARE WE ALL SAYING THE SAME THING? 721
D20. Lett, J.; Thain, J.; Keesling, W.; and Krol, M. NEW DIRECTIONS IN FOREIGN LANGUAGE APTITUDE TESTING 734
D21. Willis, D. THE STRUCTURE & ANTECEDENTS OF ORGANISATIONAL COMMITMENT IN THE SINGAPORE ARMY 742
D22. Tan, C.; Soh, S.; and Lim, B.C. FURTHER UNDERSTANDING OF ATTITUDES TOWARDS NATIONAL DEFENCE AND MILITARY SERVICE IN SINGAPORE 750
D23. Bradley, P.; Charbonneau, D.; and Campbell, S. MEASURING MILITARY PROFESSIONALISM 760
D24. Truhon, S.A. DEOCS: A NEW AND IMPROVED MEOCS 766
D25. Rone, R.S. ADDRESSING PSYCHOLOGICAL STATE AND WORKPLACE BEHAVIORS OF DOWNSIZING SURVIVORS 772
D26. Devriendt, Y.A., and Levaux, C.A. VALIDATION OF THE BELGIAN MILITARY PILOT SELECTION TEST BATTERY 779

INDEX OF AUTHORS 784


2003 IMTA Executive Steering Committee

Current Members

Col Tony Cotton – Australia, Defence Health Service Branch, Australian Defence Force
Dr. Christian Langer – Austria, Establishment Command Structure and Force Organization
LtCol Francois Lescreve – Belgium, Ministry of Defense
Dr. Jacques Mylle – Belgium, Royal Military Academy
Ms. Susan Truscott – Canada, Department of National Defence
Dr. Corinne Cian – France, Centre de Recherches du Service de Santé des Armées
Ms. Wiltraud Pilz – Germany, Federal Ministry of Defense
Mr. Kian-Chye Ong – Singapore, Ministry of Defense
Dr. Henry Widen – Sweden
LtCol Frans Matser – The Netherlands, Royal Army
(Dr. Renier van Gelooven)
Ms. Jo Richardson – United Kingdom, Ministry of Defence (Army)
Dr. James Riedel – U.S. Defense Personnel Security Research Center
Dr. Mike Lentz – U.S. Navy, NETPDTC, Navy Advancement Center
LtCol John Gardner – U.S. Air Force, Occupational Measurements Squadron
Mr. Kenneth Schwartz – U.S. Air Force, Personnel Command
Ms. Mary Norwood – U.S. Coast Guard, Occupational Standards

Liaison Members

Dr. Mike Rumsey – U.S. Army Research Institute

Potential New Organizational Members (Voted In During 2003 ESC Meeting)

Dr. Ferdinand Rameckers – Netherlands, Defence Selection Institute
Dr. Hubert Annen – Switzerland, The Military Academy at the Swiss Federal Institute of Technology


Minutes

International Military Testing Association (IMTA)

Executive Steering Committee Meeting

3 November 2003

The Steering Committee met for the 45th IMTA at 1530 hours, 4 November 2003 at the
Hilton Garden Inn, Pensacola, Florida. Captain Gary Dye, United States Navy, chaired
the meeting. Steering Committee members in attendance are listed on the attachment.

1. Introductions:

Dr. Lentz welcomed everyone to IMTA 2003. The following countries were
represented: Austria, the Netherlands, the United States, Australia, Canada,
the United Kingdom, Belgium, Switzerland, France, Germany, and Singapore.

2. Conference Administration:

Capt. Gary Dye gave a synopsis of the 45th IMTA. Approximately 200 people
registered to attend, 80 papers were collected, and IMTA received $15,000 in seed
funding from IMTA 2002, held in Ottawa, Canada. The theme for this year’s
conference was “Optimizing Military Performance”. The conference was
expanded this year to include a fourth track entitled Human Performance. IMTA 2003
also added non-commercial exhibits to the conference, and one
presenter will be demonstrating a SkillsNet tutorial.

The keynote speaker on Tuesday is Vice Admiral Harms, Commander of United
States Training for the Navy. On Tuesday afternoon there will be tours of the
Naval Aviation Museum, a social hour, and the IMAX theatre.

The keynote speaker on Wednesday is Dr. Ford of the Institute for Human and
Machine Cognition. Wednesday evening will feature the IMTA banquet and a live
band.

3. Website Report:

Monty Stanley presented a status report on the IMTA website,
www.internationalmta.org, emphasizing his recent website makeover.

4. Presentations:

Two presentations were made by prospective IMTA members.


a. Dr. Hubert Annen of Switzerland, MILAK/ETHZ (Military Academy),
presented information regarding the Training Centre for Officers of the
Swiss Armed Forces, military sciences, and the basic and continuing education of
officers in various fields of study.

b. Dr. Ferdinand Rameckers of the Netherlands, Defence Selection
Institute, presented information regarding the Defence Interservice
Command, the Defence Selection Institute, and psychological selection.

5. New Inductions:
Dr. Lentz made a motion to accept both countries (Switzerland and the
Netherlands) into the organization. The motion was seconded by Australia and
approved by the committee.

6. IMTA bylaws:
Dr. Lentz presented the proposed change to Article V, Section E, of the existing
IMTA bylaws. LtCol Francois Lescreve made a motion to accept the change. The
motion was approved.

Dr. Lentz explained that we would need to post the recommended change and that
the attending membership would need to vote on it at the IMTA Banquet.

7. IMTA 2004 will be held in Brussels, Belgium, 26-28 October. LtCol Francois
Lescreve will be our host. A NATO workshop on Officer Recruiting and
Retention will also take place the same week.


BY-LAWS OF THE INTERNATIONAL MILITARY TESTING ASSOCIATION

Article I - Name

The name of the organization shall be the International Military Testing Association (IMTA).

Article II – Purpose

A. Discuss and exchange ideas concerning the assessment of military personnel.

B. Discuss the mission, organization, operations and research activities of associated
organizations engaged in military personnel assessment.

C. Foster improved personnel assessment through exploration and presentation of new
techniques and procedures for behavioral measurement, occupational analysis, manpower
analysis, simulation modeling, training, selection methodologies, survey and feedback systems.

D. Promote cooperation in the exchange of assessment procedures, techniques and instruments
among IMTA members and with other professional groups or organizations.

E. Promote the assessment of military personnel as a scientific adjunct to military personnel
management.

Article III – Participation

The following categories shall constitute the membership within the IMTA:

A. Primary Membership shall be open to personnel assigned to organizations of the armed
services and defense agencies that have been recognized by the IMTA Steering Committee as
Member Organizations and whose primary mission is the assessment of military personnel.
Representatives from the Member Organizations shall constitute the Steering Committee.

B. Associate Membership shall be open to personnel assigned to military, governmental or other
public entities engaged in activities that parallel those of primary membership. Associate
members (including prior members, such as retired military or civilian personnel who remain
professionally active) shall be entitled to all privileges of the primary members with the exception
of membership on the Steering Committee, which may be waived by a majority vote of the
Steering Committee.

C. Non-Member Participants represent all other interested organizations or personnel who wish
to participate in the annual conference, present papers or participate in symposium/panel
sessions. Non-Members will not attend the Steering Committee meeting nor have a vote in the
association's affairs.

Article IV – Dues

No annual dues shall be levied against the members or participants.

Article V – Steering Committee


A. The governing body of the Association shall be the Steering Committee, which will consist of
representatives from the Primary Members and those other members as voted by a majority of the
Steering Committee. Commanders of the Primary Member organizations will each appoint their
Steering Committee Member.

B. The Steering Committee shall have general supervision over the affairs of the Association
and shall have responsibility for all activities of the Association. The Steering Committee shall
conduct the business of the Association between the annual conferences of the Association by
such means of communications as selected by the Chairman.

C. Meetings of the Steering Committee shall be held in conjunction with the annual conference
of the Association and at such times as requested by the Chairman.
Representation from a majority of the Primary Members shall constitute a quorum.

D. Each member of the Steering Committee shall have one vote toward resolving Steering
Committee deliberations.

E. (Added November 2003) All past recipients of the Harry Greer Award will be ex officio, non-
voting members of the Steering Committee, unless they still represent their organization, in which
case they would still be a voting member. (The intent here is to maintain the institutional
knowledge, the depth and breadth of experience, and the connection to our history that could be
lost since Executive Steering Committee members are subject to change.)

Article VI – Officers

A. The officers of the Association shall consist of the Chairman of the Steering Committee and a
Secretary.

B. The Commander of the Primary Member coordinating the annual conference of the
Association shall select the Chairman of the Steering Committee. The term of the Chairman shall
begin at the close of the annual conference of the Association and shall expire at the close of the
next annual conference. The duties of the Chairman include organizing and coordinating the
annual conference of the Association, administering the activities of the IMTA, and the duties
customary to hosting the annual meeting.

C. The Chairman shall appoint the Secretary of the Association. The term of the Secretary shall
be the same as that of the Chairman. The duties of the Secretary shall be to keep the records of
the Association and the minutes of the Steering Committee, to conduct official correspondence
for the Association and to insure notice for the annual conference. The Secretary shall solicit
nominations for the Harry H. Greer Award.

Article VII – Meetings

A. The association shall hold a conference annually.

B. The Primary Members shall coordinate the annual conference of the Association, either
individually or as a consortium. The order of rotation shall be determined by the Steering
Committee. The coordinating Primary Members and the tentative location of the annual
conference for the following three years shall be announced at each annual conference.

C. The annual conference of the Association shall be held at a time and place determined by the
coordinating Primary Member. Announcement of the time and place for the next annual
conference will occur at the annual conference.

D. The coordinating Primary Member shall exercise planning and supervision over the program
and activities of the annual conference. Final selection of program content shall be the
responsibility of the coordinating Primary Member. Proceedings of the annual conference shall
be published by the coordinating Primary Member.

E. Any other organization (other than a Primary Member) may coordinate the annual conference
and should submit a formal request to the Chairman of the Steering Committee no less than 18
months prior to the date they wish to host.

Article VIII – Committees

A. Committees may be established by vote of the Steering Committee. The Chairman of each
committee shall be appointed by the Chairman of the Steering Committee from among the
members of the Steering Committee.

B. Committee members shall be appointed by the Chairman of the Steering Committee in
consultation with the Chairman of the committee being formed. Committee chairmen and
members shall serve in their appointed capacities at the discretion of the Chairman of the Steering
Committee. The Chairman of the Steering Committee shall be an ex officio member of all
committees.

C. All committees shall clear their general plans of action and new policies through the Steering
Committee. No committee or committee chairman shall enter into activities or relationships with
persons or organizations outside of the Association that extend beyond the approved general plan
or work specified without the specific authorization of the Steering Committee.

Article IX – Amendments

A. Amendments of these By-Laws may be made at the annual conference of the Association.

B. Proposed amendments shall be submitted to the Steering Committee not less than 60 days
prior to the annual meeting. Those amendments approved by a majority of the Steering
Committee may then be ratified by a majority of the assembled membership. Those proposed
amendments not approved by the Steering Committee may be brought to the assembled
membership for review and shall require a two-thirds vote of the assembled membership to
override the Steering Committee action.

Article X – Voting

All members attending the annual conference shall be voting members.

Article XI – Harry H. Greer Award

A. The Harry H. Greer Award signifies long-standing, exceptional work contributing to the
vision, purpose and aim of the IMTA.

B. Selection Procedures:


1. Prior to June 1st of each year, the Secretary will solicit nominations for the Greer Award
from members of the Steering Committee. Prior Greer Award winners may submit
unsolicited nominations. Award nominations shall be submitted in writing to the Secretary
by 1 July.

2. The recipient will be selected by a committee drawn from the Primary Members and
committee members will have attended at least the previous three Association annual
conferences.

3. The Chairman of the Award Committee is responsible for canvassing other committee
members to review award nominations and reach a consensus on the selection of a recipient
of the award prior to the annual conference.

4. The Award Committee selection shall be reviewed by the Steering Committee.

5. No more than one person is to receive the award each year but the Steering Committee
may decide not to select a recipient in any given year.

C. The Award is to be presented during the annual conference. The Award is to be a certificate,
with text prepared by the officers of the Association, and appropriate memorabilia at the
discretion of the Chairman.

Article XII – Enactment

These By-Laws shall be in force immediately upon acceptance by a majority of the assembled
membership of the Association.


USABILITY TESTING: LESSONS LEARNED AND METHODOLOGY

Helene Maliko-Abraham
Basic Commerce and Industries, Inc.
Helene.ctr.maliko-abraham@faa.gov

Ronald John Lofaro, PhD


Embry Riddle Aeronautical University
lofaror@erau.edu

INTRODUCTION
Operational usability testing is an essential aspect of fielding new systems. The field
of knowledge engineering holds great promise for developing new and effective
methodologies for such tests. When developing a new system, you have to know, understand
and work with people who represent the actual user community. It is the users who determine
when a system is ready and easy to use. Such a user is commonly referred to as a Subject
Matter Expert (SME). The efforts of SMEs add credibility and accuracy to the Operational
Test (OT) process.
Based on the main author’s recent OT experience, careful consideration must be given
to how to properly utilize the input and expertise of the SME group. An evaluation of
lessons learned from this activity showed that the full potential of the SME
contributions was not realized. The major contributing factor was the lack of an appropriate
methodology to effectively focus the efforts of this group.
The Small Group Delphi Paradigm (SGDP) (Lofaro, 1989) could have been that
methodology. The SGDP evolved from the Delphi process, which was originally developed
in the 1950s as an iterative, consensus-building process for forecasting futures. The SGDP
took the Delphi process in another direction, modifying it by merging it with elements of
group dynamics in order to allow interactive (face-to-face) Delphi workshops. The
modification resulted in a paradigm for eliciting evaluations, ratings and analyses from small
groups of experts. The SGDP can be used for any project that requires a set of SMEs
to identify, evaluate, and criticality-rate tasks. It can also be applied to recommend
modifications to equipment, procedures and training. Finally, the SGDP can be used to
sharpen, modify and revise methodologies.
The link between the SGDP and usability testing is that both use SMEs to elicit information.
The information garnered from the SMEs can then be used to create realistic scenarios to
evaluate the operability of the equipment being tested. This paper will discuss how the
SGDP technique can be used to develop scenarios that will be used in operational usability
testing.


OPERATIONAL USABILITY TESTING


Dumas and Redish (1999) define usability to mean that “…the people who use the product can
do so quickly and easily to accomplish their own tasks.” There are four main parts to their
definition: usability means focusing on users; people use products to be productive; users are
busy people trying to accomplish tasks; and the users decide when a product is easy to use.
The FAA’s Acquisition Management System (AMS) Test and Evaluation (TE) policy
guidelines require that selected test processes include a verification of operational
readiness. The policy defines two distinct components of operational readiness: operational
effectiveness and operational suitability. “Operational suitability is the degree to which a
system can be used satisfactorily in the field with consideration given to availability,
maintainability, safety, human factors, logistics, supportability, documentation, and training”.
For the purposes of this paper, the authors will concentrate on operational suitability,
which is synonymous with usability.

SMALL GROUP MODIFIED DELPHI


Delphi techniques have become common methodologies for eliciting analyses.
Standard Delphi techniques include anonymity of response, multiple iterations, convergence
of the distribution of answers, and a statistical group response (Judd, 1972).
However, as Meister (1985) has said, “the Delphi methodology is by no means fixed…[it] is
still evolving and being researched.” The SGDP is seen as another step in this evolution.
In the development of the SGDP technique, Fleishman’s underlying abilities theory
was merged with traditional Delphi techniques as well as group dynamics. The SGDP has
been successfully used to knowledge-engineer SME data for the development of core
competencies, selection tests, training, analyses, and objectives and models. The
SGDP is a highly structured, sequentially ordered process. The technique has been used in
many environments, which demonstrates the robust flexibility and generalizability of the
paradigm. This flexibility and generalizability are borne out by the SGDP's repeated use,
with modifications resulting from initial conditions (Lofaro and Intano, 1990;
Gibb and Lofaro, 1991; Gibb and Garland, 1994; Lofaro, 1998). Every use of the basic SGDP
model results in modifications; each application takes “shape” as the objectives are defined,
the SMEs are selected and time limits are set.

THE STRUCTURE
After carefully selecting the SMEs who will provide expert data, opinions and
criticality ratings, the first step is to develop the objectives. Each objective must then be
enumerated, and sub-objectives developed, so that all the components needed to
achieve the whole objective are in place. This becomes the basis for developing and
scheduling the times and types of group sessions.
A read-ahead package must be prepared and sent to the workshop participants at least three
weeks in advance. This package is vital: the SGDP will flow smoothly from it, or not. The
package must contain, besides the pro forma where and when, the following:
A. The objectives of the workshop, including the longer-range goals that the
workshop will enable. This is key to obtaining SME buy-in and maximum cooperation.
B. A clear statement describing not only what the participants will be doing, but also
informing them that their input will be operationally implemented. In this package, the SMEs
should be advised that they were hand-picked for their acumen and experience and that they
are the final "word" on this. All of this is true, but the clarity, transparency and use-value, in
the operational environment, of both the immediate and long-range goals are what will
ensure maximal SME effort.
C. Participants should be advised that small-group processes will be used and that
group dynamics training will be the first step in the process. Materials on "groupthink",
group forming etc. should also be included in this package. At the first session, some group
exercises should be conducted to demonstrate what the materials have described.
D. Finally, the read-ahead package should include a homework assignment.
Participants should be instructed to read all of the read-ahead package and then begin
thinking and working on the first objective. A day-by-day agenda should be provided as
well.
NOTE: Along with the read-ahead, the facilitator must prepare a complete set of protocols
to be given to the participants when they arrive. Besides the read-ahead
materials, the protocols contain the group-process materials and a step-by-step breakout
of the objectives, their sequencing, and how each objective will be carried out. The protocols
are the structure that keeps the groups on the process track.

THE SEQUENTIAL PROCESS


The optimum size of the workshop is 10 persons broken into 2 sub-groups of 5. Upon
arrival, each SME should be provided with a final agenda and the protocols that specify how
each objective will be accomplished. The next step is to proceed with instructions and
exercises in group dynamics and consensus. Work on the objectives can then begin. An
iterative, step-wise process should be used in which each objective is first worked
anonymously and individually. Sub-group discussions are then held to achieve consensus.
Finally, the intact group (the 2 sub-groups meeting together) holds a discussion to achieve
final consensus. The iterative, step-wise process is to go from a 5-person group meeting on
sub-objectives to an intact 10-person meeting that makes a group decision on the main objective.
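
As an illustration only, the following minimal Python sketch traces the rating flow just described: anonymous individual criticality ratings, a sub-group consensus check, and then an intact-group decision. The 1-5 rating scale, the median as the group response, and the spread threshold are assumptions made for the example, not part of the published SGDP.

    from statistics import median, pstdev

    def consensus(ratings, max_spread=1.0):
        # Group rating is the median; the group is treated as converged when
        # the spread of individual ratings is within the assumed threshold.
        return median(ratings), pstdev(ratings) <= max_spread

    # Step 1: each of 10 SMEs anonymously rates one task's criticality (1-5).
    sub_group_a = [4, 5, 4, 3, 4]
    sub_group_b = [2, 3, 3, 4, 3]

    # Step 2: each 5-person sub-group checks for consensus; a sub-group that
    # has not converged discusses and re-rates before the groups merge.
    rating_a, ok_a = consensus(sub_group_a)
    rating_b, ok_b = consensus(sub_group_b)

    # Step 3: the intact 10-person group decides on the main objective.
    intact_rating, converged = consensus(sub_group_a + sub_group_b)
    print(intact_rating, converged)  # 3.5 True

In a live workshop the facilitated discussion, not the arithmetic, produces the consensus; the sketch only makes the iterate-then-converge structure concrete.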

DISCUSSION
One obvious candidate for the SGDP methodology is Operational Testing (OT). The
goal of any OT is to determine the system’s capability to perform its mission in the
operational setting and to identify and evaluate core operational effectiveness and
suitability problems. The SGDP can be modified so that SMEs can use it in OT.
A small set of carefully selected SMEs would be used to face-validate the existing scenarios.
This SME set would also ensure that all the necessary operational issues were embedded in
the scenarios. Dumas and Redish state, “a good scenario is short, composed in the user’s
words (not the product’s), and is clear so all participants will understand it”. A second group
of SMEs would assist in the test sequencing as well as the techniques/scalings to be used
in workload analysis. While the process of multiple SGDP workshops may seem lengthy, that
is not the case. The workshops would run consecutively, each new one beginning with the
data from the prior one, and each SGDP workshop would need only approximately 4 days to
complete. The experience and insight of the SMEs into the realities of the operational arena
are both needed and invaluable. Upon completion of the actual OT, all SME members of the
2 previous SGDPs would be convened to interpret the results and to make a set of
recommendations. These recommendations may include retrofit, modification, and training.
The two SGDP groups would work independently at first; then, as they finished, they would
convene into one group to finalize their results.


A major consideration is the impact of using face-to-face groups on ratings and
evaluations. Pill (1970) has said that this may dilute the opinions of the real expert. This
seems a strange objection, as the SMEs selected are the real experts. However, if what is
meant is that one member of the group may have more expertise in the small, specific area
being worked on, then the reality (based on conducting 7 or so of these workshops) is that the
other SMEs recognize, welcome and use that expertise, as their goal is the best product possible.
Another objection is that the group dynamics may force ratings and analyses towards
a mean or middle ground that does not fully reflect all the SMEs’ views. There are 2 answers
to this. The first is that “real” SMEs will not allow that to happen, for reasons of personal
pride: they will not “go along to get along.” The second is that the instruction in group
work, the facilitator and the iterative methodology used in accomplishing the sub-objectives
are all structures in the SGDP process designed to ensure that this does not happen.

CONCLUSION
While the use of the SGDP technique has been suggested here for building operability-
testing scenarios and their evaluative criteria, its use may be extended. A variant of the
SGDP can be used in developing the content for any usability test, including requirements,
procedures, scenarios and evaluation. As stated previously, the goal
of conducting any operability testing on a system is to ascertain the operational readiness of
the system, i.e., its operational effectiveness and operational suitability. The
identification and refinement of system-critical operational requirements are eminently suited
to being accomplished via the SGDP.

REFERENCES
Dumas, J.S., and Redish, J.C. (1999). A Practical Guide to Usability Testing. Portland, OR:
Intellect.

Gibb, G.M., Lofaro, R.J., et al. (1991). “The Development of Enhanced Screening Techniques
for the Selection of Air Traffic Controllers.” Proceedings of the Annual Air Traffic Controller
Association (ATCA) Symposium, September 1991.

Judd, R.C. (1972). “Use of Delphi Methods in Higher Education.” Technological Forecasting
and Social Change, 4, 176-196.

Lofaro, R.J. (1998). “Identifying and Developing Managerial and Technical Competencies:
Differing Methods and Differing Results.” Proceedings of the 42nd Annual Meeting of the
Human Factors and Ergonomics Society.

Lofaro, R.J., Gibb, G.M., and Garland, D. (1994). “A Protocol for Selecting Airline Passenger
Baggage Screeners.” DOT/FAA/CT-94/110. Springfield, VA: National Technical Information
Service.

Lofaro, R.J., and Intano, G.P. (1989). “Exploratory Research and Development: Army Aviator
Candidate Classification by Specific Helicopter.” Proceedings of the 5th International
Symposium of Aviation Psychologists, R.S. Jensen (ed.).

Meister, D. (1985). Behavioral Analysis and Measurement Methods. New York: Wiley.


Pill, J. (1970). “The Delphi Method: Substance, Context, Critique and an Annotated
Bibliography.” Technical Memorandum 183, Case Western Reserve University, Cleveland, OH.


WHY WE STILL NEED TO STUDY UNDEFINED CONCEPTS

Nicola L Elliott-Mabey
Deputy Directorate Policy (Research and Management Information),
Room F107a, HQ PTC, RAF Innsworth
Gloucester, GL3 1EZ
trgres1.cos@ptc.raf.mod.uk

INTRODUCTION
The Deputy Directorate Policy (Research & Management Information) (DDP(Res &
MI)) consists of 8 psychologists and a psychology student, and is situated at the UK Royal
Air Force’s (RAF) Headquarters Personnel and Training Command, RAF Innsworth,
Gloucester. DDP(Res & MI) provides and co-ordinates appropriate applied psychological
research in support of current and future RAF personnel and training policies. The core of the
work programme relates in particular to the areas of ethos/culture, recruitment, training,
retention, community support, diversity and equality. As part of its applied research studies
programme, a number of surveys are conducted which quantify attitudes to a wide variety of
factors, ranging from satisfaction with pay to the importance of promotion prospects. However,
more nebulous terms and concepts such as ‘morale’, ‘quality of life’, ‘ethos’, ‘stability’, and
‘overstretch’ are also regularly assessed and measured; yet these concepts do not lend themselves
well to agreed academic definition at either a conceptual or an operational level.

FOCUS
This paper will focus upon why it is important to study undefined or poorly defined
concepts such as ethos and morale. The paper will discuss the importance of definition and
language to the discipline of psychology and the consequences for the measurement of attitudes.
It will then consider why certain terms fail to be conventionally defined yet remain
significant to the study of the military because of their common usage and growing
organisational interest.

IMPORTANCE OF DEFINITION AND MEASUREMENT


Importance of definition and language
There is no doubt that psychology has its own language. There are terms which are
particular to psychology, for example ‘ego-centric’ and ‘self-actualisation’, but also terms
which have both common and psychological definitions, such as ‘extraversion’, ‘preference’
and ‘motivation’. Such a technical vocabulary requires definition for comprehensibility and
common understanding, and to ensure that the terms are used in a specific and consistent
manner. A definition is likely to include the elements which comprise the concept but also
reference to the stability of these elements, eg that job satisfaction may vary over time.

A definition should refer to which elements are included but also to what is excluded. The
important question is not necessarily whether the definition is in some abstract sense
‘correct’ but “whether it is useful or not” (Liefooghe, Jonsson, Conway, Morgan, and Dewe,
2003, p28). A clear definition is therefore judged by many (eg McKenna, 1994) to be the
necessary precursor for measurement. Researchers need to be certain what they are trying to
quantify in order to employ an existing measure or construct a new one. A clear definition
also sets a baseline and assists replication and the broadening of research in the future.


Types of definition
There are different types of definition available to psychologists. They may use a
descriptive/conventional definition which is universally agreed, ie a dictionary definition, for
example “brain: the part of the central nervous system which is encased by the skull” (Reber,
1985, p101). Alternatively, a stipulative/working definition may be more appropriate, where
the researcher indicates how they will use the term, acknowledging ambiguity over the
meaning, or that several meanings exist and the most relevant is chosen.

Definitions can also be conceptual or operational. Conceptual definitions are


concerned with defining what a given construct means, for instance satisfaction is “an
emotional state produced by achieving some goal” (Reber, 1985, p660). An operational
definition, on the other hand, needs to be understood in relation to a given context, in which
the term is applicable, and will make reference to how the attribute is to be measured, for
instance intelligence is “that which intelligence tests test” (Bell, Staines and Mitchell, 2001,
p114).

Measurement
There is a strong school of thought which advocates that only terms which can be
defined can be measured (eg Schwab, 1999). The reasoning behind this is to ensure precision
of meaning thus avoiding ambiguity of results. As well as the propensity in psychology for
definition of terms there is a natural tendency towards measurement. Through measurement
constructs are made researchable.

Attitudes are hypothetical constructs representing individuals’ tendencies and so


cannot be measured directly. As such, only inferences can be made about the nature of
attitudes by measuring the behaviour believed to stem from given attitudes or asking
individuals to report their feelings, thoughts and opinions. Nevertheless many different
measurement tools have been developed. Attitudes are measured because, although they are
not directly observable, they facilitate or hinder activity, that is they can be the underlying
cause for an action.

Any measure must seek to be valid and reliable. Reliability relates to the consistency
and stability of the tool, and is necessary for validity to exist. For an instrument to be valid it
must measure what it purports to measure (Cannell and Kahn, 1968). In relation to validity,
it is paramount that an instrument measures the construct “fully and exclusively” (Young,
1996, p1).
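
As a purely illustrative aside (the scores and the retest design are invented, not taken from the paper), the reliability notion above can be made concrete: test-retest reliability is commonly estimated as the correlation between two administrations of the same attitude item to the same respondents.

    def pearson_r(x, y):
        # Pearson correlation between paired scores from two administrations.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    time_1 = [4, 2, 5, 3, 4, 1, 3]  # item scores at first administration
    time_2 = [4, 3, 5, 3, 4, 2, 2]  # same respondents, retested weeks later
    print(round(pearson_r(time_1, time_2), 2))  # 0.86: a stable item

A high coefficient demonstrates stability only; validity additionally requires that the item measures the construct fully and exclusively.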

Definition of attitudes
Whilst psychology has a desire to categorise, define and measure variables, as
practitioners engaged in ‘real world’ research we are acutely aware of how difficult this is.
Motivation, job satisfaction, team, culture and stress are examples of occupational psychology
terms which have been defined, but in many different ways. Some might even call
them ‘definitionally elusive’ (Reber, 1995, p454), terms which have ‘resisted clarification’ (Reber,
1995, p101).

Researchers and practitioners alike recognise the difficulties of definition, and
therefore measurement, but even so have attempted to construct working definitions and
measurement tools. Often these are relevant to a given context or perspective; take for
instance the following two definitions of leadership:
“The process by which an agent induces a subordinate to behave in a desired manner”
(Bennis, 1959)
“The process of influencing an organised group towards accomplishing its goals” (Roach and Behling,
1984)

The first definition emphasises that a leader is someone with a subordinate(s), whereas the
second focuses on the process of leadership, ie attaining a goal through the influence of a group
member.

It is important to note that definitions vary due to a whole range of dependencies and
influences such as context, theoretical perspective, how the definition is to be used or how it
will be researched. What this shows, however, is an understanding amongst the psychology
community that there is often no one single correct definition (Hughes, Ginett, Curphy, 2002),
although of course some may believe their definition to be ‘more appropriate’ or ‘better’ than
others. It also demonstrates how complex and multi-faceted concepts are in that they cannot
be universally defined.

MEASURING POORLY DEFINED CONCEPTS


Presence of poorly defined concepts
What concepts are poorly defined? There are many concepts that lack universally
agreed definitions, such as those listed above (eg motivation), but this does not mean they
suffer from a lack of definition per se. There are terms such as morale, however, that have not
been so successfully defined. This term is very significant to the military environment but
seems to escape definitive classification. In a recent investigation of the definition and
measurement of morale, Liefooghe et al (2003, p29) concluded that “whilst many researchers
opt for the objective stance and attempt to investigate the nature of morale … many skip the
part of explicitly defining the concept. The research process moves from the research
question directly to the measurement, without explicitly addressing definitional issues”. This
finding is at odds with the importance of defining concepts in order to measure them. In the
case of morale there was consensus that it was some form of psychological state, and many
correlates and/or components were identified, but actual conceptual definitions were limited.
Why is this the situation? It is agreed that morale is an important concept, especially in
relation to work performance and group effectiveness, but what it comprises is harder to pin
down. Liefooghe et al’s (2003) work shows that previous researchers could not decide whether
the term related to individual or group processes, whether it was a single entity or a process,
or whether it was actually a portmanteau construct combining different characteristics.

Reasons to measure poorly defined concepts


So why should poorly defined concepts be measured? Good practice dictates clear
definition before measurement, although this is not always achieved (eg morale), nor is a
universal definition always agreed (eg leadership). The terms which elude definition are often
referred to as nebulous and vague. However, should this be a reason to ignore such concepts
in occupational psychology research, and in particular in the military context? One argument is
that it is not acceptable to discount these concepts, for a number of reasons, not least because
they are commonly used in everyday parlance. The existence of such attitudes is ‘observed’
in the military in two forms. Although some individuals may not use the term in question,
they may discuss components of it. For instance, a corporal may not actually mention his
‘level of job dissatisfaction’ but may talk about factors which contribute to the overall attitude,
such as ‘not liking my current job’, ‘unhappy with pay and conditions’, ‘I clash with my
superiors’, etc. Alternatively, there is evidence, from at least one recent RAF attitude survey,
that individuals use terms like morale, ethos and stability. The following examples are from
the 2001-2002 Officers' Reasons for Leaving the Service questionnaire:

“The relentless search for cost-cutting is very wearing and morale sapping.”
“Line managers are too busy to discharge proper welfare for subordinates which
undermines morale.”
“Stability and quality of life plays a great part in my decision to leave.”
“A gradual compounding degeneration of branch focus and quality of life, coupled with the
Service’s ethos of paying an individual to leave rather than paying to retain.”

An interest in the wellbeing of Service personnel therefore makes it valid to
investigate these issues, given the common and colloquial use of these terms, albeit
terms poorly defined in the academic sense. It is also apparent that individuals have a shared
understanding of what these terms mean. This is not to say that Service personnel would
universally define morale (or indeed that they could articulate what it meant in psychological
terms), but it is clear that it is an important term to them. Another example is specific to
aircrew. Pilots confirm the importance of ‘airmanship’ to flying/operating aircraft efficiently,
effectively and safely. Many can describe components of it, for example ‘being able to
prioritise’, ‘being able to multi-task’, ‘being aware of everything that is happening inter and
intra cockpit’, but few seem able to define the concept as a whole. This does not diminish the
significance of airmanship nor make it any less a candidate for research. The key here is how
it can be measured; more on this in a moment.

But what other reasons are there to measure terms which are poorly defined? One
important reason is that we are asked to do so by the organisations we serve. To cite morale
again, the following quotation illustrates the perceived significance of the concept:
“High morale, as well as being important in its own right, is a vital factor in retaining
personnel, which in turn improves manning levels and helps to obtain the optimum return
on investment in training. Our aim is to maintain excellent levels of retention and morale
through policies that reflect the priorities of our people and their families” (Ingram, 2002
– UK MOD Minister for the Armed Forces).

More and more, there is a requirement to quantify performance against management
targets and indicators. We have to pragmatically research issues which are notoriously
difficult to measure (phrases like ‘poisoned chalice’ and ‘holy grail’ are conjured up) but which
can suffer from a great deal of anecdotal belief and ‘gut feeling’. This is not to say that
everything can be measured, but as occupational psychologists it is our job to use sound
research to investigate these questions. There is a need to provide scientific evidence on
which to base policy decisions, and this can mean researching concepts that are not
psychologically well defined but which may have colloquial definitions.

Finally, as occupational psychologists we are constantly trying to make sense of the
working environment, the people within it and the organisation itself. This curiosity means we
should not disregard concepts that at present are vague (and may always elude
conventional definition) but which can help explain the world of work, especially in the
military context. We are of course not trying to make sweeping generalisations about the
results. Often attitudes raised in survey research require further investigation, especially if a
subset of the population has a different viewpoint from the mainstream. Findings are caveated
because attitudes may not result in direct behaviour or action, or in the direction proposed.
Additionally, the findings may tell us something about a tendency towards a situation but
nothing about the attitude itself. However, our results will tell us something about
individual or group tendencies and the prevalence of such tendencies.

Measurement of poorly defined concepts


So if there are compelling reasons for trying to measure poorly defined concepts, how
can we achieve this reliably and validly? A full solution would create enough material
for several additional discussion papers. However, there are a few key points that are worth
highlighting.

Anderson, Herriot and Hodgkinson (2001) believe that the basis for, and the majority of,
occupational/industrial psychology should be grounded in what they call ‘pragmatic
science’, that is, psychology should have “both practical relevance and methodological rigour”
(Anderson et al, p394). It has been suggested that only well defined concepts can be
measured. However, in reality many attitudinal instruments have been developed to measure
poorly defined terms. The relevance of and motivation for attitude measures has already been
outlined, so the next question relates to ‘methodological rigour’.

Hinkin (1998) sets out the constituents of a good measure of a construct, ie one ensuring
construct validation: identify the construct domain (clear definition); develop items (based on
previous research, subject matter experts and pilot studies); determine how the items measure
the construct domain (content validity); and assess antecedents, predictors, correlates and
consequences of the concept (convergent, discriminant and criterion validity). Measurement
can therefore be made reliable and valid, although this does not necessarily help define the
concepts in the first place. So what have other researchers done?

Citing morale yet again: as a concept it has been measured as a single item, e.g. "How would you rate your own morale?" (Schumm & Bell, 2000), and this might reflect the lack of an operational definition (Liefooghe et al., 2003). Is this the best approach when there is no agreed definition? Alternatively, should we employ multi-item measures, as did Paulus, Nagar, Larey and Camacho (1996), who used seven items relating to feelings about being in the Army, unpleasant experiences in the Army, helpful Army experiences, relationships with other soldiers, satisfaction with leadership, reenlistment, and desire to leave the Army? The second option assumes that we are confident about the components of a given concept and that we can construct a 'morale scale/score' from the results.
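To make the multi-item option concrete, the sketch below builds a composite morale score from several items and checks its internal consistency with Cronbach's alpha, one of the standard steps in the scale-development approach outlined by Hinkin (1998). The data and item structure are invented for illustration and are not taken from Paulus et al. (1996).

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents x n_items) matrix."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    rng = np.random.default_rng(0)
    # Hypothetical responses: 200 soldiers x 7 morale items on a 1-5 scale,
    # generated around a shared latent "morale" factor plus item noise.
    latent = rng.normal(size=(200, 1))
    noise = rng.normal(scale=0.8, size=(200, 7))
    responses = np.clip(np.round(3 + latent + noise), 1, 5)

    alpha = cronbach_alpha(responses)
    morale_score = responses.mean(axis=1)  # the composite 'morale scale/score'
    print(f"alpha = {alpha:.2f}, mean morale = {morale_score.mean():.2f}")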

One approach might be to explicitly define terms for respondents, i.e. use stipulative definitions that indicate how we intend the term to be used. Here a new definition may be created, one specific to a given environment such as the military. If a full definition is too prescriptive, examples of the attributes and characteristics which comprise the term could be given instead; e.g. for morale: 'dedication', 'willingness to sacrifice', 'motivation', 'confidence', 'commitment'. A note of caution, though: even when a definition is provided it does not mean that respondents will use it (Oppenheim, 1992).


It seems that, in the absence of universal definitions and/or theoretical frameworks, practitioners have to develop tools that are as reliable and valid as possible, which may seem unachievable without a clear definition of concepts. However, preliminary observations and previous research help us construct our items, and we are able to determine the consistency of responses and understanding of terms whilst seeking more definitive baselines. There is, therefore, the opportunity to develop working definitions for poorly defined concepts in order to measure them, and some exciting work could emerge in this field in years to come.

CONCLUSION
Although there is a strong tendency in psychology to define and measure constructs, several key concepts remain poorly defined and resist categorisation. These concepts are still measured in academic research but, importantly, also by occupational psychology practitioners, including those working in the military environment. This is driven by the common usage of such terms as morale and ethos and by the increasing need of organisations to quantify personnel's attitudes to a range of issues. In the interim period (however long that might be) before universal or at least working definitions (probably context based) and/or theoretical frameworks are constructed, measures will be developed as reliably and validly as possible in order to capture these key attitudes.

REFERENCES
Anderson, N., Herriot, P., and Hodgkinson, G.P. (2001). The practitioner-researcher divide in
industrial, work and organizational (IWO) psychology: Where are we now, and where do we
go from here? Journal of Occupational and Organizational Psychology, 74, 391-411.

Bell, P.B., Staines, P.J. and Mitchell, J. (2001). Evaluating, doing and writing research in
psychology. Melbourne, Australia: Sage.

Bennis, W.G. (1959). Leadership theory and administrative behavior: The problem of
authority. Administrative Science Quarterly 4, pp 259-260.

Cannell, C.F. and Kahn, R.L. (1968). Interviewing. In G. Lindzey and E. Aronson (Eds.), Handbook of social psychology, Vol. 2. Reading, MA: Addison-Wesley.

Hinkin, T.R. (1998). A brief tutorial in the development of measures for use in survey
questionnaires. Organizational Research Methods, 1(1), 104-121.

Hughes, R.L., Ginnett, R.C., and Curphy, G.J. (2002). Leadership: Enhancing the lessons of experience. New York, NY: McGraw-Hill.

Ingram, A. (2002). House of Commons Written Answers to Questions, Mon 11 Feb 2002.
http://www.parliament.the-stationary-office.co.uk/pa/cm/cmhansrd

Liefooghe, A., Jonsson, H., Conway, N., Morgan, S., and Dewe, P. (2003). The definition
and measurement of morale: Report for the Royal Air Force. Extra-mural contract report for
RAF: Contract - PTC/CB/00677.


McKenna, E. (1994). Business psychology and organisational behaviour. Hove, UK: Lawrence Erlbaum Associates Ltd.

Oppenheim, A.N. (1992). Questionnaire design, interviewing and attitude measurement. Kings Lynn, Norfolk: Pinter Publishers Ltd.

Paulus, P.B., Nagar, D., Larey, T.S. and Camacho, L.M. (1996). Environmental, lifestyle, and psychological factors in the health and well-being of military families. Journal of Applied Social Psychology, 26(23), 2053-2057.

Reber, A.S. (1985). Dictionary of psychology. St Ives, UK: Penguin Books.

Roach, C.F. and Behling, O (1984). Functionalism: Basis for an alternate approach to the
study of leadership. In Leaders and Managers: International Perspectives on managerial
Behavior and Leadership. Ed J.G. Hunt, D.M. Hosking, C.A. Schriesheim and R. Stewart.
Elmsford, NY: Pergamon.

Schumm, W.R. and Bell, D.B. (2000). Soldiers at risk for individual readiness or morale problems during six-month peacekeeping deployment to the Sinai. Psychological Reports, 80, 623-633.

Schwab, D.P. (1999). Research methods for organizational studies. Mahwah, NJ: Lawrence Erlbaum Associates.

Young, C.A. (1996). Validity issues in measuring psychological constructs: The case of
emotional intelligence. http://trochim.human.cornell.edu/tutorial/young/ieweb.

© British Crown Copyright 2003, MOD


Published with permission of the Controller of Her Britannic Majesty’s Stationery Office


Do we assess what we want to assess?


The appraisal dimensions at the Assessment Center for Professional Officers (ACABO)

Dr. Hubert Annen & Lic. phil. Barbara Kamer


Military Academy at the Swiss Federal Institute of Technology
Steinacherstrasse 101b, CH-8804 Au
hubert.annen@milak.ethz.ch

INTRODUCTION
The Assessment Center is a widely used tool for the selection and development of managers
in various organizations. Many studies have demonstrated that assessment center appraisals
predict a variety of important organizational criteria, such as training and job performance or
promotion (Gaugler, Rosenthal, Thornton & Bentson, 1987; McEvoy, Beaty, & Bernardin,
1987).
It appears therefore that the assessment center has good predictive validity. However, scientists seem to disagree about what each individual dimension and the tool itself really measure, or what the observers actually assess, because when it comes to construct validity most studies show a similar picture: ratings of multiple dimensions within a single exercise correlate more highly than do ratings of the same dimension across multiple exercises (Annen, 1995; Bycio, Alvares, & Hahn, 1987; Kleinmann, Kuptsch, & Köller, 1996; Robertson, Gratton, & Sharpley, 1987; Sackett & Dreher, 1982; Turnage & Muchinsky, 1984).
Many hypotheses try to explain why the assessment center is nevertheless a good predictor of future job success. Russell and Domm (1995), for example, argue that assessment centers have such a high prognostic value because they measure attitudes which are of importance for the future job. For a better understanding of the predictive value of the assessment center, Shore, Thornton and Shore (1990) claim that the construct validity of dimension ratings should be explored by building a nomological network of related constructs. Their own studies showed that, during an assessment center, cognitive ability correlates more strongly with problem-solving dimensions and that personality traits have a stronger connection with interpersonal dimensions. Other studies focus on the connection between personality factors or cognitive ability and performance in the assessment center, and they have produced significant results (Crawley, Pinder, & Herriot, 1990; Fleenor, 1996; Chan, 1996; Goffin, Rothstein & Johnston, 1996; Scholz & Schuler, 1993; Spector, Schneider, Vance, & Hezlett, 2000).
Taking into account the various studies on the subject, it seems difficult to establish a correlation between assessment center results and other criteria in general terms. Each assessment center in an organization is tailored to persons with a specific job background. Depending on the job profile the successful candidate is expected to meet, different observation criteria are used and operationalized according to the requirements of the job. It is therefore of vital importance to have a clear idea of what is really measured through the given dimensions and whether or not we really measure what we mean to measure.


THE ASSESSMENT CENTER FOR PROFESSIONAL OFFICERS (ACABO)


During the winter semester 1991/92 the Swiss Military College at the Federal Institute of Technology Zurich (ETH) introduced a diploma study course for future professional officers. In much the same way as managers in the private sector, future professional officers not only need intellectual ability and technical skills but must also show a high level of social competence. Therefore in 1992 a three-day assessment center programme was developed in order to provide students with an appraisal of their situation and hints for improvement concerning their personal and social competences, and to provide the trainers with more accurate information regarding their students. In this form the assessment center was neither a pure selection instrument nor a long-term potential appraisal. In 1996 the assessment center finally became a definitive selection tool, a hurdle to be overcome by every candidate before the beginning of his study course.
The ACABO is a classical three-day assessment center. The candidates have to deal with
reality-based tasks in group discussions, presentations and role play. The observer team is
composed of superiors and chiefs of training who are recruited above all from divisions which
have sent candidates. Because the Swiss Militia Army can still be considered a part of society,
civilian assessors – usually psychologists or human resources specialists – are also employed.
During the assessment center, each participant is appraised by several observers according to
seven rating criteria.
Owing to the fact that the decisions taken during the assessment center have far-reaching
consequences, a scientific evaluation on a regular basis and resulting adaptations as well as
further developments of the procedure are indispensable. Besides studies on social validity (Hophan, 2001) and interrater reliability (Wey, 2002), further studies were conducted, particularly with regard to construct and criterion-related validity (Annen, 1995) and prognostic validity (Gutknecht, 2001). In more recent studies (Annen & Kamer, 2003)
endeavours were made to make a further contribution to the nomological network of the
assessment center and to show the connection between personality factors, cognitive
competence and the assessment center results.
It has always been considered a basic principle to use the findings of the studies on ACABO
not only for scientific purposes but to implement them in practice in order to further develop
the tool. Since the current paper can be seen as another contribution to the understanding of
our assessment center dimensions and since the research design is based on the findings of
former studies, the following pages will again present the most important former studies and
their practical implications.

FORMER STUDIES
Social validity
Based on the concept of social validity (Schuler, 1998), the candidates fill in a questionnaire on completion of ACABO in which they are asked to convey their impressions of and attitudes towards the ACABO they have just gone through. Hophan (2001) critically examined the results and came to the conclusion that the high level of acceptance by the participants is independent of their geographical origin, school qualifications, assessment center results, age or military rank.


Construct and criterion related validity


Annen (1995) examined the construct and criterion-related validity of the ACABO and concluded that the ACABO, like many other assessment centers, shows no construct validity in a test-theoretical sense. Based on these findings the question arises as to how the dimensions could be depicted in a more differentiated and valid way. Of special interest was the dimension "analysis", because it was unclear whether some candidates were unable to present an adequate solution to the problem because of a lack of structural and analytical abilities or because of their practical inexperience.
Further development of ACABO based on this study: in order to better underpin the dimension "analysis", it was decided in 1996 to introduce written cognitive ability tests and to integrate the result into the rating of the dimension "analysis" at one fourth of its weight.
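Read literally, the one-fourth integration is a simple weighted average. The sketch below illustrates that arithmetic; the function name and the assumption that both scores are expressed on the same 4-point scale are ours, not taken from the ACABO documentation.

    def analysis_rating(observer_rating: float, test_score: float) -> float:
        """Combine the observers' 'analysis' rating with the written
        cognitive ability test at the stated one-fourth weight
        (both assumed to be on the 4-point ACABO scale)."""
        return 0.75 * observer_rating + 0.25 * test_score

    # A candidate rated 2 by the observers who scores 4 on the test
    # receives 0.75 * 2 + 0.25 * 4 = 2.5 for "analysis".
    print(analysis_rating(2, 4))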

Prognostic validity of ACABO


Gutknecht (2001) examined the prognostic validity of the various tools used for the selection of professional officers with regard to study and job success. His findings showed that school performance (grades of the upper secondary certificate) and cognitive ability (tests in the ACABO) have the highest prognostic validity regarding study success. Cognitive ability together with study success turns out to be the best predictor of job success (assessed by means of job appraisals by superiors). Given that the assessment center is expected to have high prognostic validity, a high correlation between the competences measured during the assessment center and job success would be expected. But this is simply not the case: the results reveal that the predictor "social competence" can be called construct valid, but this predictor shows no significant correlation with study success or with job appraisals by superiors.
It has also been confirmed on various occasions that assessment centers predict career development rather than future performance ratings (e.g. Thornton, Gaugler, Rosenthal & Bentson, 1992). Scholz and Schuler (1993) strongly believe that in the assessment center "the qualities which are relevant are rather those which foster promotion than those which predict performance in the first place" (p. 82). Therefore, in a second study, not only the performance appraisal was taken into consideration but also membership of the general staff, which can be considered an indicator of a successful career as a professional officer. Based on this study it can be concluded that the social competences rated at ACABO do correlate with this operationalization of job success or successful career respectively.
Further development of ACABO based on this study: at the end of the assessment center an appraisal matrix is established for each participant, and in the concluding observer conference each matrix is reconsidered; ratings on the overall dimension which do not yield a clear arithmetic result are rounded (up or down) after a consensus-based decision. The cognitive ability tests, which were previously taken into account for "analysis" at only one fourth of their weight, can now play an important role. Since 2002 these cognitive ability tests have been taken into consideration whenever a candidate reached a close result or when the outcome was arithmetically unclear, and have thus tipped the scales in the observer conference in favour or disfavour of the candidate.
Concerning the operationalization of job success, further endeavours have to be made in order to define this outer criterion as accurately as possible.


CURRENT STUDY
The relation of cognitive ability and personality traits to assessment center performance
in the ACABO

The studies by Annen (1995) and Gutknecht (2001) show that ACABO has prognostic validity regarding the job success (composed of job appraisal by superiors and membership of the general staff) of professional officers, but that the ACABO dimensions have no satisfactory construct validity. The question now arises as to what the dimensions really measure. We therefore have to find out whether the assessments are based on hidden aspects such as certain personality traits.

Method
Participants: Assessment ratings were obtained on 214 assessees in a 3-day selection
assessment process conducted during a 3-year period. All the participants had a secondary
degree (Matura) and were officers of the Swiss Army.
Assessors and ratings: as already mentioned, the observer team is composed of superiors and chiefs of training, recruited above all from divisions which have sent assessment center candidates, together with civilian specialists in psychology and human resources management. All assessors received extensive, specific instruction and on-the-job training in conducting assessments. Each assessor rated candidates on a 4-point rating scale. In order to guarantee a fair and well-founded judgement, the assessment follows a procedure involving several stages. During the perception and assessment stage, observation and judgement are kept strictly separate. Next, the results of the individual main and secondary observers are thoroughly discussed after each exercise. In the final observer conference the appraisal matrix of every candidate is discussed again.
Assessment center exercises and rating dimensions: currently the requirement profile an ACABO candidate has to fulfil consists of seven dimensions (personal attitude, motivational behaviour, analysis, social contact, oral communication, dealing with conflicts, influencing behaviour). Focusing on activities a candidate might encounter immediately after completing the diploma study course, the following six exercises were designed: two presentation exercises (a spontaneous short oral presentation and a prepared 20-minute oral presentation), two group discussions (a leaderless group discussion and a debate) and two role-play exercises (a motivational talk and short cases).
Personality and cognitive ability measures: in order to assess personality, a short version of the MRS inventory by Ostendorf (1990), developed by Schallberger & Venetz (1999), is used. This 20-item short version assesses the dimensions of the five-factor model of personality (extraversion, agreeableness, conscientiousness, emotional stability and openness to experience). Despite its brevity the tool still has high factorial validity and sufficiently high reliability for research purposes (Schallberger & Venetz, 1999).
Cognitive ability was measured by a test battery specifically designed by Saville & Holdsworth Ltd (SHL) for the selection of managers and tailored for the ACABO. The battery consists of three tests covering "verbal comprehension" (VC1), "numeric comprehension" (NC2) and "diagram comprehension" (DT8). These three tests are clearly focused on the construct "general intelligence", and studies by Gutknecht (2001) have shown that they can be depicted in the common construct "cognitive ability".


Results
Due to the lack of a normal distribution in our data we have abstained from higher-level analyses and have limited ourselves to reporting the correlative connections.
Table 1 shows the connection between the Big Five and the individual dimension ratings as well as the overall assessment center score in the ACABO. The ACABO dimensions were broken down into three categories: personality dimensions (personal attitude and motivational behaviour), social dimensions (social contact, oral communication and influencing behaviour) and cognitive dimensions (analysis and dealing with conflicts). It was shown that "emotional stability" is a particularly fundamental personality trait, offering a good basis for passing the assessment center: it shows significant correlations with the overall assessment center score as well as with all personality dimensions and all social dimensions, yet no significant correlation with the cognitive dimensions.
The personality trait "extraversion" correlates only with the rating of "personal attitude", but not with the other dimension ratings or the overall score.
It seems that cognitive ability has a slightly higher influence on behaviour, or on ratings at the ACABO, than personality. Whereas the correlations with the personality scales are low, the cognitive ability tests show a highly significant correlation with the ratings of the cognitive dimensions "analysis" and "dealing with conflicts". This comes as no surprise, as the dimension "analysis" measures the analytical ability in dealing with problems, which is indispensable for dealing with conflicts in a reasonable way.
The cognitive ability tests also show a clear and significant correlation with the ratings of "oral communication", which could be due to the fact that one of the three tests measuring cognitive ability refers to verbal comprehension.
Furthermore, there appears to be a connection between the cognitive ability tests and "motivational behaviour" at the ACABO.

Table 1

Correlations of cognitive ability and personality with dimension and overall ac scores (n = 214)

Agre Cons Stbl Extr Open CA


Social dimensions
social contact .13 .01 .17 * .11 -.01 .10
oral communication -.02 .06 .15 * .13 .01 .18 **
influencing behaviour .07 .02 .15 * .10 .03 .11
Personality dimensions
personal attitude .10 -.02 .14 * .18 * .01 .11
motivational behaviour .02 .02 .14 * .02 .05 .16 **
Cognitive dimensions
analysis -.01 .01 .06 .08 .02 .18 **
dealing with conflicts .02 .02 .07 -.01 .00 .15 **
Overall AC rating .05 .03 .16 * .11 .01 .18 **
Note. Agre = agreeableness, Cons = conscientiousness, Stbl = emotional stability, Extr = extraversion, Open =
openness to experience, CA = cognitive ability. *p < .05, **p < .01.
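As a hedged illustration of how a table of this kind can be produced, the sketch below computes correlation coefficients and p-values for synthetic stand-ins of the Big Five scales against an overall rating; with non-normal data such as reported here, scipy's spearmanr would be the drop-in alternative to pearsonr. All variable names and data are invented, not the study's.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    n = 214  # sample size reported for the ACABO study

    big5 = {name: rng.normal(size=n) for name in
            ("agreeableness", "conscientiousness", "stability",
             "extraversion", "openness")}
    # Synthetic overall score with a weak built-in link to stability.
    overall_ac = 0.16 * big5["stability"] + rng.normal(size=n)

    for name, scores in big5.items():
        r, p = pearsonr(scores, overall_ac)
        flag = "**" if p < .01 else ("*" if p < .05 else "")
        print(f"{name:18s} r = {r:+.2f}{flag}")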


All in all it can be stated that both the personality trait "emotional stability" and cognitive ability are relevant to the ratings in the assessment center. "Emotional stability" is particularly well reflected in the social and personality dimensions, while cognitive ability correlates highly with the cognitive dimensions as well as with "oral communication" and "motivational behaviour".

Acting on the assumption that the dimensions show no construct validity, we also examined the correlations of overall performance on the assessment center exercises with measures of cognitive ability and personality (Table 2). Given that "emotional stability" correlates with the ratings of the social dimensions, we assumed that this personality trait would also correlate with the overall ratings of exercises with a strong interpersonal orientation, such as the group exercises. This hypothesis was confirmed. "Extraversion" did not significantly correlate with the ratings of the social dimensions, yet there is a statistically conclusive correlation with the overall ratings in the group exercises, which could be interpreted as a hint of a halo effect. Finally, a significant correlation between cognitive ability and the ratings in the presentation exercises and role plays can be established, which is not surprising given that these exercises require systematic problem analysis, good comprehension and problem-solving skills.

Table 2

Correlations of cognitive ability and personality with exercise performance (n = 214)

Agre Cons Stbl Extr Open CA


Group discussions .06 .06 .16 * .23 * .10 .07
Presentations .05 -.01 .13 .01 -.06 .21 **
Role Plays -.01 .03 .06 .08 .07 .14 *
Note. Agre = agreeableness, Cons = conscientiousness, Stbl = emotional stability, Extr = extraversion, Open =
openness to experience, CA = cognitive ability. *p < .05, **p < .01.

Schuler (1991) considers "intelligence" and "emotional stability" to be essential determinants of professional performance. Based on their meta-analysis, Scholz and Schuler (1993) come to the conclusion that "emotional stability" is relevant to success in general (Baehr, 1987; Burke & Pearlman, 1988; Hoelemann, 1989) but does not show up in assessment centers. This statement cannot be backed by our findings.
Further development of ACABO based on this study: like the studies by Gutknecht (2001), these current findings underscore the high prognostic value of the cognitive ability tests for the assessment center results. The question now arises whether cognitive ability tests should be given greater importance in the future by adding them as an individual dimension score to the overall assessment center result.
From the findings regarding "emotional stability" it can be concluded that this personality trait is useful for performing well during ACABO, or for being rated favourably by the assessors. This result can be interpreted in the light of the requirement that a future professional officer has to be "a role model for the other members of the army in every situation .... and to lead successfully under difficult conditions" (Schweizerische Armee, 2001, p. 6), which requires a certain amount of "emotional stability". It would therefore make sense to measure this factor during ACABO. The question was therefore raised whether this personality trait should become more relevant with respect to the job profile of a future professional officer and, in particular, whether it should become an explicit part of the ACABO dimensions.

OUTLOOK
We have presented a number of former studies and illustrated the ensuing consequences for the further development of ACABO. Evaluation is an ongoing process, and next we will focus our interest especially on self and peer appraisal within the assessment center. Preliminary studies have shown that peer, self and assessor appraisals regarding influencing behaviour in a group exercise are very similar; yet it has also become clear that the participants have great difficulty in assessing their own overall assessment center score. However, the sample is still too small to make conclusive statements or to give recommendations. Depending on the results of a further study, some form of peer appraisal could be taken into consideration, e.g. as an additional source of information for the evaluation or as additional feedback for the participants. It would also be interesting to pay more attention to the self-evaluation of the candidates, given that studies have shown various links between the congruence of self-perception and perception by others and organizational criteria such as job performance or promotion (McCall & Lombardo, 1983; Van Velsor, Taylor & Leslie, 1993; Bass & Yammarino, 1991; McCauley & Lombardo, 1990; Yammarino & Atwater, 1993).

BIBLIOGRAPHY
Annen, H. (1995). Konstrukt- und kriterienbezogene Validität des MFS-Assessment Centers.
Unveröff. Lizenziatsarbeit, Universität Zürich, Psychologisches Institut, Abt.
Angewandte Psychologie.
Annen, H. & Gutknecht, S. (2002). Selektions- und Beurteilungsinstrumente in der
Berufsoffizierslaufbahn - eine erste Wertung. Allgemeine Schweizerische
Militärzeitschrift, 2/02, 19-20.
Annen, H. & Gutknecht, S. (2002). The Validity of the Assessment Center for Future
Professional Officers. Proceedings of the 44th Annual Conference of the International
Military Testing Associations [CD-Rom].
Baehr, M. E. (1987). A review of employee evaluation procedures and a description of "high
potential" executives and professionals. Journal of Business and Psychology, 1, 172-
202.
Bass, B. M., & Yammarino, F. (1991). Congruence of self and others' leadership ratings of
naval officers for understanding successful performance. Applied Psychology: An
International Review, 40, 437-454.
Burke, M. J. & Pearlman, K. (1988). Recruiting, selecting, and matching people with jobs. In
J.P. Campbell & R.J. Campbell (Eds.), Productivity in organizations. San Francisco:
Jossey-Bass.
Bycio, P., Alvares, K. M., & Hahn, J. (1987). Situational specificity in assessment center ratings: A confirmatory factor analysis. Journal of Applied Psychology, 72, 463-474.


Chan, D. (1996). Criterion and construct validation of an assessment center. Journal of Occupational and Organisational Psychology, 69, 167-181.
Crawley, B., Pinder, R., & Herriot, P. (1990). Assessment center dimensions, personality, and aptitudes. Journal of Occupational Psychology, 63, 211-216.
Fleenor, J. W. (1996). Constructs and developmental assessment center: Further troubling
empirical findings. Journal of Business and Psychology, 10, 319-335.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., III, & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Goffin, R. D., Rothstein, M.G., & Johnston, N.G. (1996). Personality testing and the
assessment center: Incremental validity for managerial selection. Journal of Applied
Psychology, 81, 746-756.
Gutknecht, S. (2001). Eine Evaluationsstudie über die verschiedenen Instrumente der
Berufsoffiziersselektion und deren Beitrag zur Vorhersage des Studien- und
Berufserfolges. Unveröff. Lizenziatsarbeit, Universität Bern, Psychologisches Institut,
Btl. für Arbeits- und Organisationspsychologie.
Hoelemann, W. (1989). Langzeitprognose von Aufstiegspotential. Zeitschrift für
betriebswirtschaftliche Forschung, 41, 516-525.
Hophan, U. (2001). Participants' reactions to Assessment Centres. Manchester: Manchester
School of Management.
Kleinmann, M., Kuptsch, C., & Köller, O. (1996). Transparency: A necessary requirement for
the construct validity of assessment centers. Applied Psychology: An International
Review, 45, 67-84.
McCall, M. W., & Lombardo, M. M. (1983). Off the track: Why and how successful executives get derailed. Greensboro, NC: Center for Creative Leadership.
McCauley, C. D., & Lombardo, M. M. (1990). Benchmarks: An instrument for diagnosing managerial strengths and weaknesses. In K. E. Clark & M. B. Clark (Eds.), Measures of leadership (pp. 535-545). West Orange, NJ: Leadership Library of America.
McEvoy, G., Beaty, R., & Bernardin, J. (1987). Unanswered questions in assessment center
research. Journal of Business and Psychology, 2, 97-111.
Ostendorf, F. (1990). Sprache und Persönlichkeitsstruktur. Zur Validität des Fünf-Faktoren-
Modells der Persönlichkeit. Regensburg: Roderer.
Robertson, I. T., Gratton, L., & Sharpley, D. (1987). The psychometric properties and design of managerial assessment centers: Dimensions into exercises won't go. Journal of Occupational Psychology, 60, 187-195.
Russell, C. J., & Domm, D. R. (1995). Two field tests of an explanation of assessment center validity. Journal of Occupational and Organisational Psychology, 68, 25-47.
Sackett, P. R., & Dreher, G. F. (1982). Constructs and assessment center dimensions: Some troubling empirical findings. Journal of Applied Psychology, 67, 401-410.
Schallberger, U. & Venetz, M. (1999). Kurzversion des MRS-Inventars von Ostendorf (1990)
zur Erfassung der fünf "grossen" Persönlichkeitsfaktoren. Unveröff. Bericht,
Universität Zürich, Psychologisches Institut, Abt. Angewandte Psychologie.


Scholz, G., & Schuler, H. (1993). Das nomologische Netzwerk des Assessment Centers: eine Metaanalyse. Zeitschrift für Arbeits- und Organisationspsychologie, 37, 73-85.
Schuler, H. (1991). Der Funktionskreis "Leistungsförderung" - eine Skizze. In H. Schuler (Ed.), Beurteilung und Förderung beruflicher Leistung (pp. 171-189). Göttingen: Hogrefe/Verlag für Angewandte Psychologie.
Schuler, H. (1998). Psychologische Personalauswahl. Göttingen: Verlag für Angewandte
Psychologie.
Schweizerische Armee. (2001). Das Militärische Personal der Armee XXI: Leitbild. Bern:
Chef Heer & Kdt Luftwaffe.
Shore, T. H., Thornton, G.C., III, & Shore, L.M. (1990). Construct validity of two categories
of assessment center dimension ratings. Personnel Psychology, 43, 101-116.
Spector, P. E., Schneider, J.R., Vance, C.A., & Hezlett, S.A. (2000). The relation of cognitive
ability and personality traits to assessment center performance. Journal of Applied
Social Psychology, 30(7), 1474-1491.
Thornton, G. C., III, Gaugler, B. B., Rosenthal, D. B., & Bentson, C. (1992). Die prädiktive Validität des Assessment Centers - eine Metaanalyse. In H. Schuler & W. Stehle (Eds.), Assessment Center als Methode der Personalentwicklung (pp. 36-60). Göttingen: Hogrefe/Verlag für Angewandte Psychologie.
Turnage, J., & Muchinsky, P. (1984). A comparison of predictive validity of assessment
center evaluations versus traditional measures in forecasting supervisory job
performance: Interpretive implications of criterion distortion for the assessment
paradigm. Journal of Applied Psychology, 69, 595-602.
Van Velsor, E., Taylor, S., & Leslie, J. (1993). An examination of the relationship among self-
perception accuracy, self-awareness, gender, and effectiveness. Human Resource
Management, 32, 249-264.
Wey, M. (2002). ACABO. Assessment Center für angehende Berufsoffiziere. Eine Analyse der
Interrater-Reliabilität in Bezug auf die Gesamtbeurteilung sowie die dimensions-
sowie übungsspezifischen Urteile. Praktikumbericht, Militärakademie, Dozentur
Militärpsychologie und Militärpädagogik.
Yammarino, F., & Atwater, L. (1993). Understanding self-perception accuracy: Implications
for human resource management. Human Resource Management, 32, 231-247.


Personality as predictor of job attitudes and intention to quit

Simon P. Gutknecht
Military Academy at the Swiss Federal Institute of Technology Zurich
Steinacherstrasse 101b, CH-8804 Au
Switzerland
simon.gutknecht@milak.ethz.ch

Introduction
The Swiss Army is going through a time of change. A radical change is in progress at the moment: the reduction in the size of the army also brings with it the establishment of new functions. The uncertainties with respect to future positions and functions are felt especially strongly among professional and non-commissioned officers. The question of how affective commitment to the organization and job satisfaction (job attitudes) are influenced has to be raised. This is to be taken seriously, since these variables have proven to be important predictors of intention to quit and absenteeism (Lum, Kervin, Clark, Reid & Sirola, 1998; Michaels & Spector, 1982; You, 1996).
Besides the general questions about factors which influence attitudes toward work, such as salary, job security and leadership quality, it is interesting, especially in times of change, to see whether there are people who, owing to their disposition, suffer more or less under the process of change. A related question is to what extent certain personality dispositions (e.g. extraversion) affect work-related attitudes as well as intentions to take action.
The consideration of personality traits is not only interesting from the point of view of basic research but also carries a certain practical relevance. In addition to the traditional AC exercises done within the framework of the "assessment center for professional officers", a personality test (assessing the Big 5: extraversion, agreeableness, conscientiousness, neuroticism and culture) has been introduced. The results of this test have so far only been used for research purposes, and therefore no weight within the selection process has been attributed to it. However, it is important for the people responsible at the Assessment Center to know what additional information this test contains. Although the relationship between personality variables and job performance has mainly been of interest (cf. Barrick & Mount, 1991; Day & Silverman, 1989; Schmidt & Hunter, 1998), organisational attitudes such as affective commitment or job satisfaction can also be used as external criteria to assess the criterion validity of this test. In the end one wants to select people who remain committed to the organization even in times of difficulty. While collecting data on job satisfaction and affective commitment within the framework of a validation study, the scale for assessing the Big 5 was therefore also administered.

Findings concerning personality and job attitudes


There are not many studies whose sole aim is to look more closely at the connection between affective commitment and personality. There are, however, elaborate studies on job satisfaction which deal with the influence of traits such as general self-esteem, general self-efficacy, locus of control and neuroticism (emotional stability) (cf. Judge, Locke, Durham & Kluger, 1998; Judge, Bono & Locke, 2000). These studies point out that such traits influence the perception and judgement of the work situation and in this way have an indirect influence on the construct job satisfaction. Direct effects, even though present, are weaker.
As to the influence of the Big 5, there are isolated findings. Judge, Heller and Mount (2002) observed with the help of a meta-analysis that only the factors "neuroticism" and "extraversion" were significantly connected to general job satisfaction. On the other hand, Tanoff (1999) could demonstrate that all the Big 5 factors, with the exception of culture, were related to job satisfaction; here too the variable neuroticism played a decisive role. Seibert and Kraimer (2001) likewise ascribe prognostic validity to the Big 5 factors in connection with job satisfaction, but the effects are rather minor; in their study the variable extraversion was of importance. Day, Bedeian and Conte (1998) also found an influence of the variable extraversion on job satisfaction, but the coefficient of .10 is rather modest.
As mentioned above, there are only very few studies that deal directly with the relation between the Big 5 and commitment. In a recent study by Naquin and Holton (2002) the variables neuroticism, conscientiousness and agreeableness showed a relation to affective commitment. Otherwise there are no findings in this respect.
These results are interesting and show that, in the debate over increasing job satisfaction or commitment, the hypothesis that personality disposition is relevant seems justified. On the basis of the above-mentioned results concerning traits and job satisfaction, where the influence of the variable neuroticism is mainly of an indirect nature, the question has to be asked to what extent this is true for the other Big 5 traits. In particular, job characteristics (skill variety, task identity, task significance, autonomy and feedback), as introduced by Hackman and Oldham (1980) in the job characteristics model (JCM), should have a significant mediating role between personality features and job-related attitudes. This can be assumed in the context of the studies by Judge et al. (1998, 2000).
The question to be asked is in what way, besides the content aspect, the so-called context factors (satisfaction with salary, colleagues, job security and leadership quality), as listed in the JCM, serve as mediators. These factors correlate with job satisfaction and especially with affective commitment. Context factors play a role that should not be underestimated, especially in times of change and the uncertainty related to it.

In the following it will be shown what kinds of personality traits influence the job attitudes of Swiss professional military personnel. To test the direct as well as the indirect influences of the personality factors, affective commitment, job satisfaction, personality traits, job characteristics and context factors were considered in the calculations. These variables were then put into relation with the variable intention to quit.

Method
Setting and Participants


A total of 820 anonymous (coded) questionnaires were sent to the private addresses of professional military personnel: 420 to professional officers and 400 to professional non-commissioned officers. Both groups of addressees were chosen at random. The return rate was 61% (n = 499). Nineteen questionnaires had to be disregarded because they were insufficiently filled in, leaving a total of n = 480. The average age of the total sample was 42.5 years.

Measures
The job characteristics (skill variety, task identity, task significance, autonomy and
feedback), satisfaction with salary and colleagues, perceived job security and leadership quality
(context factors) as well as job satisfaction were registered with the German version (van Dick et
al., 2001) of the “Job Diagnostic Survey” (JDS) by Hackman and Oldham (1975). The individual
contents of job characteristics were integrated into one scale value as suggested by Hackman and
Oldham. The respective items were put into a 6-point Scale.
A German translation of the Organizational Commitment Questionnaire (OCQ) by Maier and
Woschée (2002), was used to record the affective commitment. The range of the scale was from
1 - 7. 4 items (scale 1-5) by Baillod (1992) were used to record the intention to quit.
The Big 5 were recorded with the version MRS-30 by Schallberger and Venetz (1999). The
respective scale consisted of six bi-polar pairs of adjectives as “vulnerable” – “sturdy” or
“secure” – “insecure”. The test person had to indicate out in a 6-poit Scale how these adjectives
applied to him.
The reliability coefficients of the respective scales are shown in table 1. The context
factors could not be established because the items were too small to calculate a coefficient.
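The paper does not spell out how the individual job characteristics were integrated into one scale value; one canonical reading of "as suggested by Hackman and Oldham" is their Motivating Potential Score, sketched below under that assumption with invented ratings.

    def motivating_potential(skill_variety, task_identity, task_significance,
                             autonomy, feedback):
        """Hackman & Oldham's Motivating Potential Score: the mean of the
        three 'meaningfulness' facets, multiplied by autonomy and feedback."""
        meaningfulness = (skill_variety + task_identity + task_significance) / 3
        return meaningfulness * autonomy * feedback

    # Hypothetical 6-point JDS ratings for one respondent:
    # (5 + 4 + 6) / 3 * 5 * 4 = 100.0
    print(motivating_potential(5, 4, 6, 5, 4))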

Results
The internal consistency of the scales was sufficient to very good. With reference to the means, the variables job characteristics (4.60) and job satisfaction (4.44) were judged as satisfactory to good, but the context factors were not, with the exception of satisfaction with colleagues. The variable affective commitment was measured on a scale from 1 to 7; relative to that range, the mean of 4.55 represents a rather low level of affective commitment. There was nothing conspicuous about the means of the personality constructs.
On the basis of the correlation matrix it can be seen that there are moderate to high connections, to varying degrees, between job characteristics, context factors and the variables commitment, job satisfaction and intention to quit. The influence of the context factor satisfaction with colleagues is far less than that of the other variables.

M SD α 1 2 3 4 5 6 7 8 9 10 11 12
1 Job characteristics 4.60 .56 .84 -
Context factors
2 Job security 3.72 1.16 - .34 -
3 Salary 3.60 1.20 - .24 .48 -
4 Leadership 3.60 1.20 - .46 .48 .36 -
5 Colleagues 4.60 .92 - .29 .04 -.03 .20 -
Personality
6 Neuroticism 4.53 .56 .71 -.25 -.14 -.01 -.11 -.16 -
7 Extraversion 4.11 .64 .77 .16 -.02 -.09 .04 .14 -.20 -
8 Culture 4.32 .56 .66 .11 -.05 -.07 .04 .07 -.38 .33 -
9 Agreeableness 4.49 .55 .70 .05 .03 .09 .06 .10 -.22 -.02 .20 -
10 Conscientiousness 4.95 .59 .84 .18 .06 -.05 .02 .05 -.30 .14 .31 .35 -
Attitudes
11 Job satisfaction 4.44 .85 .85 .55 .56 .47 .54 .23 -.21 .03 .03 .05 .09 -
12 Commitment 4.55 .93 .89 .44 .58 .48 .51 .11 -.15 .11 .07 .00 .14 .75 -
Intention
13 Intention to quit 3.55 .97 .80 -.34 -.50 -.46 -.44 .06 -.09 .04 .02 -.03 -.02 -.75 -.68

Table 1: Bivariate correlations, means, standard deviations and Cronbach's alpha. Correlations ≥ .09 are significant at the .05 level.
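The significance threshold in the table note can be verified directly: for n = 480, the smallest correlation significant at the two-tailed .05 level follows from the t distribution. A quick check, assuming scipy for the t quantile:

    import math
    from scipy.stats import t

    n = 480
    df = n - 2
    t_crit = t.ppf(0.975, df)                 # two-tailed test at alpha = .05
    r_crit = t_crit / math.sqrt(t_crit**2 + df)
    print(f"critical |r| = {r_crit:.3f}")     # ~0.090, matching the table note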

As far as the personality variables are concerned (see Table 2), they correlate only sporadically, and then only moderately to weakly, with the job characteristics, the context factors (salary, leadership, job security and colleagues) and the job attitudes. Only the variable neuroticism shows significant values in 7 of 8 correlations; for extraversion there are just 4, for conscientiousness only 3, for agreeableness two and for culture just one. Neuroticism, extraversion and conscientiousness show significant correlations with the job attitudes, but only neuroticism is significantly related to intention to quit.
What is striking is the high correlation of .75 between affective commitment and job satisfaction (see Table 1). From about .80 upwards, multicollinearity has to be assumed. The respective tests did not yield a clear picture, so the variable job satisfaction was no longer taken into consideration for further analysis. The variable affective commitment was kept because it represents a newer construct and has therefore hardly been tested; furthermore, this variable is of special interest in the context of the changes in the Swiss Army.
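A standard way to run the multicollinearity tests alluded to above is the variance inflation factor. The sketch below applies statsmodels to synthetic data built to correlate at about .75, as commitment and job satisfaction do in Table 1; variable names and data are illustrative, not the study's.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(2)
    n = 480
    commitment = rng.normal(size=n)
    # Satisfaction constructed to correlate ~.75 with commitment.
    satisfaction = 0.75 * commitment + np.sqrt(1 - 0.75**2) * rng.normal(size=n)
    other = rng.normal(size=n)

    X = sm.add_constant(np.column_stack([commitment, satisfaction, other]))
    for i, name in enumerate(["const", "commitment", "satisfaction", "other"]):
        print(f"VIF {name:12s} = {variance_inflation_factor(X, i):.2f}")

With r of about .75 the VIFs stay near 2.3, below the usual cut-offs of 5 to 10, which is consistent with the report that the tests did not yield a clear picture.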

Neuroticism Extraversion Culture Agreeableness Conscientiousness


Job -.25 .16 .11 .05 .18
Characteristics
Job Security -.14 -.02 -.05 .03 .06
Salary -.01 -.09 -.07 .09 -.05
Leadership -.11 .04 .04 .06 .02
Colleagues -.16 .14 .07 .10 .05
Job Satisfaction -.21 .03 .03 .05 .09
Affective -.15 .11 .07 .00 .14
Commitment
Intention to quit -.09 .04 .04 -.03 -.02
Table 2: Bivariate correlations. Correlations ≥ .09 are significant at the .05 level.

Structural Models
In the following, the direct as well as the indirect effects of the personality variables are determined. In a procedure as complicated as structural modelling it is wise to consider only as many variables as absolutely necessary. As a consequence, only those constructs can be considered for which, on the basis of a bivariate correlation, a connection to the variable affective commitment can be assumed. Such an approach is justified by the explorative character of the study. Therefore only neuroticism, extraversion and conscientiousness are considered among the personality variables. This makes sense because, according to results from other studies, an effect can especially be expected from these constructs (Judge et al., 2002; Tanoff, 1999). In addition, the context factors are summed up in one factor (α = .81).
To check model fit, the conventional indices goodness-of-fit index (GFI), Tucker-Lewis index (TLI) and root-mean-square error of approximation (RMSEA) were used. The latter two are of particular importance because they can be interpreted independently of sample size.
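For readers unfamiliar with these indices, the sketch below computes RMSEA and TLI from the chi-square statistics of a fitted model and a null baseline using their textbook formulas. The chi-square values plugged in are invented, since the paper reports only the resulting indices.

    import math

    def rmsea(chi2, df, n):
        """Root-mean-square error of approximation."""
        return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

    def tli(chi2_model, df_model, chi2_base, df_base):
        """Tucker-Lewis index: model chi2/df ratio relative to the baseline."""
        ratio_base = chi2_base / df_base
        ratio_model = chi2_model / df_model
        return (ratio_base - ratio_model) / (ratio_base - 1.0)

    # Hypothetical chi-square values for a sample of n = 480.
    print(f"RMSEA = {rmsea(180.0, 40, 480):.3f}")
    value = tli(180.0, 40, 1500.0, 55)
    print(f"TLI   = {value:.2f}")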
In Figure 1 the model is estimated on the basis of the total sample. Hardly any differences were found between the fit indices of the fully mediated model (GFI = .93, TLI = .89, RMSEA = .085) and those of the model in which direct and indirect influences are allowed (GFI = .93, TLI = .89, RMSEA = .086). The model that allows direct as well as indirect influences of the personality traits accounts for additional variance (.02), even if minimal. Whereas the effects of the variable neuroticism come about for the most part indirectly, the variable conscientiousness has a direct influence on affective commitment, even though these effects are minimal. As to the total effects, the contribution of the variable neuroticism was bigger than the effects of the job characteristics both in the mediated model (.17 versus .15) and in the model allowing direct effects (.17 versus .11). Thus, with reference to affective commitment, the variable neuroticism shows effects similar to those determined in other studies with reference to job satisfaction. The same cannot be said for the variable extraversion, for which there was neither a direct nor an indirect significant influence.
Concerning job satisfaction the mediators which seemed to hold were mainly the job characteristics, while in the context of affective commitment the context factors seem to carry a very high influence. To what extent this is related to the ongoing changes in the army is difficult to determine and can only be established with the help of successive studies. It seems that many are content with the quality of the job characteristics but not with the context factors, with the exception of the colleagues, as already described. This also means that, even where there is displeasure, a distinction is made between content aspects and factors such as job security or salary; this distinction is clearly illustrated in Figure 1 and mainly seems to affect the affective commitment.


[Figure 1: Structural model. Path diagram relating the personality variables neuroticism, extraversion and conscientiousness, via the mediators context factors and job characteristics, to affective organizational commitment; * = p<.05, ** = p<.01.]

Since the sample contains all age groups, it can be assumed that length of service has an influence, so this aspect has to be considered. The sample was divided into two age groups ("younger" and "older") and the corresponding models were calculated. The greatest influence on the judgement is again due to the context factors; only in the sub-sample "older" do job characteristics have an important influence on affective commitment. In the sub-sample "younger" some indirect effects starting from the variable neuroticism can be found; in the sub-sample "older" its total effect amounted to .34. In both samples the variable extraversion had an effect; in the sub-sample "younger" it shows a direct effect of .23. The total effects are illustrated in Table 3.

whole sample "younger" "older"

Context factors .67 (60%) .61 (50%) .65 (47%)
Job characteristics .11 (10%) .09 (07%) .22 (16%)
Neuroticism .17 (15%) .17 (14%) .34 (25%)
Extraversion .08 (07%) .23 (19%) .15 (11%)
Conscientiousness .09 (08%) .11 (10%) .02 (01%)

Table 3: Total standardized effects on "affective commitment". Percentages of accounted variance are given in brackets.
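In a path model of this kind, the total standardized effect of a predictor is its direct path plus the sum of the products of the paths running through each mediator. The sketch below shows that bookkeeping with made-up coefficients chosen to reproduce a total effect of .17; it is not a re-analysis of the study's data.

    def total_effect(direct, mediated_paths):
        """Total effect = direct effect + sum over mediators of
        (path to mediator * path from mediator to outcome)."""
        return direct + sum(a * b for a, b in mediated_paths)

    # Hypothetical paths: one direct route and two mediated routes.
    print(total_effect(direct=0.09,
                       mediated_paths=[(0.20, 0.30), (0.10, 0.20)]))
    # 0.09 + 0.20*0.30 + 0.10*0.20 = 0.17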

Discussion
Generally it can be said that the personality factors extraversion and especially neuroticism have an indirect as well as a direct influence on affective commitment. It is, however, interesting to note that these effects differ across the sub-samples: whereas the influence of the variable neuroticism in the sub-samples "younger" and "older" was, as expected, moderately high, direct effects of the personality traits are missing there. As to the findings in general, the context factors exert a big influence on commitment, and these effects could be established in both sub-samples. Job characteristics received a bigger weight in the sub-sample "older" than in the other.
The results have to be interpreted cautiously. On the one hand, this is only a cross-sectional analysis, which means that the stability of the established effects cannot be tested and there can be no talk of causality. On the other hand, the sub-samples are small, so artefacts cannot be ruled out.
The results serve as first points of reference. Further studies are planned and will be analysed in more detail, for example with reference to distinctive features such as rank or function. It can be assumed that the use of a personality test in the selection process for professional officers modestly serves its purpose of clarifying commitment as well as intention to quit. Whether the use of these tests can continue to be justified can only be established with further studies, and if economic considerations allow it.

References

Baillod, J. (1992). Fluktuation bei Computerfachleuten. Bern: Lang.

Barrick, M.R., & Mount, M.K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.

Day, D.V., & Silverman, S.B. (1989). Personality and job performance: Evidence of incremental validity. Personnel Psychology, 42, 25-36.

Day, D.V., Bedeian, A.G., & Conte, J.M. (1998). Personality as predictor of work-related outcomes: Test of a mediated latent structural model. Journal of Applied Social Psychology, 28, 2068-2088.

Dick, R. van, Schnitger, Chr., Schwartzmann-Buchelt, C., & Wagner, U. (2001). Der Job Diagnostic Survey im
Bildungsbereich. Zeitschrift für Arbeits- und Organisationspsychologie, 45, 74-92.

Hackman, J.R., & Oldham, G.R. (1975). Development of the Job Diagnostic Survey. Journal of Applied Psychology, 60, 159-170.

Judge, T.A., Locke, E.A., Durham, C.C., & Kluger, A.N. (1998). Dispositional effects on job and life satisfaction: The role of core evaluations. Journal of Applied Psychology, 83, 17-34.


Judge, T.A., Bono, J.E., & Locke, E.A. (2000). Personality and job satisfaction: The mediating role of job characteristics. Journal of Applied Psychology, 85, 237-249.

Judge, T.A., Heller, D., & Mount, M.K. (2002). Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87(3), 530-541.

Lum, L., Kervin, J., Clark, K., Reid, F., & Sirola, W. (1998). Explaining nursing turnover intent: Job satisfaction, pay satisfaction, or organizational commitment? Journal of Organizational Behavior, 19, 305-320.

Maier, G.W., & Woschée, R.-M. (2002). Die affektive Bindung an das Unternehmen. Zeitschrift für Arbeits- und
Organisationspsychologie, 46, 126-136.

Michaels, C.E., & Spector, P.E. (1982). Causes of employee turnover: A test of the Mobley, Griffeth, Hand and Meglino model. Journal of Applied Psychology, 67, 53-59.

Naquin, S.S., & Holton, E.F. (2002). The effects of personality, affectivity, and work commitment on motivation to improve work through learning. Human Resource Development Quarterly, 13(4), 357-376.

Seibert, S.E., & Kraimer, M.L. (2001). The five-factor model of personality and career success. Journal of Vocational Behavior, 58, 1-21.

Schallberger, U., & Venetz, M. (1999). Kurzversion des MRS-Inventars von Ostendorf (1990). Zur Erfassung der fünf "grossen" Persönlichkeitsfaktoren. Bericht aus der Abteilung Angewandte Psychologie, Psychologisches Institut der Universität Zürich.

Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

Tanoff, G.F. (1999). Job satisfaction and personality: The utility of the Five-Factor Model of personality. Dissertation Abstracts International: Section B: The Sciences and Engineering, 60(4-B), 1904.

Tett, R.P., Jackson, D.N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

You, T.-J. (1996). The role of ethnic origins in the discriminant approach to employee turnover. Dissertation Abstracts International: Section A: Humanities and Social Sciences, 56, 4469.


USING TASK MODULE DATA TO VALIDATE


AIR FORCE SPECIALTY KNOWLEDGE TESTS
Shirley Snooks and Cindy Luster
Air Force Occupational Measurement Squadron
Randolph Air Force Base TX, USA
shirley.snooks@randolph.af.mil

Abstract

In support of the Weighted Airman Promotion System, the Air Force Occupational
Measurement Squadron (AFOMS) Test Development Flight (TE), Occupational Analysis Flight
(OA), and career field subject-matter experts (SMEs) work synergistically to produce the best
promotion tests possible, by ensuring the tests are valid, fair, and credible. Although the use of
SMEs and the linking of individual items to Specialty Training Standard (STS) paragraphs
already validate the specialty knowledge tests (SKTs), AFOMS goes one step further and
validates test items by linking them to real-world occupational performance data. Traditionally,
the occupational performance data are grouped by duty area and then ranked in descending
order by percent members performing (PMP) data, by predicted testing importance (PTI) index
scores, or by field-validated testing importance (FVTI) index scores. The data are then compiled
into SKT extracts that SKT teams use extensively to determine the best possible item content for
tests. Many TE test psychologists (TPs), OA analysts, and SMEs express concerns about the
time involved and the difficulty in using SKT extracts to translate action-based tasks into
knowledge-based test items.
This research provided selected SKT teams with additional products that organized the
occupational data into logical task module (TM) extracts based on co-performance, and it
assessed the appeal of the TM extracts versus that of the SKT extracts.
These researchers believe that SMEs will report more satisfaction with the TM data
because the tasks are logically grouped by co-performance rather than by general duty area.
In addition, SME comments will be encouraged in order to explore additional approaches to
presenting task information.

USING TASK MODULE DATA TO VALIDATE AIR FORCE SPECIALTY KNOWLEDGE TESTS
In an innovative approach, the AFOMS TE and OA Flights are working synergistically to
provide SMEs on temporary duty (TDY) to AFOMS with an alternative organization of
occupational data, one intended to facilitate SKT item writing by making the data easier to
understand and more logical. Ongoing criticism from AFOMS psychologists, analysts, and
SMEs speaks to the difficulty of using performance tasks to develop questions for a knowledge test.
“The Air Force Comprehensive Occupational Data Analysis Program (CODAP) system
is a collection of analysis tools and procedures which use, as raw material, information provided
by members of the occupational field being studied” (Thew & Weissmuller, 1978). It is
designed to furnish users with a wide variety of reports that facilitate the identification of
individual and group characteristics and the detection of job similarities and differences. For
over 35 years, AFOMS has analyzed career fields and provided occupational data to Air Force
managers for use in decision-making. The analysis of the work performed within career fields,
and of the demographic characteristics of the members performing that work, has remained the
stronghold of objective, quantitative occupational data for personnel and training decisions.
These researchers applied a currently available but seldom used CODAP analysis
program. This less common approach defines work by grouping tasks into TMs. With these
TMs, natural groupings of performance tasks can be provided to SMEs on TDY to AFOMS. The
SKT teams compared the tasks organized by co-performance (TM extract) with SKT extract data
organized by major duty area and sorted by PMP, FVTI, or PTI index scores. It is the opinion of
these researchers that organizing performance data into meaningful groups (i.e., TMs) will
greatly alleviate some of the concerns and problems SMEs have with typical SKT extract data
and will allow SMEs to make a more intuitive leap from performance-based tasks to
knowledge-based test questions.

Co-Performance
The idea of grouping tasks, by co-performance, into TMs has been discussed within the
research arena since the mid 1980s. For example, the Training Decisions System (TDS) was
conceived as a computer-based training requirements planning and decision support system
developed to meet Air Force needs for better decision information (Vaughan, Mitchell, Yadrick,
Perrin, Knight, Eschenbrenner, Rueter and Fledsott, 1989). The TDS allows managers and
analysts to evaluate policy options in terms of costs and training capacities of representative units
and to conduct trade-off analyses between various formal training programs and on-the-job
training (Mitchell, Buckenmeyer, and Hand, 1989). The TDS supports Air Force
managers in making decisions as to the what, where, and when of technical training (Ruck, 1982)
by using AFS-specific occupational survey task data as a starting point for training development
decisions. Task clustering is used in the TDS to capture the economies of training different tasks
at the same time, either because of common skills and knowledge, including perhaps shared
equipment, or because the tasks are generally performed by the same people.
AFOMS analysts have played an integral part in providing task data to support these
training systems since the inception of the program. Task module data were a critical source of
information for program functionality.

Occupational Data for SKT Development


Of particular importance to SKT teams is a specially compiled SKT extract containing
occupational survey data specific to test populations. AFOMS survey data provides test writers
(SMEs) with PTI index scores that are derived from PMP, percent time spent (PTS), task
difficulty (TD), and training emphasis (TE) data. Those tasks rated highest in FVTI or PTI also tend to be high in all four of
the primary indices (PMP, PTS, TD and TE), and are exactly the kinds of tasks one would
generally consider job-essential and therefore appropriate for SKT content.
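The paper states that PTI index scores are derived from PMP, PTS, TD, and TE data but does not give the derivation itself. The sketch below shows one plausible way such a composite could be formed, standardizing each index and averaging with equal weights; the weighting scheme, the function names, and the toy data are assumptions for illustration, not the actual AFOMS formula.

```python
# Hypothetical sketch of a PTI-style composite: the paper says PTI is derived
# from PMP, PTS, TD, and TE but does not give the formula, so equal weights on
# standardized indices are an assumption made here for illustration only.
import statistics

def z_scores(values):
    """Standardize a list of raw index values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def predicted_testing_importance(pmp, pts, td, te, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine standardized task indices into one importance score per task."""
    columns = [z_scores(col) for col in (pmp, pts, td, te)]
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(pmp))]

# Toy data: four indices for five tasks.
pmp = [80, 65, 40, 55, 20]       # percent members performing
pts = [12, 9, 4, 7, 2]           # percent time spent
td  = [5.1, 4.2, 6.0, 3.8, 5.5]  # task difficulty
te  = [6.3, 5.0, 4.1, 4.8, 2.2]  # training emphasis
print(predicted_testing_importance(pmp, pts, td, te))
```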

When possible, FVTI data are produced for SKT revisions. To obtain FVTI data,
approximately 6 months before the start of the SKT development project, a sample of 100 senior
career field NCOs is sent a survey containing a list of the 150-200 tasks rated highest in PTI.
Respondents are asked to provide a seven-point Likert scale rating (“1” is least important and
“7” is most important) of how important they believe the task is for coverage on an SKT. The
responses are averaged for each task, yielding the FVTI index, a direct measure of the opinions
of career field experts as to what constitutes job-essential knowledge.
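A minimal sketch of the FVTI computation just described: each task's 7-point importance ratings are averaged across respondents, and the tasks are then ranked in descending order, as in the SKT extracts. The task names and ratings below are invented.

```python
# A minimal sketch of the FVTI computation described above: each senior NCO
# rates each task on a 7-point importance scale, and the ratings are averaged
# per task to yield the FVTI index. Task names and ratings are invented.
from statistics import mean

ratings = {
    "Inspect cargo documentation": [7, 6, 7, 5, 6],
    "Prepare shipment manifests":  [5, 5, 6, 4, 5],
    "Operate forklift":            [3, 2, 4, 3, 2],
}

fvti = {task: mean(scores) for task, scores in ratings.items()}

# Rank tasks in descending FVTI order, as in the SKT extracts.
for task, index in sorted(fvti.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{index:4.2f}  {task}")
```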
Two separate data sets are prepared, one for use in the development of an E-5 SKT and
one for use in the development of an E-6/7 SKT. Regardless of whether PTI or FVTI data are
provided to the SKT team, the data provides a restricted set of tasks for use in SKT construction
that will discriminate between the most knowledgeable and the least knowledgeable workers
(Pohler, 1999).

Participants, Design, and Procedure


Forty SMEs, on TDY to AFOMS for major SKT development projects with start dates
from 5 August through 7 October 2003 and representing a total of 10 Air Force career fields,
participated in this research. These career fields are as follows:

AFSC 1C3X1 Command Post
AFSC 1S0X1 Safety
AFSC 2A5X3D Integrated Avionics Systems (Airborne Surveillance Radar Systems)
AFSC 2E1X2 Meteorological and Navigation Systems
AFSC 2E1X4 Visual Imagery and Intrusion Detection Systems
AFSC 2T0X1 Traffic Management
AFSC 3C0X2 Communications/Computers Systems Programming
AFSC 3C3X1 Communications-Computer Systems Planning and Implementation
AFSC 3N0X2 Radio and Television Broadcasting
AFSC 3U0X1 Manpower

In addition, 13 full-time AFOMS TPs who conducted these SKT development projects
participated in this research.
Occupational data were retrieved from archived storage and copied into new CODAP
system files for this study.
Originally, these researchers planned to use the top 150 to 200 tasks determined by the
PTI and FVTI values for each AFSC. However, after much effort and consternation, it became
apparent that it was neither cost- nor time-effective to convert only the top 150-200 tasks into
readable and usable CODAP files. Subsequently, the TM clustering program was applied to the
complete task lists for these AFSCs.
Once task clusters were identified as being different from one another, TASSET, another
CODAP program, extracted an asymmetric matrix of percentage values indicating the degree to
which each task is co-performed with the other tasks in its cluster. For example, if a cluster
consists of tasks “A”, “B”, “C”, and “D”, the average co-performance of “A” with “B”, “A” with “C”, and
“A” with “D” is determined. Tasks within each cluster were then ordered from highest
co-performance value to lowest through the use of PRTFAC, a TASSET output. Those tasks with
the highest co-performance values are most representative of the TM (Phalen, Staley, and
Mitchell, 1987). In addition, the PRTFAC allows an analyst to better characterize and name
each TM, discern finer distinctions between tasks, and create new TMs if necessary. These
adjustments can be manually input into GRPMOD (another TASSET output), re-analyzed and
readjusted until a final GRPMOD TM listing (TM extract) is produced.
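The ordering step can be restated compactly in code. The sketch below scores each task in a cluster by its average co-performance with the other tasks in that cluster and sorts the cluster from highest to lowest, mimicking in spirit what the paper attributes to the PRTFAC output; the matrix values and task labels are invented, not actual CODAP output.

```python
# A sketch of the ordering step described above: given a (possibly asymmetric)
# matrix of co-performance percentages, each task in a cluster is scored by
# its average co-performance with the other tasks in that cluster, and the
# cluster is then sorted from highest to lowest. Data here are invented.

co_perf = {  # co_perf[a][b] = % of members performing task a who also perform b
    "A": {"B": 90, "C": 75, "D": 60},
    "B": {"A": 85, "C": 70, "D": 50},
    "C": {"A": 65, "B": 60, "D": 40},
    "D": {"A": 55, "B": 45, "C": 35},
}
cluster = ["A", "B", "C", "D"]

def avg_co_performance(task, cluster, co_perf):
    """Average co-performance of `task` with every other task in the cluster."""
    others = [t for t in cluster if t != task]
    return sum(co_perf[task][t] for t in others) / len(others)

ordered = sorted(cluster,
                 key=lambda t: avg_co_performance(t, cluster, co_perf),
                 reverse=True)
print(ordered)  # tasks most representative of the task module come first
```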
Previous research by Snooks and Luster (2003) indicated that labeling task modules
added no measurable benefit; therefore, none of the task modules were named for this study.
Instead, each team was provided with unlabeled TMs showing tasks grouped by
co-performance only (TM extracts).
As is customary, SKT development teams were provided with SKT extracts during the first
week of each project. The SMEs used the SKT extract data in the first week of each 5-week
major revision project for outline development in accordance with TE process standards, and
then continued using the SKT extracts throughout item development to validate test items. The
TM extracts were presented to the teams between week 2 and week 4 of the projects, after the
teams were familiar with the SKT extract. Each team member was asked to review the TM
extract and then complete a 10-item, 7-point Likert scale survey (see Attachment 1). AFOMS
TPs assigned to these projects were also asked to review the TM extracts and complete a survey
identical to the SME survey except for the heading (see Attachment 2). In order to avoid tainting
the responses with the enthusiasm of the researchers, and to ensure standard survey
administration, a cover letter (see Attachment 3) was developed that included a brief explanation
of the two survey products and directions for completing the survey. In other words, the survey
was designed for self-administration.

Results
In simple mean-comparison tests, the SKT extract and the TM extract were compared on
five perceived attributes: “importance,” “accuracy,” “ease of understanding,” “ease of use,” and
“desire to use.” In addition, both groups (SMEs and TPs) were encouraged to provide comments.
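For readers who want to reproduce this kind of comparison, the sketch below runs one attribute's comparison as a paired t-test, a natural choice given that every respondent rated both products; the paper does not state which t-test variant was actually used, and the ratings are invented.

```python
# A sketch of the attribute-by-attribute mean comparison, assuming a paired
# t-test since every respondent rated both products; the paper does not state
# which t-test variant was used, and the ratings below are invented.
from scipy import stats

skt_importance = [6, 5, 7, 5, 6, 4, 6, 5]  # one rating per respondent
tm_importance  = [5, 5, 6, 4, 6, 4, 5, 5]

t_stat, p_value = stats.ttest_rel(skt_importance, tm_importance)
print(f"SKT mean = {sum(skt_importance) / len(skt_importance):.4f}")
print(f"TM  mean = {sum(tm_importance) / len(tm_importance):.4f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```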
No statistically significant differences were found; however, some comparisons can be
made. All respondents (N = 53) appeared to be slightly more positive about the SKT extract
(Mean = 5.1811) than the TM extract (Mean = 4.8604) across the five examined attributes, with
the “desire to use” attribute rated the highest of all (Mean = 5.8679) for the SKT extract and the
“ease of use” attribute rated the lowest (Mean = 4.5660) for the SKT extract. The highest rating
for the TM extract, from all respondents, was for the “importance” attribute (Mean = 5.1698),
while the attribute with the lowest rating from all respondents was “ease of use” (Mean =
4.4339). It is worth noting that the “ease of use” attribute has the lowest rating for both products,
suggesting that it may be the data itself that is difficult to use.
TPs (N = 13, Mean = 5.5000) appeared to be slightly more positive towards both
products on all five examined attributes than the SMEs (N = 40, Mean = 4.4688), with the
greatest difference appearing in the “ease of use” attribute (TP Mean = 5.0384, SME Mean =
4.3250) and the least difference appearing in the “importance” attribute (TP Mean = 5.8462,
SME Mean = 5.2375). Again, there were no statistically significant differences between the TP
and SME scores.
Interestingly, these results do not appear to be supported by the respondents’ comments,
in which both products, for the most part, were criticized as not being presented in a familiar
format and thus as not being easy to use (see Attachment 4).

Table 1. All respondent mean score comparisons with T-values

Attribute               SKT Extract Mean   TM Extract Mean   T-Value
Importance              5.6038             5.1698            0.0122
Accuracy                4.8868             4.6792            0.2578
Ease of Understanding   4.9811             4.9245            0.7614
Ease of Use             4.5660             4.4339            0.5950
Desire to Use           5.8679             5.0943            0.0244

Table 2. SME mean score comparisons with T-values

Attribute               SKT Extract Mean   TM Extract Mean   T-Value
Importance              5.0000             4.6250            0.0483
Accuracy                4.6875             4.0625            0.1561
Ease of Understanding   4.3125             4.5000            0.2980
Ease of Use             3.8125             3.6875            0.2663
Desire to Use           5.1250             4.8750            0.0265

Table 3. TP mean score comparisons with T-values

Attribute               SKT Extract Mean   TM Extract Mean   T-Value
Importance              5.0769             5.6153            0.0820
Accuracy                5.0000             5.1538            0.5486
Ease of Understanding   4.9231             5.3846            0.2132
Ease of Use             4.8461             5.2307            0.5221
Desire to Use           6.4615             6.3076            0.6872

Conclusion and Implications


In summary, the comments and results of this research point to the need for further
research. A review of TP and SME comments revealed a common theme: respondents found it
difficult to compare the two products, perhaps because PMP, PTI, and FVTI data were not
presented on the TM extracts. In spite of these reported difficulties, these researchers believe the
use of TM data in SKT development merits continued research. A follow-on study is planned for
early 2004, in which both the SKT and TM extracts will be converted into spreadsheet format
and manually refined before being provided to future SKT teams (TPs and SMEs). If this
follow-on study is successful, the current study will then be replicated. Furthermore, as
mentioned in a previous study (Snooks and Luster, 2003), several important issues remain
unresolved. For example, a decision must be made as to who will determine which tasks
compose each module (i.e., OA analysts, TE psychologists, or SMEs). If, after further research,
management determines that TM extracts should be run off the top PTI/FVTI tasks, rather than
off the whole task list as was done in this research, additional funds and resources will have to
be allocated to provide automated data runs conducive to CODAP. Modifications to the TM
extract “final product” format also remain to be decided and then incorporated into future
research efforts.

Author’s Note

Special thanks and acknowledgement go to Ms. Jeanie C. Guesman, AFOMS,
Occupational Analysis Flight, for her diligent and enthusiastic programming support. This
project could not have gone forward without her help.
Comments and questions about this research can be addressed to Ms. Shirley Snooks, at
AFOMS/TEEQ, 1550 Fifth Street East, Randolph AFB TX 78150. Calls can be made to
Ms. Snooks at (210) 652-5013, extension 3114, or DSN: 487-5013, extension 3114.
In addition, comments and questions about this research can be addressed to Ms. Cindy
Luster, at AFOMS/OAE, 1550 Fifth Street East, Randolph AFB TX 78150. Calls can be made
to Ms. Luster at (210) 652-6811, extension 3044, or DSN: 487-6811, extension 3044.

Attachment 1
SME Questionnaire

AFSC ______________________________NAME ______________________________DATE ______________________________


Read the question, then, in the space next to the question number, mark the number that best corresponds to your opinion.

1. How important is the SKT Extract?

1 2 3 4 5 6 7
Absolutely Neither Important Absolutely
Unimportant Nor Unimportant Important

2. How important is the Task Module document?

1 2 3 4 5 6 7
Absolutely Neither Important Absolutely
Unimportant Nor Unimportant Important

3. How accurate is the SKT Extract?

1 2 3 4 5 6 7
Absolutely Neither accurate Absolutely
Inaccurate Nor Inaccurate Accurate

4. How accurate is the Task Module document?

1 2 3 4 5 6 7
Absolutely Neither accurate Absolutely
Inaccurate Nor Inaccurate Accurate

5. How easy is it to understand the SKT Extract?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

6. How easy is it to understand the Task Module document?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

7. How easy will it be to use the SKT Extract to link test items to tasks?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

8. How easy will it be to use the Task Module document to link test items to tasks?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

9. How often will you use the SKT Extract to link test items to tasks?

1 2 3 4 5 6 7
Not at all Half of the time All of the time

10. How often will you use the Task Module document to link test items to tasks?

1 2 3 4 5 6 7
Not at all Half of the time All of the time

Comments:

Attachment 2

TP Questionnaire

AFSC ___________________NAME ______________________________DATE ____________________ PROJECT DATE____________________

Read the question, then, in the space next to the question number, mark the number that best corresponds to your opinion.

1. How important is the SKT Extract?

1 2 3 4 5 6 7
Absolutely Neither Important Absolutely
Unimportant Nor Unimportant Important

2. How important is the Task Module document?

1 2 3 4 5 6 7
Absolutely Neither Important Absolutely
Unimportant Nor Unimportant Important

3. How accurate is the SKT Extract?

1 2 3 4 5 6 7
Absolutely Neither accurate Absolutely
Inaccurate Nor Inaccurate Accurate

4. How accurate is the Task Module document?

1 2 3 4 5 6 7
Absolutely Neither accurate Absolutely
Inaccurate Nor Inaccurate Accurate

5. How easy is it to understand the SKT Extract?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

6. How easy is it to understand the Task Module document?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

7. How easy will it be to use the SKT Extract to link test items to tasks?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

8. How easy will it be to use the Task Module document to link test items to tasks?

1 2 3 4 5 6 7
Extremely Neither Easy Extremely
Difficult Nor Difficult Easy

9. How often will you use the SKT Extract to link test items to tasks?

1 2 3 4 5 6 7
Not at all Half of the time All of the time

10. How often will you use the Task Module document to link test items to tasks?

1 2 3 4 5 6 7
Not at all Half of the time All of the time

Comments:

Attachment 3

Cover Letter

Dear SME or TP:

We are asking for your help in providing a better product for use in matching Occupational Survey Report (OSR) tasks
to Specialty Knowledge Test (SKT) items.

Your assistance in this effort is completely voluntary. In addition, your privacy will be protected and no personal
information will be made public. Please read and sign your name in the appropriate place below.

I agree to participate in this research. ____________________________


(Signature)

____________________________
(Printed Name)

Please read the following paragraphs, review the two OSR task documents and then complete the questionnaire about
these documents. Results from this questionnaire may affect how OSR tasks are presented to SKT teams in the future
so please read and respond carefully and thoughtfully.

Document 1 is the traditional SKT Extract that has been used to validate test items at AFOMS for some time. The
SKT Extract contains the top 150 to 200 tasks identified in the OSR and presents these tasks in several different ways.
It is sorted first by rank (i.e., E-5 and E-6/7), then by duty area (i.e., A, B, and C), then by percent members performing
(PMP). The SKT Extract also lists a derived testing importance (TI) value, either field-validated (FVTI) or predicted
(PTI), for each task. Finally, the SKT Extract provides a task listing in task-number order that shows the PMP and TI
values across ranks.

Document 2 is a task module listing that is currently used by AFOMS occupational analysts to help determine career
field training needs and requirements. This document lists all tasks identified in the OSR, however it is sorted by task
module (tasks that are co-performed). For example, in the career field of cooking, one task module might be cake
baking. Tasks within the cake baking task module include adding ingredients, mixing ingredients, greasing and
flouring the cake pan, turning on the oven, pouring mix into the cake pan, putting the cake into the oven, setting the
timer, and so on.

Please read each question, then circle the number that best corresponds with your IMMEDIATE reaction. If you have
any questions, please have your TP contact me.

Thank you,

Shirley Snooks
Personnel Psychologist
AFOMS/TEEQ

Cindy Luster
Occupational Analyst
AFOMS/OAV


Attachment 4

Respondent Comments
“If you go to the Task Module, the format should incorporate the tasks from the STS as headers to align
information under. Also, the task number from the STS should be reflected to make it easier to associate the
STS with the Task Module task number; all of which has to be transcribed over to the item record card.”

“Recommend linking items listed in the SKT Extract and Task Module to the STS. I.e. List each process under
the appropriate STS heading (and in the same order).”

“The tasks should actually be aligned with the STS. If they were initially aligned this way, then it would make
the referencing much easier.”

“The SKT extract and Task Module would both be valuable documents if they were re-arranged to flow logically
with the STS.”

“Task Module was nearly useless because of the difficulty of linking test areas to the tasks offered. The intent is
very good. Solution: Plan the survey with actual CDC areas in mind, by paragraph number. Will reduce linking
time by probably 75%.”

“The Task Module in our career field needs updating. Some things need to be added as there was no appropriate
task.”

“I would like to see SKT Extract as computer based to save research time. Using root words as a search engine
on the computer would accurately match items for the SME.”

“In the Task Module, we do not know the task category since it has only numeric headings. If all processes can
be identified (as these headings) then the task module will be easier to use. Also, ordering should not be by verb,
rather it should list nouns e.g., “The Shipment Planning Process”. How about that?”

“The SKT is very useful in determining what personnel in career field are doing. However some items listed on
survey were difficult to relate to test items: wording of some tasks were very broad. Perhaps a team of SMEs
could refine wording of survey questions/items. SMEs do their best to inform personnel of importance of
surveys, but we need to enlist help of unit training managers to stress importance. Some unit training managers
do not do a very good job of spreading word.”

“Logical grouping of tasks in task module does not fit “tree” structure of groupings I am comfortable and
familiar with. Tree structure identifies major broad task areas as major branches with individual smaller groups
and ultimately individual tasks stemming from the major broad task areas. This task module data might be more
usable if the modules were sorted in recognizable groups similar to the way they are organized in alignment with
AFOSH and Mishap prevention programs.”

“I believe the current application of linking the SKT and career field task is the best method in developing test
questions. However, the Extract could be more useful if the information received was based on the current
operation task and time period. For example, when a major rewrite is scheduled, the Measurement Squadron
must ensure the information is both valid and up-to-date.”

“The sample Task Module seemed easier to read. However, there were some clusters that were not accurate.
May need an SME to help insure accuracy of clusters. With both, more in depth descriptions of the entries and
what they are used for is needed.”

“I like the Task Module Data presentation/outline. I think by grouping the tasks together it gives the SME a
better look at which task may be the best “true” fit. In both documents there were a few items that we had to
make a best fit because there was no exact match. Overall, both documents are very helpful.”

“The PMP and FVTI data is important to determine if questions need to come from certain task numbers. If the
Task Module document and/or the SKT Extract were linked more closely to the STS, linking test items to tasks
would be easier. Overall, I like the SKT Extract better than the Task Module document.”

“The SKT Extract should match up better with the CFETP tasks. Most work centers don’t have access to the
SKT Extract information because they don’t know it exists. Information is derived from surveys distributed to
the field, but technicians don’t understand the importance of the survey, or how it affects the entire career field.”

“I understand the goal is to present SKT Extract information in the best possible format for SME use. First, the
SKT Extract and Task Module should be linked to the STS by STS task #. At this time the STS and SKT
Extracts are independent documents. Of these two documents, the STS Extract is the best. Why not keep the
STS Extract format and sort tasks as modules?”

“Not a great tool, but better than the SKT Extract. Ideally an electronic version with keyword sequence
capabilities is needed. The SKT Extract and/or the Task Module would be easier to use if it correlated to the
STS.”

“The method used to develop the Occupational survey should have outlined the current CFETP and the current
CDCs. With minimal correlation it is difficult to use any document to assist and align test questions. This faulty
comparison makes it simply another task for SMEs with minimal value added to the process.”

“Task module document assumes ‘that if incumbents perform Task A and Task B, there is high likelihood the
two tasks share common skills and knowledge and thus can be trained together’, which is not correct. Aligning
the task groupings to the most current STS seems to be a more logical method of grouping tasks.”

“Areas of general information headings (i.e., Radio Production Skills, Writing Broadcast Products, etc.) not
evident on Task Module. It would take additional time to look for information w/o headings.”

“I do not feel the task list is 100% comprehensive, however I do understand this is a product of a sample of
workers. I hope our 3-level tech school uses the task analysis to build their training program.”

“SKT Extract should eliminate some items that aren’t necessary, for example: - PFE/Supervisor type items
shouldn’t be placed in SKT Extract (even though it does need to be asked during surveys). – Any remove/replace
tasks should be removed from the extract for SKT development – no SKT test questions will cover simple
remove/replace tasks.”

“Task Module usefulness is unknown without actual data, but compared to SKT Extract, it seems much easier to
use the logical sequence of the TM EXTRACT.”

“The module data for tasks seems to be easier to follow. With a little bit of training on it, it should be no
problem at all.”

References

Institute for Job and Occupational Analysis (1994). Training Impact Decision &
Evaluation System for Air Force Career Fields. TIDES Operational Guide.

Mitchell, J.L., Buckenmeyer, D.V., and Hand, D.K. (1989). The Use of CODAP in
the Training Decision System.

Phalen, W.J., Staley, M.R., and Mitchell, J.L. (1987). New ASCII CODAP Programs
and Products for Interpreting Hierarchical and Nonhierarchical Clusters. Proceedings of the 6th
International Occupational Analysts’ Workshop, San Antonio, TX: USAF Occupational
Measurement Squadron.

Pohler, W.J. (1999). Test Content Validation – New Data or Available Data.
Proceedings from the 11th International Occupational Analysts Workshop, San Antonio, TX:
USAF Occupational Measurement Squadron.

Ruck, H.W. (1982). Research and Development of a Training Decisions System.
Proceedings of the Society of Applied Learning Technology. Orlando, FL.

Snooks, S.F. and Luster, C. (2003). Occupational Analysis: Grouping Performance
Tasks Into Task Modules for Use in Test Development. Proceedings from the 13th
International Occupational Analysts Workshop, San Antonio, TX: USAF Occupational
Measurement Squadron.

Thew, M.C. and Weissmuller, J.J. (1978). CODAP: A new modular approach to
occupational analysis. Proceedings of the 20th Annual Conference of the Military Testing
Association, Oklahoma City, OK, 362-372.

Vaughan, D.S., Mitchell, J.L., Yadrick, R.M., Perrin, B.M., Knight, J.R.,
Eschenbrenner, A.J., Rueter, F.H., and Fledsott, S. (1989). Research and Development of the
Training Decisions System (AFHRL-TR-88-50).

THE SCOPE OF PSYCHOLOGICAL TEST SYSTEMS WITHIN THE AUSTRIAN ARMED FORCES
Dr. Christoph Brugger
Austrian Armed Forces Military Psychology Service
Vienna, Austria
hpa.hpd@bmlv.gv.at

ABSTRACT
During his or her career, a soldier in the Austrian Armed Forces has to pass several
psychological tests. Since Austria will also be involved in supporting future WEU missions,
the changed preconditions are reflected in the test systems applied. The integration of existing
and planned selection systems into the typical career, from recruitment to peace support
operations or pilot selection, is shown.
More detail is offered on the computer-based test systems used in the induction centers, as
well as on problems arising from harmonizing selection for peace support operations with
selection of forces earmarked for international operations.

INTRODUCTION
Most papers on testing cover specific methodological issues or aspects of individual tests, but
the integration of test systems within an organization is almost never shown in detail. Even
though many people are interested in such aspects and often ask how such systems are
implemented in individual countries, very little information exists in this field. Based on these
observations, the present state of affairs within the Austrian Armed Forces, as well as some
related problems, is presented here.

FACTS ABOUT THE AUSTRIAN ARMED FORCES


For a better understanding of the following, some facts about Austria and the Austrian Armed
Forces are presented first:
• Austria is a small country. To reach the necessary strength, Austria’s defense is based on a
conscript and militia/reserve system.
• Military service in Austria is compulsory. Every young man at the age of about 18 has to
visit one of the six induction centers, where, after a detailed examination, a commission
decides whether he is fit for military service or not.
• Persons not fit for the army drop out of this system. All others have the option to
choose alternative service with a civilian non-profit organization. Alternative service lasts
significantly longer than military service to limit excessive losses of recruited personnel.
• Military service lasts either 8 months, or 7 months plus 3 times 10 days of reserve recalls,
usually every two years.
• Austria has a long tradition of participation in Peace Support Operations of the United
Nations. PSO personnel consist of volunteers only, professional soldiers as well as
members of the militia.

• Austria also has committed itself to supporting future missions of the European Union.
Therefore specially prepared and trained units – again based on volunteers – have to be
raised.

PSYCHOLOGICAL TEST SYSTEMS


All these aspects have to be considered when deciding where and how to implement
psychological test systems for the Armed Forces.
According to our current specifications, testing and selection are applied to
• recruits at the induction centers
• soldiers applying to become cadre/professional soldiers
• soldiers applying for service with International Operations Forces
• professional soldiers as well as members of the militia, to select PSO personnel
• soldiers applying or intended for special functions, such as air defense personnel or pilots

TESTING AT THE INDUCTION CENTERS


A young man officially gets in contact with the Austrian Armed Forces for the first time when
he takes part in the examinations at an induction center. There are six induction centers
in Austria, examining about 60,000 persons a year.
The examination lasts one and a half days, with an information section plus medical and
psychological tests on the first day, and a psychological interview as well as a medical
examination on the second day. Based on the data collected, a commission consisting of an
officer, a physician (usually a medical officer), and a psychologist decides whether the
draftee is fit for military service or not.
Given the number of people to be tested and the limited time available, the test system
implemented at the induction centers is computer based, including some adaptive tests as well
as self-adapting test batteries. Adaptive testing is optimized to yield the highest precision in the
lower ability range, since persons with low abilities belong to the group possibly unfit for military
service.
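The logic of tuning an adaptive test toward the lower ability range can be illustrated with a standard item-response-theory sketch: pick the next item whose Fisher information is highest at the current ability estimate, from a pool rich in easy items. This shows the general principle only; it is not the Austrian system's actual algorithm.

```python
# A sketch of the idea behind adaptive testing tuned to the lower ability
# range: under a two-parameter logistic (2PL) IRT model, pick the item whose
# Fisher information is highest at the current ability estimate, so a pool
# rich in easy items gives the most precision for low-ability examinees.
# This illustrates the general principle, not the Austrian system's algorithm.
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Item pool: (discrimination a, difficulty b); mostly easy items (negative b)
# so measurement is most precise for below-average ability.
pool = [(1.2, -2.0), (1.0, -1.5), (1.5, -0.5), (0.9, 0.0), (1.1, 1.0)]

theta_hat = -1.0  # current (low) ability estimate
best = max(pool, key=lambda item: information(theta_hat, *item))
print(f"next item: a={best[0]}, b={best[1]}")
```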
To provide information for improved placement, the battery registers not only verbal abilities,
reasoning, spatial performance, and perceptual speed, but also psychomotor skills, using
tracking tasks and anticipation-of-movement tasks. Conditions imposing additional stress in
the second half of the test battery provide information on performance under load.
An additional life-event inventory and a recruitment-specific personality questionnaire are
also part of the test battery.

To support the psychological interview, which is an important part of the selection procedure,
printouts with the detailed test results of each candidate are produced; the test results are also
available online. If any result shows even a slight indication of a problem, the candidate is
interviewed by one of the two psychologists stationed at each induction center. This happens
in about half of the cases; the other half are interviewed by specially trained non-academic
personnel.

ERGOPSYCHOMETRY
All of the following test systems have one thing in common: they also make use, in some way,
of the concept of ergopsychometry, developed by Guttmann and others at the University of
Vienna: testing under neutral conditions followed by testing under appropriate load conditions.
It has been shown in various fields that the measured change in performance allows a
surprisingly reliable prognosis of performance under real-life load conditions. As stress
resistance is obviously an important aspect of a soldier’s daily life, it is also one of the main
dimensions our tests are supposed to reveal, and therefore ergopsychometry is the method of
choice.
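A minimal sketch of the ergopsychometric logic follows, assuming a simple load-to-baseline ratio as the change measure; the actual scoring rule and any cut-off used by the Austrian Armed Forces are not given in this paper, so the values below are purely illustrative.

```python
# A minimal sketch of the ergopsychometric logic: the same performance test is
# given under neutral and under load conditions, and the relative change is
# the diagnostic quantity. The scoring rule and the 0.85 cut-off here are
# assumptions for illustration only, not the actual Austrian scoring.

def load_stability(neutral_score, load_score):
    """Performance under load relative to the neutral baseline (1.0 = stable)."""
    return load_score / neutral_score

candidates = {"cand_01": (52, 50), "cand_02": (48, 31), "cand_03": (60, 58)}

for name, (neutral, load) in candidates.items():
    ratio = load_stability(neutral, load)
    flag = "stable under load" if ratio >= 0.85 else "marked decline under load"
    print(f"{name}: ratio={ratio:.2f} -> {flag}")
```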

SELECTION OF FUTURE CADRE


The Cadre Aptitude Test is the next hurdle for young soldiers applying to become career
officers or NCOs. In general, this test comes up about one to one and a half years after the
tests at the induction centers, that is, after about half a year of military service. For female
applicants, the Cadre Aptitude Test immediately follows the basic entrance exam at an
induction center.
This test should not only provide information on the cognitive abilities necessary to complete
courses successfully and on schedule, but should also cover the basic aptitude for serving
abroad as well as at home. While participation in international operations is currently still
based on volunteers, both cadre and militia, the next generation of cadre soldiers will
probably have to commit themselves to service abroad. Based on long-term experience in
PSO selection and other selection tasks, and supported by very proficient officers and NCOs,
both of the Austrian International Peace Support Command and of homeland troops, it was
decided to use a mix of psychometric tests and assessment elements in the Cadre Aptitude
Test to cover all relevant psychological aspects. These also include social competence
(registered by assessing, for example, communication skills and conflict behavior), planning
and organizational skills, motivational parameters, etc. Soldiers are tested on their ability to
perform their duties even under stress, without becoming a potential danger to themselves or
others.
The Cadre Aptitude Test starts at 0930 and continues, after a lunch break, until 1700. Tests of
physical condition follow, including a swimming exam and a 16-km hike at night, followed in
turn by another block of assessment elements and tests under load, for a total of almost 24
hours without sleep. The procedure is completed by a final personal interview with a
psychologist.

INTERNATIONAL OPERATIONS FORCES ENTRANCE EXAM


As Austria has also committed itself to supporting future missions of the European Union,
specially prepared and trained units – still based on volunteers – have to be raised. Members
of the existing cadre who are not yet part of the International Operations Forces also have to
pass an examination to become part of such a unit. This exam is shorter than the one for
young soldiers, as more information is already available on existing armed forces personnel.
Some testing is nevertheless necessary to check the current psychological state.
To reveal possible impairment, this test system again combines the ergopsychometric
approach based on tests (for example, of memory and concentration) with assessment
elements, questionnaires, and a final personal interview by a psychologist.

SELECTION PROCEDURE FOR PEACE SUPPORT OPERATIONS


Except for soldiers who have successfully passed a test covering the aptitude for serving
abroad within the last three years, all volunteers for PSO, professional soldiers as well as
members of the militia, have to pass a test of medical, physical, and psychological fitness. Our
concept of combining standard procedures with the so-called “shelter test”, which lasts all
night and involves group-dynamic tasks under stressful conditions, followed by a final
personal interview by a psychologist, was already presented by Slop at the 42nd IMTA
conference in 2000.
All of the last three test systems mentioned also cover the aptitude for serving abroad, but not
all are administered under the same command. They were developed for subjects with
differing backgrounds, and the time frames within which test results had to be refreshed used
to be shorter for PSO personnel. Some difficulties were therefore encountered in trying to
harmonize the existing systems, but it was necessary to reach some kind of mutual agreement.
Now the results of all three of these test systems concerning aptitude for serving abroad are
valid for three years. Members of units earmarked for international operations have to
participate in a short check of their current physical and psychological state once a year, as
they might be deployed at any time. If someone wants to join a unit earmarked for
international operations and the tests were taken more than one year ago, a short check is also
necessary.
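These validity rules can be restated as decision logic. The sketch below paraphrases the three-year validity window and the one-year short-check rule for earmarked units; the function and field names are invented for illustration.

```python
# A sketch restating the validity rules above as decision logic: aptitude
# results for serving abroad stay valid for three years, and anyone joining
# or serving in a unit earmarked for international operations needs a short
# check if the last test is more than one year old. Names are invented.
from datetime import date

def years_since(test_date, today):
    return (today - test_date).days / 365.25

def screening_status(last_test, earmarked_unit, today=None):
    today = today or date.today()
    age = years_since(last_test, today)
    if age > 3:
        return "full aptitude test required"
    if earmarked_unit and age > 1:
        return "short check required"
    return "results still valid"

print(screening_status(date(2001, 5, 1), earmarked_unit=False,
                       today=date(2003, 11, 3)))  # results still valid
print(screening_status(date(2002, 3, 1), earmarked_unit=True,
                       today=date(2003, 11, 3)))  # short check required
```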

TESTS FOR SPECIAL FUNCTIONS


There are also specific test systems for pilot candidates and applicants for special units where
particular abilities are necessary. While all of the test systems described above are designed to
select out persons with significant shortcomings, the tests for specific functions are designed
to select the cream of the crop. The number of subjects tested per year for such functions
ranges from about 400 candidates in the pilot pre-selection down to 50 for some very specific
jobs. In most cases, physical or medical checks have to be passed before entering the
psychological exam.

CONCLUSION
During his or her career, a soldier in the Austrian Armed Forces has to pass several
psychological tests. Each of the test systems applied was designed taking into account the
specific requirements of the function as well as the number and background of the subjects to
be tested.
One might question whether the effort taken to obtain reliable and valid data, and the strain
and stress imposed on the applicants, is still reasonable. Acceptance is quite good: polls taken
at several points, even among higher-ranking officers, yielded between 80% and 100%
positive judgments. And especially within a military organization with a small budget,
high-quality personnel selection can help to bridge some of the existing gaps.

JOB-SPECIFIC PERSONALITY ATTRIBUTES AS PREDICTORS OF PSYCHOLOGICAL WELL-BEING
Dr. H. Canan Sumer, Dr. Reyhan Bilgic, Dr. Nebi Sumer, and Tugba Erol, M.S.
Middle East Technical University
Department of Psychology
06531 Ankara, Turkey
hcanan@metu.edu.tr

ABSTRACT
The purpose of this study was to examine the nature of the relationships between job-specific
personality dimensions and psychological well-being for noncommissioned officers (NCOs)
in the Turkish Armed Forces (TAF). A job-specific personality inventory, comprising
measures of 11 personality dimensions (i.e., military bearing, determination, dependability,
orderliness, communication, self-discipline, self-confidence, agreeableness, directing and
monitoring, adaptability, and emotional stability) was developed for selection purposes. The
inventory was administered to a representative sample of 1428 NCOs along with a general
mental health inventory developed by the authors, which consisted of six dimensions of
psychological well-being: depression, phobic tendencies, hostility, psychotic tendencies,
psychosomatic complaints, and anxiety. Exploratory and confirmatory factor analyses
suggested existence of a single factor underlying the six psychological well-being dimensions,
named Mental Health, and two latent factors underlying the 11 personality dimensions, named
Military Demeanor and Military Efficacy. The Mental Health factor was regressed on the two
personality constructs using LISREL 8.30 (Jöreskog & Sörbom, 1996). The two personality
constructs explained 91 percent of the variance in the Mental Health construct. A stepwise
regression indicated that beta weights of the personality measures were significant except for
military bearing, orderliness, and dependability. Results suggested that job-specific
personality attributes were predictive of mental health. Implications of the findings for the
selection of NCOs are discussed.

Paper presented at the International Military Testing Association 2003 Conference, Pensacola,
Florida. Address correspondence to H. Canan Sumer, Middle East Technical University,
Department of Psychology, 06531 Ankara, Turkey. Send electronic mail correspondence to
hcanan@metu.edu.tr. This study is a part of a project sponsored by the Turkish Armed
Forces.

INTRODUCTION
The past two decades have witnessed the resurgence of individual differences variables,
especially personality attributes, in a variety of human resources management applications.
An important credit in this movement rightfully goes to the five-factor model of personality
(i.e., Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and
Neuroticism) (Costa & McCrae, 1985; Goldberg, 1990), which stimulated a large quantity of
both empirical and theoretical work on the relationships between personality variables and a
number of outcome variables, performance being the most widely studied one. Recent
literature suggests that personality predicts job performance, and that validities of certain
personality constructs, such as conscientiousness or integrity, generalize across situations
(e.g., Barrick & Mount, 1991; Borman, Hanson, & Hedge, 1997; Hogan, Hogan, & Roberts,
1996; Hough, Eaton, Dunnette, Kamp, & McCloy, 1990; Ones, Viswesvaran, & Schmidt,
1993; Salgado, 1997). In a recent meta-analysis of 15 meta-analytic studies, Barrick, Mount,
and Judge (2001) found that among the Big Five traits, Conscientiousness and Emotional
Stability were valid predictors of performance in all occupations, whereas the other three
traits predicted success in specific occupations. Empirical evidence also suggests that
different facets of performance have different predictors, and attributes that lead incumbents
to do well in task performance are different from those that lead incumbents to do well in
contextual aspects of performance (e.g., McCloy, Campbell, & Cudeck, 1994; Motowidlo &
Van Scotter, 1994; Van Scotter & Motowidlo, 1996). Motowidlo and Van Scotter reported
that although both task performance and contextual performance contributed independently to
overall job performance, personality variables were more likely to predict contextual
performance than task performance.
The relationships of personality variables with outcome variables other than job
performance have also received research attention. For example, Schneider’s Attraction-
selection-attrition (ASA) model lends support for the criticality of personality in both
attraction of potential candidates to the organization and turnover process (Schneider, 1987;
Schneider, Goldstein, & Smith, 1995). The model states that individuals are attracted to,
selected by, and stay with organizations that suit their personality characteristics. The major
assumption of the ASA model is that both the attraction and retention processes are based on
some kind of person-environment (i.e., organization) fit.
Personality and Psychological Health in the Military Context
Personality variables have also been considered in the selection of military personnel.
Specific personality characteristics or personality profiles have been shown to be associated
with desired/undesired outcomes in the military settings (e.g., Bartram, 1995; Sandal,
Endresen, Vaernes, & Ursin, 1999; Stevens, Hemstreet, & Gardner, 1989; Stricker & Rock,
1998; Sumer, Sumer, & Cifci, 2000; Thomas, Dickson, & Bliese, 2001). Furthermore, in
addition to job-related personality variables, psychological well-being or mental health has
been among the individual differences factors considered in the selection/screening of military
personnel (e.g., Holden & Scholtz, 2002; Magruder, 2000). As stated by Krueger (2001),
compared to most civilian jobs, military jobs involve much more demanding physical and
psychological conditions, such as fear, sensory overload, sensory deprivation, exposure to
extreme geographies and climatic temperatures, and the like. These conditions call for
individuals with not only physical but also psychological stamina. According to Cigrang,
Todd, and Carbone (2000), mental-health-related problems play a critical role in a significant
portion of the turnover/discharge within the first six months of enlistment in the U.S. Armed
Forces. Holden and Scholtz (2002) used Holden Psychological Screening Inventory (HPSI)
for predicting basic military training outcome for a sample of noncommissioned recruits in the
Canadian Forces. Results indicated that the Depression scale of the inventory was predictive
of training attrition, yielding support for the use of the inventory as a screening tool.
The Relationship Between Personality and Mental Health
The line between job-related personality attributes and psychological well-being is not
always clear. According to Rusell and Marrero (2000), “[personality] styles mirror the traits
that, in extreme forms, are labeled disorder.” These authors almost equate personality style
with psychological/mental health or well-being. However, we believe that although these two
constructs are closely associated, at the conceptual level a distinction needs to be made
between personality style and overall psychological well-being or mental health of a person.
Personality is simply what makes people act, feel, and think different from one another.
Psychological health, on the other hand, refers to the extent to which an individual is
functioning, feeling, and thinking within the “expected” ranges. Accordingly, while most
measures of mental health are aimed at discriminating between clinical and nonclinical
samples (or between the so called normal and abnormal), personality measures, which are
mostly nonclinical in nature, are descriptive of an individual’s patterns of functioning in a
particular domain of life (e.g., work, nonwork).
There exists empirical evidence concerning personality variables as predictors of
mental health (e.g., Ball, Tennen, Poling, Kranzler, & Rounsaville, 1997; DeNeve & Cooper,
1998; Siegler & Brummett, 2000; Trull & Sher, 1994). Ball et al. reported that “normal”
personality dimensions, such as Agreeableness, Conscientiousness, and Extraversion, contributed
significantly to the prediction of psychopathology. For example, they found that
Agreeableness and Conscientiousness contributed significantly to the prediction of antisocial
and borderline personality disorders. Furthermore, as predicted, individuals with schizoid and
avoidant personality disorders were lower in Extraversion. Hence, based on the available
empirical and theoretical evidence we state that personality and psychological health,
although related, are conceptually distinct, and that one should be able to predict one from the
other. Moreover, we believe that prediction of psychological well-being from job-related
personality characteristics has important practical implications, reduced cost of selection
being an important one, for the selection of military personnel.
The majority of the studies examining the relationship between personality and mental
health have looked at the relationship between Axis I or Axis II disorders of the American
Psychiatric Association’s (1994) Diagnostic and Statistical Manual of Mental Disorders (4th
edition) and the five-factor dimensions. It is believed that the power of personality attributes
in predicting psychological health could further be improved when job/context specific
attributes are employed.
Thus, the purpose of the present study was to examine the nature of the relationships
between personality variables and mental health within work context. More specifically, the
study was carried out to examine the predictive ability of job-specific personality variables
concerning psychological well-being for NCOs. Consequently, two inventories, an 11-
dimension measure of NCO personality and a 6-dimension measure of psychological well-
being were developed, and the hypothesized relationship was tested on a relatively large
sample of NCOs in the TAF. It is important to note that the term personality is not used
rigidly in this study; some skill-based individual differences variables were also included
under the same term.

METHOD
Participants
In the current study, 1500 NCOs received a questionnaire package comprising the
scales used in the study, through the internal mailing system of the TAF. Of the 1500 NCOs
receiving the package, 1428 (95.2 percent) returned it with the scales completed. Respondents
were employed in the Army (483), Navy (298), Air Force (345), and Gendarmerie (302), with
a mean age of 33.11 years and a standard deviation of 6.85 years (range = 38). The average
tenure of the participants was 13.17 years, with a standard deviation of 6.86 years
(range = 34.25). All but two participants were men.
Measures
Noncommissioned Officer Personality Inventory (NCOPI)
A personality inventory, developed by the researchers, was used to evaluate job-
related personality attributes of the NCOs of the TAF. The NCOPI comprises 11 job-related
personality dimensions (i.e., military bearing, determination, dependability, orderliness,
communication, self-discipline, self-confidence, agreeableness, directing and monitoring,
adaptability, and emotional stability – see Table 1 for dimension descriptions). The
development process of the NCOPI is summarized below. The present study involves the
analyses of the data collected at Step III (i.e., Norm Study) of the NCOPI development.
Step I - Identification and Verification of Critical NCO Attributes. Through 15 focused
group discussions with both noncommissioned and commissioned (COs) officers (N = 152)
and administration of a Critical NCO Behavior Incidents Questionnaire (N = 214), 92 NCO
attributes critical for performance were identified. The identified attributes were then rated by
both NCOs (N = 978) and COs (N = 656) in terms of the extent to which they discriminated
between successful and unsuccessful NCOs, on a 6-point Likert-type scale (1 = Does not
discriminate at all; 6 = Discriminates very much). The analyses yielded 56 attributes relevant
to the job of NCO in the TAF. These 56 attributes were subjected to further screening by
subject-matter experts (i.e., personnel officers from all four forces). The SMEs were asked to
evaluate each attribute in terms of its importance and the cost of not assessing it in the
selection of NCOs, on two 5-point Likert-type scales. The 46 attributes surviving the
examination of the SMEs were then examined by the researchers, and these 46 specific
attributes laid down the framework for item development.
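The screening step can be illustrated with a short sketch: each candidate attribute carries discrimination ratings on the 6-point scale, and attributes whose mean rating clears a cut-off are retained. The cut-off of 4.0 and the ratings below are assumptions for illustration; the paper does not report the actual retention rule.

```python
# A sketch of the attribute-screening step: each candidate attribute carries
# discrimination ratings (1-6) from NCO and CO raters, and attributes whose
# mean rating clears a cut-off are retained. The 4.0 cut-off and the data are
# assumptions for illustration; the actual rule is not reported in the paper.
from statistics import mean

ratings = {
    "perseverance": [5, 6, 5, 4, 6],
    "orderliness":  [4, 5, 4, 4, 5],
    "humor":        [2, 3, 3, 2, 4],
}

CUTOFF = 4.0
retained = {attr: mean(r) for attr, r in ratings.items() if mean(r) >= CUTOFF}
print(retained)  # attributes judged to discriminate successful from unsuccessful NCOs
```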
Step II - Item Development and Pilot Study. An initial item pool tapping the dimensions
identified at the previous stage was formed. First, the relevant literatures and the International
Personality Item Pool (IPIP, 1999) were examined. Then, item development was carried out
using an iterative approach. That is, the items developed for a given attribute by individual
members of the research team were brought together and examined in group meetings. In
these meetings, items were kept, revised, or eliminated from the item pool, and the remaining
items were then reexamined. The resulting item pool (227 items) was then content analyzed,
and the items were grouped under 28 broader personality variables.
The initial version of the NCOPI was administered to a sample of 483 NCOs
representing the different forces. The respondents were asked to indicate the extent to which
each item/statement was true of themselves (1 = Completely false; 4 = Completely true). Item
analyses (reliability analyses and factor analyses) resulted in major revisions of the initial
version of the NCOPI in terms of both item and dimension numbers. A few new items were
also developed to keep the number of items across dimensions within an established range.
The resulting version had 166 items under 17 personality dimensions.


Table 1. The NCOPI Dimensions and General Descriptors


NCO Personality Dimensions Descriptors

1. Military Bearing • complying with rules and regulations
• respecting superiors and the chain of command
• showing pride in being a member of the military
2. Determination • wanting to be successful
• liking challenge
• having persistence
3. Dependability • being reliable
• being honest
• being reserved and keeping secrets
4. Orderliness • being organized, clean, and tidy
5. Communication • expressing oneself clearly
• being an active listener
6. Self-Discipline • sense of responsibility
• being hardworking
• planning and execution
7. Self-Confidence • self-reliance
• believing in oneself
8. Agreeableness • being able to establish and keep relationships
• working with others in harmony
9. Directing and Monitoring • being able to direct, coordinate, and guide
subordinates
10. Adaptability • being able to adapt to changing work conditions
• stress tolerance
11. Emotional Stability • keeping calm
• not experiencing sudden fluctuations in mood states
Step III - Norm Study. The revised version of the NCOPI was administered to a
sample of 1500 NCOs, of whom 1428 returned the inventory. The purpose of this
administration was twofold: first, to finalize the inventory, and second, to establish norms on
the final version of the NCOPI for the population of interest. In analyzing the norm data,
reliability analyses were followed by a series of exploratory and confirmatory factor analyses.
These analyses resulted in the final version of the NCOPI, which comprised 103 items under
11 job-related personality dimensions. The internal consistency reliabilities for the 11 NCOPI
dimensions are presented in Table 2. Norms were established for the final version.

General Mental Health Inventory (GMHI)
A mental health inventory, developed by the researchers as a screening tool to be used
by the TAF, was used to evaluate the overall psychological well-being of the respondents.
This inventory was developed in response to a need expressed by the management of the
organization. The GMHI was developed in two steps, alongside the NCOPI.
Step I - Item Development and Pilot Study. First, based on the data obtained from the
focus group discussions, the Critical NCO Behavior Incidents Questionnaire described
above, and the meetings with SMEs (i.e., personnel officers), six dimensions of psychological
well-being were identified: psychotic tendencies, phobic tendencies, psychosomatic
complaints, hostility, depression, and anxiety/obsessive-compulsive behaviors. An initial item
pool was formed, composed of items related to the identified dimensions. In the development
of the items, relevant literatures and available screening tools were examined. Similar to the
item development of the NCOPI, items of the GMHI were developed using an iterative
approach. The initial version of the GMHI to be piloted consisted of 61 items under the
identified categories.
The GMHI was piloted at the same time as the NCOPI, on the same sample (N =
438), using the same scale format. Item analyses resulted in a major revision of the
anxiety/obsessive-compulsive tendencies dimension. The items aiming to measure obsessive-
compulsive tendencies had lower item-total correlations and consistently lowered the
internal consistency reliability of the dimension. A decision was made to eliminate these
items and to focus more on generalized anxiety; hence, new items tapping generalized
anxiety were developed. The resulting version had 64 items grouped under six psychological
well-being dimensions.
Step II - Norm Study. The revised version of the GMHI was administered (along with
the NCOPI) to a sample of 1500 NCOs, of whom 1428 returned the inventory. Again, the
purpose of this administration was twofold: first, to finalize the GMHI, and second, to
establish norms on the final version of the inventory for the NCOs in the TAF.
In analyzing the GMHI norm data, reliability analyses were followed by a series of
exploratory and confirmatory factor analyses. These analyses resulted in the final version of
the GMHI, which is composed of 55 items under the six mental health dimensions. The
internal consistency reliabilities for these dimensions are presented in Table 2. Norms were
established for the final version.1

Demographic Information Questionnaire


The questionnaire package sent to the participants included a demographic information
questionnaire. This questionnaire consisted of questions on gender, age, tenure, specialty
area, force, rank, posting, and the like.
Procedure
The questionnaire package was sent to an approximately representative sample of the
NCOs employed in the Army, Navy, Air Force, and Gendarmerie of the TAF through the
internal mail system, with a cover letter from the Chief of Command. The respondents were
asked to fill out the forms and return them via the same system. The relatively high response
rate obtained in the current study was attributed in part to this cover letter, which encouraged
participants to respond.
RESULTS
Correlations and reliabilities for the study variables are presented in Table 2. As can be
seen in the table, all correlations between the study variables were significant, ranging
from -.15 to .81. Reliabilities of both the job-specific personality dimensions and the
psychological health dimensions were above .70.
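The reliabilities reported on the diagonal of Table 2 are internal consistency (Cronbach's alpha) coefficients. As a minimal Python sketch, assuming a simple respondents-by-items score matrix (the data and names below are illustrative, not the study's data):

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
        k = items.shape[1]                          # number of items in the scale
        item_vars = items.var(axis=0, ddof=1)       # per-item variances
        total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

    # Illustrative 4-item scale scored 1-4 by 100 respondents
    rng = np.random.default_rng(0)
    base = rng.integers(1, 5, size=(100, 1))
    responses = np.clip(base + rng.integers(-1, 2, size=(100, 4)), 1, 4)
    print(round(cronbach_alpha(responses), 2))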

1 For a more detailed account of the development process of both the NCOPI and the GMHI, please contact the corresponding author.


Table 2. Correlations Between Personality and Mental Health Dimensions and Reliabilities
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1. Depression .85
2. Phobic Tendencies .65 .74
3. Hostility .58 .47 .80
4. Psychotic Tendencies .70 .59 .54 .75
5. Psychosomatic Complaints .62 .52 .50 .57 .83
6. Anxiety .81 .68 .63 .73 .70 .88
7. Military Bearing -.52 -.45 -.47 -.44 -.29 -.44 .80
8. Determination -.54 -.46 -.34 -.33 -.33 -.43 .45 .73
9. Dependability -.43 -.36 -.40 -.47 -.22 -.37 .57 .35 .71
10. Orderliness -.29 -.17 -.19 -.27 -.15 -.22 .30 .33 .34 .82
11. Communication -.64 -.59 -.52 -.55 -.48 -.63 .43 .51 .35 .27 .78
12. Self-Discipline -.66 -.52 -.44 -.53 -.44 -.57 .48 .64 .49 .51 .60 .81
13. Agreeableness -.57 -.52 -.61 -.52 -.38 -.54 .53 .39 .40 .22 .55 .47 .70
14. Directing and Monitoring -.40 -.36 -.15 -.24 -.21 -.31 .21 .52 .15 .24 .49 .49 .27 .72
15. Adaptability -.61 -.61 -.59 -.50 -.49 -.64 .59 .46 .37 .19 .58 .48 .58 .31 .75
16. Self-Confidence -.77 -.59 -.47 -.59 -.52 -.69 .43 .55 .39 .28 .61 .65 .47 .45 .53 .81
17. Emotional Stability -.81 -.64 -.68 -.68 -.61 -.81 .54 .49 .45 .27 .64 .61 .64 .35 .72 .67 .83
18. Mental Health .88 .77 .78 .82 .79 .92 -.53 -.49 -.46 -.26 -.69 -.63 -.64 -.33 -.70 -.73 -.86 .95
Note 1. All correlations are significant at p < .01. Note 2. Reliabilities are presented on the diagonal.
Since both personality and mental health items were presented within the same
instrument, using the same format, and were collected from the same source (i.e., NCOs),
there was a possibility that the observed correlations could be an artifact of the common
method employed. Hence, before testing the relationship between the personality and mental
health dimensions, a series of confirmatory factor analyses using LISREL 8.30 (Jöreskog &
Sörbom, 1996) was performed to test the possibility of common method variance. In these
analyses, first a confirmatory model in which all indicators (personality and mental health
dimensions together) clustered under a single latent variable was tested. This model was
then compared against two alternative models. The first alternative, against which the single-
factor model was evaluated, had two latent constructs, one for the job-related personality
variables and the other for the mental health variables (i.e., Personality and Mental Health).
The second alternative model suggested three latent constructs: one for the mental health
variables and two for the personality variables (i.e., Mental Health, Military Demeanor, and
Military Efficacy). The two latent personality constructs in this model were identified through
exploratory processes. The Military Demeanor latent variable included adaptability, emotional
stability, military bearing, dependability, and agreeableness, whereas Military Efficacy
included determination, self-discipline, orderliness, communication, self-confidence, and
directing and monitoring.
An examination of the modification indices suggested that the errors between several
indicator pairs be allowed to correlate. The majority of these pairs were conceptually related.
Hence, a decision was made to free the errors between the following pairs: dependability and
military bearing, dependability and self-discipline, orderliness and self-discipline,
determination and directing and monitoring, adaptability and military bearing, and emotional
stability and military bearing. The single-factor model was then compared against the two
alternative models. Results suggested that the two-factor and three-factor alternatives had a
better fit than the single-factor model, χ²change(1, N = 1428) = 144.65, p < .001, and
χ²change(3, N = 1428) = 675.86, p < .001, respectively, decreasing the possibility of common
method variance.
Furthermore, the results suggested that the alternative model, with two latent constructs for
personality variables (i.e., the three-factor model) had a relatively better fit than the model
with only one latent variable embracing the personality variables (i.e., the two-factor model),
χ²change(2, N = 1428) = 531.21, p < .001. Hence, a decision was made to conceptualize the 11
NCO personality variables measured by the NCOPI as grouped under two latent personality
constructs in the following analyses. The correlations between the Mental Health-Military
Demeanor, Mental Health-Military Efficacy, and Military Demeanor-Military Efficacy latent
construct pairs were -.94, -.88, and .85, respectively. Figure 1 depicts the three-factor
measurement model.
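These nested-model contrasts rest on the chi-square difference test: the drop in model chi-square between two nested models is itself chi-square distributed, with degrees of freedom equal to the difference in model df. A minimal Python sketch of the computation; the absolute values for the single-factor model are hypothetical, back-calculated so the difference reproduces the reported single- versus three-factor contrast (675.86 with df = 3), given the three-factor model fit reported below (χ² = 1774.06, df = 110).

    from scipy.stats import chi2

    def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
        """p-value for a nested-model chi-square difference test."""
        delta_chi2 = chi2_restricted - chi2_full   # fit lost by the simpler model
        delta_df = df_restricted - df_full
        return delta_chi2, delta_df, chi2.sf(delta_chi2, delta_df)

    # Hypothetical single-factor fit (2449.92, df = 113) vs. reported
    # three-factor fit (1774.06, df = 110):
    d, df, p = chi_square_difference(2449.92, 113, 1774.06, 110)
    print(f"chi2_change({df}) = {d:.2f}, p = {p:.2g}")   # 675.86, p < .001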
[Figure 1. Three-factor Measurement Model. Path diagram omitted: the Military Demeanor,
Military Efficacy, and Mental Health latent variables with their respective indicators,
standardized loadings, and the latent correlations reported in the text.]


To examine the relationship between the job-related personality variables and mental
health, a model was tested in which the two personality constructs, each with multiple
indicators, predicted the mental health construct with multiple indicators (see Figure 2),
using LISREL 8.30 (Jöreskog & Sörbom, 1996). The results indicated that the two personality
constructs explained a significant portion of the variance in the mental health factor, R2 = .91,
and the fit of the model was satisfactory (χ2 = 1774.06, df = 110, p < .001, GFI = .87, AGFI =
.82, NFI = .90, NNFI = .89, RMSEA = .10).

[Figure 2. Personality Constructs as Predictive of Mental Health. Path diagram omitted:
Military Demeanor and Military Efficacy predicting Mental Health, with structural paths of
-.70 and -.29, respectively, and a correlation of .85 between the two personality constructs.]


A stepwise regression analysis was performed to examine the contribution of the
individual personality variables to the prediction of the overall mental health score, which was
constructed by averaging the scores on the six mental health dimensions. Results indicated
that, except for Military Bearing, Dependability, and Orderliness, all personality variables
contributed significantly to the variance explained in the mental health factor. The R2 at the
final step was .81. Emotional Stability made the largest contribution to explaining the variance
in mental health. Table 3 displays the β, t, and standard error values resulting from the
stepwise analyses.
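As a sketch of the procedure just described, the following implements a simple forward-selection variant of stepwise regression (a simplification, since full stepwise also re-tests entered predictors for removal; all variable names are hypothetical):

    import numpy as np
    import statsmodels.api as sm

    def forward_stepwise(X, y, names, p_enter=0.05):
        """Greedy forward selection: repeatedly add the candidate predictor
        with the smallest entry p-value until none falls below p_enter."""
        selected, remaining = [], list(range(X.shape[1]))
        while remaining:
            pvals = {}
            for j in remaining:
                fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
                pvals[j] = fit.pvalues[-1]          # p-value of the candidate
            best = min(pvals, key=pvals.get)
            if pvals[best] >= p_enter:
                break
            selected.append(best)
            remaining.remove(best)
        final = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
        return [names[j] for j in selected], final.rsquared

    # Usage (hypothetical arrays): personality_scores is (n x 11), and the
    # criterion is the mean of the six GMHI dimension scores:
    # entered, r2 = forward_stepwise(personality_scores,
    #                                gmhi_scores.mean(axis=1), dimension_names)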


Table 3. Stepwise Regression Results for the Prediction of Mental Health


Predictors β t Standard Error
Emotional Stability -.48 -23.13 .02
Self-Confidence -.23 -12.92 .02
Communication -.15 -8.47 .02
Adaptability -.11 -6.13 .02
Directing and Monitoring -.09 6.21 .01
Agreeableness -.08 -5.13 .02
Self-Discipline -.08 -4.62 .02
Determination -.04 2.49 .02

DISCUSSION
The main purpose of the study was to explore the nature of the relationships between
job-specific personality attributes and psychological well-being for the NCOs in the TAF. A
model in which the two job-specific personality constructs, Military Demeanor and Military
Efficacy, predicted mental health was found to be satisfactory. More specifically, the two
personality constructs explained a substantial amount of variance in mental health for the
NCOs. Furthermore, of the two latent personality constructs, Military Demeanor had the
stronger association with mental health. Analyses of the individual effects of the personality
dimensions suggested that, except for military bearing, dependability, and orderliness, the
NCOPI dimensions contributed significantly to the prediction of the mental health composite.
In short, consistent with the existing literature, the results provided support for the power of
personality attributes (both in the form of latent traits and as individual dimensions) in
predicting mental health (e.g., Ball, Tennen, Poling, Kranzler, & Rounsaville, 1997; DeNeve
& Cooper, 1998; Siegler & Brummett, 2000; Trull & Sher, 1994).
Results of the present study yielded some support for the argument that there is a need
to distinguish between personality variables and mental health variables, and that common
method variance by itself cannot be responsible for the observed effects. Furthermore, results
suggested that the NCOPI, which was developed as a selection tool for the NCOs in the
TAF, could also serve the purpose of screening for mental health. The exceptionally large
variance explained in mental health by both latent and individual personality factors suggests
that the more fitting the personality profile of a candidate, the more likely he/she is to be
mentally fit for the job. As discussed earlier in the paper, the military context calls for
individuals with not only physical but also psychological stamina. This is why mental health has been
among the individual differences factors considered in the selection/screening of military
personnel (e.g., Holden & Scholtz, 2002; Magruder, 2000). Results of the present study imply
that when job-relevant personality attributes are used in the selection process, mental health
assessment may be dispensable, resulting in significant cost savings.
On the other hand, one could still argue that the strong structural correlations between
the latent variables as well as individual dimensions may have resulted from a possible
conceptual overlap among these constructs. The issue of construct overlap indeed deals with
what is actually measured by different constructs/dimensions. Some of the dimensions
included under the NCOPI have either direct or indirect conceptual links to the dimensions of
psychological well-being. For instance, although emotional stability and self-confidence are
treated and measured as independent dimensions of personality, they are also natural
correlates of psychological well-being, specifically depression (Baumeister, 1993). However,
despite the possibility of conceptual overlap among the personality and mental health
constructs, the three-factor model (i.e., the model with two personality factors and one mental
health factor) fit significantly better than the alternative single- and two-factor models,
suggesting that the measured constructs were relatively independent of each other.
Future studies are needed to establish both the criterion-related and the construct
validity of the NCOPI. There exists preliminary evidence concerning the criterion-related
validity of the NCOPI: in a recent study, the present authors examined the extent to which the
NCOPI dimensions predicted a set of performance criteria (i.e., the ranking of the NCOs in
terms of cumulative performance ratings over their tenure, and the number of commendations
and reprimands they had received over their military careers). Some of the NCOPI dimensions
(e.g., directing and monitoring, self-discipline, and self-confidence) were found to contribute
significantly to the variance in both cumulative performance rankings and the number of
commendations received, providing some evidence concerning the predictive validity of the
NCOPI. Yet, it needs to be noted that the criteria of job performance used were not direct or
complete indexes of current performance. Hence, studies using more direct and
comprehensive indices of job performance are needed to establish the criterion-related
validity of the NCOPI. Concerning construct validity, correlations of the NCOPI dimensions
with measures of the same dimensions from different sources (i.e., other inventories) should
be examined.

REFERENCES
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental
disorders (4th ed., Rev.). Washington, DC: Author.
Ball, S. A., Tennen, H., Poling, J. C., Kranzler, H. R., & Rounsaville, B. J. (1997).
Personality, temperament, and character dimensions and the DSM-IV personality disorders in
substance abusers. Journal of Abnormal Psychology, 106, 545-553.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job
performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at
the beginning of the new millennium: What do we know and where do we go next?
International Journal of Selection & Assessment, 9, 9-22.
Bartram, D. (1995). The predictive validity of the EPI and 16PF for military training.
Journal of Occupational & Organizational Psychology, 68, 219.
Baumeister, R. F. (1993). (Ed.) Self-esteem: The puzzle of low self-regard. New York:
Plenum Press.
Borman, W. C., Hanson, M. A., & Hedge, J. W. (1997). Personnel selection. Annual
Review of Psychology, 48, 299-337.
Cigrang, J. A., Todd, S. L., & Carbone, E. G. (2000). Stress management training for
military trainees returned to duty after a mental health evaluation: Effect on graduation rates.
Journal of Occupational Health Psychology, 5, 48-55.
DeNeve, K. M., & Cooper, H. (1998). The happy personality: A meta-analysis of 137
personality traits and subjective well-being. Psychological Bulletin, 124, 197-229.
Goldberg L. R. (1990). An alternative “description of personality”: The big five factor
structure. Journal of Personality and Social Psychology, 59, 1216-1229.
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality measurement and
employment decisions. American Psychologist, 51 (5), 469-477.


Holden, R. R. & Scholtz, D. (2002). The Holden Psychological Screening Inventory in
the prediction of Canadian Forces basic training outcome. Canadian Journal of Behavioral
Science, 34, 104-110.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990).
Criterion-related validities of personality constructs and the effect of response distortion on
those validities. Journal of Applied Psychology, 75 (5), 581-595.
IPIP (1999). The 1,412 IPIP items in alphabetical order with their means and
standard deviations. Retrieved August 31, 1999, from http://ipip.ori.org/ipip/1412.htm
Jöreskog, K., & Sörbom, D. (1996). LISREL 8: User’s reference guide. Chicago, IL:
Scientific Software International.
Krueger, G. P. (2001). Military psychology: United States. International Encyclopedia
of the Social & Behavioral Sciences.
Magruder, C. D. (2000). Psychometric properties of Holden Psychological Screening
Inventory in the US military. Military Psychology, 12, 267-271.
McCloy, R. A., Campbell, J. P., & Cudeck, R. (1994). A confirmatory test of a model
of performance determinants. Journal of Applied Psychology, 79, 493-505.
McCrae, R. R. & Costa, P. T. (1989). More reasons to adopt the five-factor model.
American Psychologist, 44, 451-452.
Motowidlo, S. J., & Van Scotter, J. R. (1994). Evidence that task performance should
be distinguished from contextual performance. Journal of Applied Psychology, 79, 475-480.
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions:
Implications for research and practice in human resource management. Research in
Personnel and Human Resource Management, 13, 153-200.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis
of integrity test validities: Findings and implications for personnel selection and theories of
job performance. Journal of Applied Psychology, 78, 679-703.
Russell, M. & Marrero, J. M. (2000). Personality styles of effective soldiers. Military
Review, 80, 69-74.
Salgado, J. F. (1997). The five factor model of personality and job performance in the
European Community. Journal of Applied Psychology, 82, 30-43.
Sandal, G. M., Endresen, I. M., Vaernes, R., & Ursin, H. (1999). Personality and
coping strategies during submarine missions. Military Psychology, 11, 381.
Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437-453.
Schneider, B., Goldstein, H. W., & Smith, D. B. (1995). The ASA framework: An
update. Personnel Psychology, 48(4), 747-773.
Siegler, I. C. & Brummett, B. H. (2000). Associations among NEO Personality
Assessments and well-being at mid-life: Facet-level analysis. Psychology and Aging, 15, 710-
714.
Stevens, G., Hemstreet, A., & Gardner, S. (1989). Fit to lead: Prediction of success in a
military academy through the use of personality profile. Psychological Reports, 64, 227-235.
Stricker, L. J. & Rock, D. A. (1998). Assessing leadership potential with a
biographical measure of personal traits. International Journal of Selection and Assessment, 6,
164.
Sumer, H. C., Sumer, N., & Cifci, O. S. (2000, November 7-9). Establishing construct
and criterion-related validity of a personality inventory in the Turkish Armed Forces. Paper
presented at the International Military Testing Association Annual Conference, Edinburgh,
UK.


Thomas, J. L., Dickson, M. W., & Bliese, P. D. (2001). Values predicting leader
performance in the U.S. Army Reserve Officer Training Corps Assessment Center: Evidence
for a personality-mediated model. The Leadership Quarterly, 12, 181-196.
Trull, T. J. & Sher, K. J. (1994). Relationship between the five-factor model of
personality and Axis-I disorders in a nonclinical sample. Journal of Abnormal Psychology,
103, 350-360.
Van Scotter, J. R., & Motowidlo, S. J. (1996). Interpersonal facilitation and job
dedication as separate facets of contextual performance. Journal of Applied Psychology, 81,
525-531.


JOIN: Job and Occupational Interest in the Navy


William L. Farmer, Ph.D., Ronald M. Bearden, M.S., PNC Edward D. Eller,
Paul G. Michael, Ph.D., LCDR Roni S. Johnson, Hubert Chen, B.A., Aditi Nayak, M.A.,
Regina L. Hindelang, M.S., Kimberly Whittam, Ph.D., Stephen E. Watson, Ph.D.,
and David L. Alderton, Ph.D.
Navy Personnel Research, Studies, and Technology Department (PERS-1)
Navy Personnel Command
5720 Integrity Drive, Millington, TN, USA 38055-1000
William.L.Farmer@navy.mil

Navy researchers, along with other contributors, have recently developed new
classification decision support software, the Rating Identification Engine (RIDE). The main goal
behind the new system is to improve the recruit-rating job assignment process so that it provides
greater utility in the operational classification system. While RIDE employs many tactics to
improve the assignment procedure, one strategy is to develop a measure of Navy-specific
interests that may be used in conjunction with the current classification algorithm during
accessioning. Job and Occupational Interest in the Navy (JOIN) is a computer-administered
instrument intended to inform applicants about activities and work environments in the Navy
and to measure the applicant's interest in these activities and environments. It is expected that
the simultaneous utilization of the JOIN interest measure and the RIDE ability components will
improve the match between the Navy recruit's abilities and interests, and ultimately serve as a
means of increasing job satisfaction, performance, and retention.
The recruit typically has some degree of uncertainty when faced with the wide array of
opportunities available from among more than 70 entry-level jobs (in the Navy, ratings), and
over 200 program-rating combinations. The Navy, in deciding which rating is best suited for a
recruit, should strike a careful balance between filling vacancies with the most qualified
applicants and satisfying the applicants' career preferences. Much is at stake in the process, and
research in civilian and military organizations has produced several pertinent findings. First, a
lack of qualifications has been shown to lead to training failures and degraded job performance.
Additionally, people who occupy jobs that are inconsistent with their interests are less likely to
be satisfied with their work and are more prone to leave the organization for other job
opportunities. Finally, dissatisfied employees have higher absenteeism on the job, engage in
more counterproductive behaviors, and seek alternative employment more often than their
satisfied counterparts (Lightfoot, Alley, Schultz, Heggestad, Watson, Crowson, & Fedak, in
press; Lightfoot, McBride, Heggestad, Alley, Harmon, & Rounds, in press).
The JOIN vocational interest system will provide a critical component to the RIDE
classification process. Current interest inputs to RIDE represent informal discussions with the
classifier, which vary quantitatively and qualitatively by applicant. The JOIN system educates
individuals about the variety of job related opportunities in the Navy, and creates a unique
interest profile for the individual. The Sailor-rating interest fit for all Navy Ratings is identified
by comparing the Applicant’s Rating Interest Profile to each of the Rating Interest Profiles
generated by JOIN. Once validated, JOIN provides a standardized and quantified measure of
applicant vocational interests, which will be provided as an input to RIDE. If successful,
RIDE/JOIN can be implemented for initial classification, and transitioned to training and fleet
commands for re-classification. Recent research efforts have focused on the development of the
comprehensive JOIN Rating Interest Profile model for all Navy ratings, based on a series of
analyses including iterative Subject Matter Expert (SME) interviews. Paralleling these efforts has
been the development of the JOIN experimental software, also developed in concert with SMEs
(see section below for details).
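The cited reports document the fit computation itself; as a rough sketch only, one plausible way to compare an applicant's interest profile against stored rating profiles follows (the similarity metric, rating abbreviations, and data are assumptions, not the fielded RIDE/JOIN algorithm):

    import numpy as np

    def rating_fit(applicant: np.ndarray, rating_profiles: dict) -> dict:
        """Applicant-rating interest fit, here scored as the correlation
        between the applicant's profile and each rating's target profile."""
        fits = {r: float(np.corrcoef(applicant, p)[0, 1])
                for r, p in rating_profiles.items()}
        return dict(sorted(fits.items(), key=lambda kv: kv[1], reverse=True))

    # Illustrative 26-element work-activity profiles on an arbitrary scale
    rng = np.random.default_rng(1)
    applicant = rng.uniform(1, 5, 26)
    profiles = {name: rng.uniform(1, 5, 26) for name in ("AT", "HM", "ET")}
    print(rating_fit(applicant, profiles))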

JOIN Model Development

Following an early effort by Alley, Crowson, and Fedak (in press) that was very much in
the vein of typical contemporary interest inventories, it was determined that JOIN's format
should be more pragmatically based. The development of the current JOIN tool is documented
in Michael, Hindelang, Watson, Farmer, and Alderton (in press). The item development for Job
and Occupational Interest in the Navy (JOIN) was an iterative process. The first challenge was
to develop work activity and work environment items through an abbreviated job analytic
procedure. A basic model of work served as the framework for the examination of Navy jobs and
for the development of the inventory items. Conceptually, at the macro-level, the Navy consists
of various job families or groupings of jobs according to organizational function, platform and/or
work process (e.g., administration, health care, submarine personnel, etc.). Examining the world
of Navy work on a micro-level reveals work activities or tasks that describe the work that is
performed.
The first step in the item development process involved the collection of all of the
available job descriptions from the Navy Enlisted Community Manager’s (ECM) web site. A
researcher reviewed each of these job descriptions and highlighted words that reflected the
following categories: 1) job families, or Navy community areas (e.g., aviation, construction,
submarine); 2) work activity dimensions, or process (verb) and/or content (noun) words;
and 3) work context dimensions, or the work environment (e.g., working indoors, working with
a team). From these highlighted extracts, lists of the communities, processes, content words,
and work environments that seemed most representative of each Navy rating (N = 79) were created.
The process and content words were joined in various combinations to form process-content
(PC) pairs. These PC pairs serve as individual interest items, allowing participants to indicate
their level of interest in the work activity (e.g., maintain-mechanical equipment). Currently, a
total of 26 PC pairs are included in JOIN.
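A minimal sketch of the pairing step, with purely illustrative word lists (the actual lists were derived from the ECM job descriptions, and only combinations the researchers judged meaningful were retained as the 26 interest items):

    from itertools import product

    # Illustrative process verbs and content nouns
    processes = ["maintain", "operate", "repair"]
    contents = ["mechanical equipment", "electronic systems", "weapons"]

    # Candidate process-content (PC) pairs; in practice the candidate set was
    # pruned to the pairs that describe real Navy work activities.
    pc_pairs = [f"{p} {c}" for p, c in product(processes, contents)]
    print(pc_pairs[:3])   # ['maintain mechanical equipment', ...]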
After developing the content for the interest inventory, the next phase of the project was
to design and create a computer-administered measure of interests. The current version of the
interest inventory (JOIN 1.01e) assesses three broad dimensions of work associated with Navy
jobs and integrates over three hundred pictures of personnel performing job tasks to enhance the
informational value of the tool. The first dimension, Navy community area, contains seven Navy
community areas (e.g., aviation, surface, construction). Participants are asked to rank the
individual areas, each represented by four pictures with its own text description, based on level
of interest (i.e., most interesting to least interesting, and not interested). The second dimension
contains eight items describing work environments or work styles (e.g., work outdoors, work with
a team). Participants are asked to indicate their level of preference for working in certain
contextual conditions. Again, pictures with text descriptions represent each item. The final
dimension, work activity, includes twenty-six PC pairs. Each PC pair serves as an individual
interest item that allows participants to indicate their level of interest in the work activity
dimension (e.g., maintain mechanical equipment, direct aircraft, etc.). Three pictures (in the
initial release) and descriptive text represent each PC pair item.

JOIN Software Testing

Usability Testing I. The first test phase occurred during August of 2002 at the Recruit
Training Center (RTC) Great Lakes, and was conducted with a sample of 300 new recruits.
Participants were presented with JOIN and its displays, images, question content, and other
general presentation features in order to assess general test performance, item reliability,
clarity of instructions and intent, appropriateness for a new recruit population, overt
interpretability, required test time, and software performance. The initial results from the
usability testing were very promising on several levels. First, the feedback from participants
provided researchers with an overall positive evaluation of the quality of the computer
administered interest inventory. Second, the descriptive statistical analyses of the JOIN items
indicated that there was adequate variance across individual responses. In other words,
participants differed in their level of interest in the various items. Finally, the statistical
reliability of the work activity items was assessed and the developed items were very consistent
in measuring participant interest in the individual enlisted rating job tasks. The results from this
initial data collection effort were used to improve the instrument prior to subsequent usability
and validity testing (Michael, Hindelang, Watson, Farmer, & Alderton, in press).

Instrument Refinement. Based on the results of the initial usability study, a number of
changes were made. These changes were made with three criteria in mind. First, we wanted to
improve the interface from the perspective of the test taker. Second, it was imperative that
testing time be shortened. Though this modification does contribute to the “user-friendliness” of
the tool, the initial impetus for this was the very real operational constraint, as directed by
representatives from the Navy Recruiting Command (CNRC), that the instrument take no more
than ten to fifteen minutes to complete. Finally, if at all possible, it was necessary that the
technical/psychometric properties of the instrument be maintained, if not enhanced.


Though the initial usability testing was favorable overall, one concern was voiced on a
fairly consistent basis. Respondents stated that there was an apparent redundancy in the items
presented. This redundancy was most often characterized as, “It seems like I keep seeing the
same items one right after another.”
One explicit feature targeted during the initial development was the use of a set of
generalizable process and content statements. For instance, the process “Maintain” is utilized
in nine different PC pair combinations, and a number of the content areas are used in as many
as three PC pairs. Because this was a deliberate design feature, it was not revised.
Also contributing to the apparent redundancy was the fact that three versions of each PC
item were presented, yielding a total of 72 PC items administered to each respondent. This
feature had been established as a way of ensuring psychometric reliability. With a keen eye
toward maintaining technical standards, the number of items was cut by one-third, yielding a
total of 56 PC items in the next iteration of the JOIN tool.
Finally, the original algorithm had specified that all items be presented randomly.
Though the likelihood of seeing the alternate versions of a PC pair item one after the other was
low, we decided to place a “blocking constraint” in the algorithm, whereby an individual receives
blocks of one version of all 26 PC pairs presented randomly. With the number of PC pair
presentations constrained to two, each participant receives two blocks of 26 items.
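A sketch of such a blocked presentation order (the function and identifiers are hypothetical):

    import random

    def blocked_order(n_pairs=26, n_versions=2, seed=None):
        """One randomly ordered block per item version: within a block each
        PC pair appears once, so alternate versions of the same pair can only
        be adjacent at a block boundary."""
        rng = random.Random(seed)
        order = []
        for version in range(n_versions):
            block = [(pair, version) for pair in range(n_pairs)]
            rng.shuffle(block)               # randomize within the block only
            order.extend(block)
        return order

    print(blocked_order(seed=42)[:4])        # first few (pair, version) slots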
As users had been pleased with the other major features of the interface, no refinements
were made other than those mentioned. Reduction in testing time was assumed based on the
deletion of PC item pairs. Decisions to delete items were made using a combination of rational
and technical/psychometric criteria. As stated earlier, initial item statistics had been favorable in
that internal consistencies within 3-item PC scales were good (mean α = 0.90), and sufficient
variation across scale endorsement indicated that individuals were actually making differential
preference judgments. Items were deleted if they contributed little (in comparison to other items
in the scale) to PC scale internal consistency or possessed response distributions that were
markedly different from alternate versions of the same item. In the absence of definitive empirical
information, items were also deleted if they appeared to present redundant visual information (as
judged by trained raters). The resulting 2-item PC scales demonstrated good internal consistency
(mean α = 0.88). Additional modifications were made that enhanced item data interpretation and
allowed for the future measurement of item response time.
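A sketch of the item-level diagnostic implied by these deletion rules, flagging items whose removal would leave (or even raise) scale alpha (illustrative only, not the actual analysis scripts):

    import numpy as np

    def alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) matrix."""
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def alpha_if_deleted(items):
        """Alpha recomputed with each item removed in turn; an item whose
        removal raises alpha contributes little to internal consistency."""
        k = items.shape[1]
        return np.array([alpha(np.delete(items, j, axis=1)) for j in range(k)])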

Usability Testing II. The second phase of testing occurred over a three and a half month
period in the spring of 2003 at RTC Great Lakes. A group of approximately 4,500 participants
completed the refined JOIN (1.0e) instrument. The group was 82% male, 65% white, 20%
black, and 15% other. From a usability perspective, 93.2% of all respondents rated JOIN “good”
or “very good.” Regarding the PC descriptors, 90.4% of respondents felt that the pictures did a
“good” or “very good” job of conveying the information presented in the descriptors, and 80.5%
stated that the items did a “good” or “very good” job of conveying Navy-relevant job information
to new recruits. In terms of psychometric quality, the average PC scale α was 0.87. Descriptive
statistics indicate that participants provided differential responses across and within the work
activity scales. Average testing time decreased from 24 minutes in the original version to 13
minutes. The average time spent per item ranges from 8 to 10 seconds (except for special
operations items, at 21 seconds). Special programs and Aviation are the preferred
communities, with working outside and in a team as the work environment and style of choice.
As in the initial pilot test, the most desirable work activity has been to operate weapons.

Criterion-Related Validity Testing. Currently, the data collected in the most recent round
of testing are also being used to establish the criterion-related validity of the JOIN instrument.
As those who completed the instrument lack prior experience with or knowledge of the Navy or
Navy ratings, they are an ideal group for establishing the predictive validity of the tool. Criterion
measures (e.g., A-school success) will be collected as participants progress through technical
training and those data become available. Participants' social security numbers (SSNs) were
collected to link interest measures to longitudinal data, including the multiple-survey 1st Watch
source data. Additional measures will include attrition prior to End of Active Obligated
Service (EAOS), measures of satisfaction (on the job and in the Navy), propensity to leave the
Navy, and desire to re-enlist. Additionally, JOIN results will be linked with performance criteria.


JOIN Model Enhancement

In addition to the establishment of criterion-related validity, current efforts are focused on
establishing and enhancing the construct validity of the SME model upon which the JOIN
framework is built. As mentioned previously, the tool was developed using the judgments of
enlisted community managers. In addition to decomposing rating descriptions into core elements
and matching photographs with these elements, this group also established the initial scoring
weights, which were limited in the first iteration to unit weights. At present, NPRST researchers
are conducting focus group efforts with Navy detailers, classifiers, and A-school instructors for
the purpose of deriving SME-determined numerical weights that establish an empirical link
between JOIN components and all existing Navy ratings available to first-term sailors.
These weights will be utilized in an enhanced scoring algorithm that provides an
individual preference score for each Navy rating. A rank ordering (based on preference scores)
of all Navy ratings is provided for each potential recruit.
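As a minimal sketch, assuming the preference score is a weighted sum of JOIN component responses (the unit-weight baseline matches the description above; the matrix shapes and names are assumptions):

    import numpy as np

    def preference_scores(responses, weights):
        """responses: (n_components,) applicant JOIN responses.
        weights: (n_ratings x n_components) SME-derived weights (all ones in
        the first, unit-weight iteration). Returns one score per rating."""
        return weights @ responses

    def rank_ratings(responses, weights, rating_names):
        """Rank order all ratings by preference score, highest first."""
        scores = preference_scores(responses, weights)
        order = np.argsort(scores)[::-1]
        return [(rating_names[i], float(scores[i])) for i in order]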

Future Directions

Plans include linking JOIN results with other measures that include the Enlisted Navy
Computer Adaptive Personality Scales (ENCAPS) and other individual difference measures
currently being developed at NPRST. The establishment of a measurable relationship between
job preference and such constructs as individual temperament, social intelligence, teamwork
ability, and complex cognitive functioning will greatly advance the Navy’s efforts to select and
classify sailors and ensure the quality of the Fleet into the future.

References

Alley, W.E., Crowson, J.J., & Fedak, G.E. (in press). JOIN item content and syntax templates
(NPRST-TN-03). Millington, TN: Navy Personnel Research, Studies, & Technology.


Lightfoot, M.A., Alley, W.E., Schultz, S.R., Heggestad, E.D., Watson, S.E., Crowson, J.J., &
Fedak, G.E. (in press). The development of a Navy-job specific vocational interest
model (NPRST-TN-03). Millington, TN: Navy Personnel Research, Studies, &
Technology.

Lightfoot, M.A., McBride, J.R., Heggestad, E.D., Alley, W.E., Harmon, L.W., & Rounds, J.
(in press). Navy interest inventory: Approach development (NPRST-TN-03).
Millington, TN: Navy Personnel Research, Studies, & Technology.

Michael, P.G., Hindelang, R.L., Watson, S.E., Farmer, W.L., & Alderton, D.L. (in press). JOIN:
Interest inventory development and pilot testing. I (NPRST-TN-03). Millington, TN:
Navy Personnel Research, Studies, & Technology.


OZ: A Human-Centered Computing Cockpit Display

Leonard A. Temme, Ph.D.


Naval Aerospace Medical Research Laboratory
U. S. Naval Air Station - Pensacola

David L. Still, O.D., Ph.D.


John Kolen, Ph.D.
Institute of Human and Machine Cognition
University of West Florida

LCDR Michael Acromite, MC USN


Training Air Wing SIX
dstill@ai.uwf.edu

Abstract
An aviator may control an aircraft by viewing the world through the windscreen (visual
flight) or with information from the cockpit instruments (instrument flight), depending upon
visibility, meteorological conditions, and the competence of the pilot. It seems intuitively
obvious that instrument flight should be far more challenging than visual flight. However, since
the same pilot controls the same aircraft through the same air using the same stick-and-rudder
input devices, the only difference is the way the same information is presented. Consequently,
instrument flight is harder than visual flight only because of the way the instruments display the
information. Each instrument displays one flight parameter and it is up to the pilot to convert the
numeric value of the displayed parameter into useful information. We think that it is possible to
use modern display technologies and computational capabilities to make instrument flight safer,
easier, and more efficient, possibly even beyond that of visual flight. The challenge is the design of
the instruments. Using emerging principles of knowledge engineering derived from such areas
as human-centered computing, cognitive task analysis, and ecological display design, we are
designing a suite of aircraft cockpit instruments to enhance instrument flight performance. Our
presentation will describe our approach, methodology, instrumentation, and some experimental
results. We have used commercially available, off-the-shelf desk-top flight simulators for which
we have developed precise flight performance measures that may have broad application for
training and performance testing.

Introduction
In their 1932 book, Blind Flight, Ocker and Crane (26) delineated several requirements
for an ideal cockpit instrument display. It should: (1) be a single instrument, (2) resemble
closely the natural environment, (3) show the direction and degree of turn in a normal manner,
(4) show angle of bank in a normal manner, (5) show climb and glide by the position of the
“nose” of the aircraft, (6) eliminate vertigo, (7) prevent fatigue, (8) require little special training
to use, (9) prompt the pilot to use the controls in a normal reflex manner, (10) not be affected by
abnormal positions such as spins and acrobatics, and (11) have dependability and incorporate
factors of safety so that should one feature fail, others would suffice for the time.
Ocker and Crane’s list can be extended to: (1) enable pilots to fly as aggressively and as
effectively on instruments as in visual flight conditions, (2) enable instrument group tactics of
manned or unmanned air combat vehicles by allowing each pilot/operator to visualize the others,
(3) adapt to changes in aircraft configuration and environment in order to show real-time aircraft
capability (This is particularly important for tilt-rotor, vectored thrust, or variable geometry craft
with their correspondingly complex flight envelopes.), (4) facilitate instrument hover with
motion parallax cues showing translation, (5) combat spatial disorientation and loss of situational
awareness by providing a visually compelling 360° frame of reference as a true six degrees-of-
freedom, directly perceivable link to the outside world, (6) allow pilots to fly aircraft without
deficit while simultaneously studying a second display (i.e., radar, FLIR, or map), (7) enable
pilots to control the aircraft even with reduced vision from laser damage or lost glasses, and (8)
be realized with a software change in any aircraft with a glass cockpit.
Conventional flight instrument displays clearly fail to meet these requirements for several
reasons. For example, the pilot must scan the instruments, looking at or near each of a number
of instruments in succession to obtain information. Studies of pilot instrument scanning have
shown that it is not unusual for even trained pilots to spend as much as 0.5 sec viewing a single
instrument and durations of two seconds or more are to be expected even from expert pilots in
routine maneuvers (10, 11, 32). Consequently, the time required to sequentially gather
information can be substantial, severely limiting the pilot’s ability to cope with rapidly changing
or unanticipated situations and emergencies. Furthermore, the pilot must constantly monitor
instruments to ensure that the aircraft is performing as intended because the instruments do not
"grab" the pilot's attention when deviations from prescribed parameters occur.
Another shortcoming is that current flight instrument displays use many different frames
of reference, with information in a variety of units: degrees, knots, feet, rates of change, etc. The
pilot must integrate these different units into a common frame of reference to create an overall
state of situational awareness. Moreover, the basic flight instruments are not integrated with
other cockpit instrumentation such as engine, weather, and radio instruments. The components
of each of these, like the basic flight instruments, have different units and do not share a common
frame of reference. The traditional practical solutions to these problems have been to severely
limit flight procedures, emphasize instrument scan training, and require extensive practice.

The Development of OZ
OZ is a system based on principles of vision science, Human-Centered Computing
(HCC) (12), computer science, and aerodynamics aimed at meeting the requirements of an ideal
cockpit display. OZ, as an example of HCC, is an effective technology that amplifies and
extends the human's perceptual, cognitive, and performance capabilities while at the same time
reducing mental workload. The controlling software (i.e., the calculations ordinarily imposed on
the pilot) runs seamlessly "behind the curtain" but without hiding specific values of important
parameters to which the pilot needs to have access.
Research on vision and cognition suggested ways to eliminate the fundamental speed
barrier of traditional displays, the "instrument scan." The visual field can be divided into two
channels, the focal (closely related to central or foveal vision), and the ambient (closely related
to, but not identical with, peripheral vision) (20, 21, 22, 28, 30, 35). The focal channel is used for
tasks such as reading, which require directed attention. The ambient channel is used primarily
for tasks such as locomotion that can be accomplished without conscious effort or even
awareness. In the normal environment both of these channels are simultaneously active, as when
a running quarterback passes the ball to a receiver or when a driver reads a sign while controlling
an automobile during a turn. Significantly, the design of conventional instruments requires that
the focal channel be directed sequentially to each instrument, producing the ‘instrument scan’
(18, 25), while the part of the visual system that is optimized for processing locomotion
information, the ambient channel, is largely irrelevant for this task.
To harness the power of both focal and ambient channels, and therefore to reduce the
delays imposed by sequential information gathering, OZ display elements are constructed using
visual perceptual primitives, that is, luminance discontinuities that are resilient to one- and/or
two-dimensional optical and neurological demodulation (i.e., dots and lines). The resilience of
these perceptual primitives to demodulation allows them to pass information through both the
ambient and focal channels’ optical and neurological filters (33, 34, 39). OZ organizes these
perceptual primitives into meaningful objects using such visual perceptual phenomena as
figure-ground (5, 16, 19, 29), pop-out (9, 17), chunking, texture (2, 5), effortless discrimination
(2, 16, 28), and structure-from-motion (13, 23, 27). These phenomena organize the graphic primitives into the
objects that constitute OZ symbology, objects that have perceptual meaning and are quickly
understood. Concepts derived from the Human-Centered Computing approach (7, 12) enabled
us to further refine OZ by reducing human information processing requirements. Specifically,
OZ combines and reduces different data streams into proportionately-scaled symbology that the
pilot can immediately apprehend and use. For example, information on aircraft configuration,
density altitude, and location are integrated into a directly perceivable picture of the aircraft’s
present capability.
Finally, to reduce the cognitive workload of instrument flight, OZ uses a common frame
of reference to bring together all cockpit information to create a single, unified display,
producing a picture that can be clearly and quickly understood. The frame of reference provides
the structure that transforms OZ’s separate perceptual objects into an ensemble of meaningfully
interactive components. This is one reason that OZ can communicate spatial orientation, aircraft
location, flight performance, aircraft configuration, and engine status all in the time it takes to
look at a single conventional instrument.


Description of OZ
---------------------------------------------------------

Fig. 1: The star field metaphor


---------------------------------------------------------
OZ is organized into two components, the star field metaphor and aircraft metaphor,
which encode aircraft location and capability respectively. The star field metaphor, Figure 1,
shows the aircraft’s attitude and location in the external world. The star field metaphor is a one-
to-one mapping of the external world onto a coordinate system that displays both translations and
rotations. The star field metaphor in Figure 1 shows horizontal angular displacements linearly
projected along the x-axis and vertical angular displacements tangentially projected along the y-
axis. Several star layers are within the star field. Each star layer is located at a specific altitude.
The forward edges of these altitude layers are 0.5 nautical miles in front of the aircraft and are
composed of dots placed at every 10° of heading. The surface plane of the layers is defined by
star ‘trails’ flowing back around the aircraft from every third dot of the altitude layer’s leading
edge . The flow of these star ‘trails’ shares a common expansion point, located at the center of
the horizon line. This array of star trails creates apparent altitude planes and heading streams.
The horizon line, relative to the star layers, shows the aircraft’s altitude. The center of the
horizon line corresponds to the aircraft’s centerline and location. For example, an aircraft
altitude of 3,250 feet would place the horizon line midway between the 3,000 and 3,500 foot
layers. When the aircraft's altitude corresponds to a displayed star layer, the horizon line is
located at the layer’s forward edge and that layer’s stars stream along the horizon line. The
location of the lubber line (blue line located above the center of the horizon line and
perpendicular to it) provides heading information. In OZ’s current embodiment, at any given
instant the displayed portion of the coordinate system extends 360° horizontally and 60°
vertically. This angular construction enables the location of external objects such as runways,
radio aids, weather, terrain, and air traffic to be mapped in the same coordinate space used to
show aircraft attitude and flight path.
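A sketch of that screen mapping as described (linear in azimuth, tangential in elevation); the scale constants and function names are illustrative, not OZ's actual values:

    import math

    def star_to_screen(az_deg, el_deg, heading_deg, pitch_deg,
                       px_per_deg_x=4.0, px_per_rad_y=600.0):
        """Project a world direction (azimuth, elevation) into display
        coordinates: horizontal angles map linearly to x; vertical angles map
        through a tangent to y (well-behaved within the display's 60-degree
        vertical extent)."""
        dx = ((az_deg - heading_deg + 180.0) % 360.0) - 180.0  # wrap to +/-180
        x = dx * px_per_deg_x
        y = math.tan(math.radians(el_deg - pitch_deg)) * px_per_rad_y
        return x, y

    def horizon_fraction(altitude_ft, layer_spacing_ft=500):
        """Fractional position of the horizon line between the star layers
        bracketing the current altitude (e.g., 3,250 ft -> 0.5)."""
        return (altitude_ft % layer_spacing_ft) / layer_spacing_ft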
The graphic elements of the star field metaphor shown in Figure 1 are:
• Horizon Line – Aircraft attitude is shown by the orientation of the horizon line. Aircraft
altitude is indicated by the location of the horizon line within the star field.
• Lubber Line – Aircraft heading is shown by the location of the lubber line within the star field.
• Pitch Ladder – The vertical angular displacement scale is provided by the pitch ladder. Cross
marks are 5° apart.
• Star Layers and Streams – Star layers mark every 500 feet of altitude. Each star layer’s
leading edge is constructed with stars placed 10° apart. Star streams originate from every
third star and mark 30° heading increments.
• Highlighted Star Layer – Specific altitudes may be marked with additional star layers. These
layers are constructed with larger stars for clarity.
• Highlighted Star Stream - Specific headings may be marked with additional star streams.
These streams are constructed with larger stars for clarity.
• Runway – The ends and middle of a runway are marked with filled circles connected with
straight lines. Alignment with the runway centerline and the runway’s location are shown.
In addition to these elements, the OZ star field can display the three dimensional location
of waypoints, other aircraft, obstructions, and thunderstorms.


-------------------------------------------------------------

Fig. 2: The aircraft metaphor


--------------------------------------------------------------
The aircraft metaphor depicted in Figure 2 is a stylized triplane composed of lines and
circles. The location of the aircraft metaphor in the star field metaphor shows attitude and flight
path information. The size and interrelationship of the triplane’s parts map the aircraft’s
configuration, airspeed, engine output, and flight envelope. The span-wise location of the struts
connecting the wings is proportionate to airspeed. The further outboard the struts, the greater the
airspeed, with the structurally limited airspeed located at the wingtips and the minimum flying
speed located at the wing roots. This span location also provides the x-axis speed scale of the
embedded power required curve. The shape of the upper and lower wings is a stylized graph of
the aircraft’s power requirements with the perpendicular distance between the upper and lower
bent wings indicating the amount of power required for unaccelerated flight at the corresponding
airspeeds. The length of the struts indicates power available. The extent to which the wing
struts are colored green indicates power in use. The struts are scaled so that power equals
demand when the green of the struts reaches the upper and lower wings. With this design, the
wings and struts depict the complex interrelationship between power, drag, lift, airspeed,
configuration, and performance.
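A sketch of how such a mapping might be computed (illustrative only; the linear scaling and all names are our assumptions, not the OZ source code):

```python
def strut_span_fraction(airspeed, v_min, v_max):
    """Spanwise strut position: 0.0 at the wing root (minimum flying
    speed), 1.0 at the wingtip (structurally limited airspeed)."""
    return max(0.0, min(1.0, (airspeed - v_min) / (v_max - v_min)))

def strut_green_fraction(power_in_use, power_required):
    """Fraction of the strut drawn green; at 1.0 the green just reaches
    the bent wings, i.e., power matches the demand for level flight."""
    return power_in_use / power_required
```

With this scaling, a pilot reads airspeed from where the struts sit along the span and reads the power margin from how far the green falls short of, or overshoots, the bent wings.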
Figure 2 shows the components of the aircraft metaphor when the aircraft is at cruising
speed, with power set to less than required for level flight. Digital readouts for some parameters
have been toggled off to simplify the illustrations. The graphic elements shown are:

• Stick – Turn rate and direction are shown by the stick’s lateral and angular position relative
to the star field.
• Pitch Line – Aircraft attitude is shown by the location of the pitch lines in the star field.
• Speed Rings – Vertical speed magnitude and direction are encoded by the radius and location
of the speed rings in the star field. The outboard speed rings mark a selected airspeed
location along the wingspan.
• Pendulum – Inclinometer and Gz forces are shown by the orientation and size of the
pendulum.
• Speed Struts – Airspeed is shown by the location of the speed strut along the wing span.
Available propulsion is reflected in the length of a strut relative to the separation of the upper
and lower bent wings. Power is correct for the current airspeed, aircraft configuration, and
density altitude when the inner green segment of the strut just touches the upper and lower
bent wings.
• Straight Wing – The airspeed scale is provided by the straight wing. Aircraft wings are level
when the straight wing is parallel with the horizon line and star layers.
• Bent Wings – The aircraft drag curve adjusted for aircraft configuration (weight, landing gear
position, flap setting, etc.) and density altitude effects is displayed in a stylized fashion by the
bent wings. The angle between them and the straight wing corresponds to the bank angle
required for a standard rate turn when in coordinated flight.
• Wing Pinch – Minimum drag speed is marked by the location of the wing pinch along wing
span.
• Wing Tips – Maximum safe structural speed is marked by the location of the wing tips.
• Wing Roots – Minimum flying speed is marked by the location of the wing roots.
For comparison, Figure 3 illustrates the computer screens of OZ (Figure 3a) and
conventional (Figure 3b) displays. These screens were captured within seconds of each other and therefore show essentially the same information, but in very different ways.

---------------------------------------------------------

Fig. 3: Screen shot of OZ with the star field and aircraft metaphors combined. The screen shots of the conventional and OZ displays were separated by a few moments.
---------------------------------------------------------
Examination of Figure 3a reveals: The near end of the runway is 30° to the left of the
aircraft centerline and 9° below the horizon. The aircraft is turning left at a low rate toward the
runway. The current heading is 081°. The highlighted heading is set on the runway heading of
092°. Notice that the extended runway centerline is well to the left of the aircraft. The aircraft’s descent angle is 2.5° and its altitude is 2,900 feet. Flaps are retracted, and airspeed is 6 knots below that marked by the Speed Rings. Airspeed is above minimum drag speed and below maximum structural speed. Power is set to less than that required for level flight under these conditions. No obstructions, weather, or other aircraft are visible. The conventional instrument panel was captured slightly after the OZ screen capture: altitude has decreased to 2,760 feet, the heading has changed to 070°, and the rate of turn has increased.
Depicting aerodynamic relationships by the size and interaction of structure, as illustrated
by the wings and struts, is a general concept carried throughout OZ. As a consequence of this
design approach, OZ produces an explicit graphic depiction of aircraft performance that the pilot
would otherwise have to construct and maintain as a mental model. This has several benefits as
demonstrated in the experiments. First, it reduces the pilot’s need to recall the currently
correct model. Second, it reduces the amount of mental calculation required to apply the model
to current conditions. Third, it can ensure that all pilots are using the same model. The overall
result is that OZ shifts the workload requirements for flight from one of visual scanning of
separate instruments and displays, requiring intensive integration and computation, to nearly
instantaneous or ‘direct’ perception of an integrated picture. This allows a glance at OZ to
convey most of the information contained in an entire panel of conventional instruments that
may take several seconds to scan. Although OZ presents the pilot with processed data, the
processing does not obscure information nor does it make covert, de facto decisions for the
operator.

Experiments
Two experiments are reported. Experiment 1 evaluated the overall OZ approach, the
general scaling of flight parameters to OZ symbology, and the software implementation.
Experiment 2 involved the task of simultaneously flying with instruments and reading. We considered this a particularly compelling demonstration: the task is clearly important operationally and patently impossible with conventional flight instruments, yet straightforward with OZ.
Experiment 1: The Demonstration of the OZ Concept
This was the first formal evaluation of OZ; the goal was to ensure that OZ was scaled
such that flight performance with OZ under simple conditions was comparable with that obtained
with conventional instruments.
Method
Participants: Two first-year medical students with no previous pilot experience volunteered for these studies, conducted over a two-month period. Neither participant required refractive correction, and both were under twenty-five years of age.
Simulator and Equipment: Elite (Prop. Version 4) simulator software running on a
Macintosh computer provided the Cessna 172 aerodynamic model and the conventional
instrument display (CD) used in this and the following study. The manufacturer modified the
commercial software to export flight data. The OZ display was created with custom C++ code
running on a Pentium PC receiving the exported simulator data. Both OZ and the CD were
presented on 19-inch monitors placed adjacent to each other. An opaque screen blocked one or
the other monitor from view depending on which display the participant was flying. The flight
controls were the same for both displays. Aileron, elevator, and engine controls were input with
a Precision Flight Control Console, Jack Birch Edition. Rudder control was disabled.
Task: Participants were to fly straight and level on a course heading due south at 3000
feet at a constant indicated air speed of 100 knots for about three minutes per trial. Participants
could rest between trials. Completing a condition required one to two hours per
participant.
Independent Variables: In a two-factor experiment, two levels of flight display (OZ and the CD) were compared across the four levels of default turbulence that the Elite Simulator provided (none, low, moderate, and severe).
Training: The participants had no formal experience with either CD or OZ before the
experiment. The task of flying straight and level was described and illustrated to the participants
with both displays. Participants were given instructions about the instruments, and all questions
were answered about the displays until the participants said they were satisfied that they
understood the task. Data collection started with their first flight.

Procedure: Data collection consisted of the participant flying one display for 3000
simulator cycles, which was about three minutes, then flying the other display for 3000 cycles,
alternating between the displays for one to two hours. Data collection for both displays
continued for a given turbulence condition until the participant felt that the flight performance
had stabilized on both displays. Whether OZ or CD began the run was determined randomly.
The no-turbulence condition was completed first, followed in order by the low, moderate, and severe turbulence conditions. One
turbulence level per day was collected per participant. Several days could intervene between data collection sessions. The two participants worked as a pair, one serving as the experimenter for the
other.
Data Reduction: Flight performance was scored as root mean square (RMS) error. RMS error was chosen because it combines two different aspects of performance into a single metric: the variability of the observed performance and the displacement of the average performance from the target (1, 14, 31). RMS errors for both heading (in degrees) and altitude (in feet) were calculated over blocks of 300 successive simulation cycles, each block providing one heading and one altitude RMS value. A trial thus yielded 10 heading and 10 altitude RMS error scores, summarizing its 3000 successive simulation cycles. For each trial, the 10 RMS scores for heading and the 10 RMS scores for altitude were further reduced to a mean heading RMS error and a mean altitude RMS error.
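A minimal sketch of this reduction, assuming the simulator log is a flat list of per-cycle heading or altitude values (the function names and data layout are our illustration, not the study's actual software):

```python
import math

def rms_error(samples, target):
    """Root mean square error of a block of samples about a target value."""
    return math.sqrt(sum((s - target) ** 2 for s in samples) / len(samples))

def reduce_trial(per_cycle_values, target, block=300):
    """Score a 3000-cycle trial: one RMS error per 300-cycle block
    (10 scores), then the trial mean of those 10 scores."""
    scores = [rms_error(per_cycle_values[i:i + block], target)
              for i in range(0, len(per_cycle_values), block)]
    return scores, sum(scores) / len(scores)

# e.g., for altitude against the 3000-foot target:
# block_scores, trial_mean = reduce_trial(altitude_log, target=3000)
```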
Results
To illustrate how the data were reduced for analysis, Figure 4 plots, as a function of time,
flight performance data of individual trials obtained for a single condition and participant. Panel
(a) shows RMS altitude error in feet and panel (b) shows RMS heading error in degrees under
the condition of light turbulence. Each dot displays RMS error for 300 simulator cycles obtained
with OZ; the line connects comparable RMS errors obtained with CD. The horizontal axis
shows that data collection for this run lasted 66 minutes (33,000 simulator cycles). Before this run, the participant had about 45 minutes of flight experience obtained during the no-turbulence condition. Data from the other participant were essentially identical to those shown
in the Figure.
----------------------------------------------

Fig. 4a: Altitude performance of the volunteer (as RMS error in feet) for light turbulence over
the complete run, alternating between OZ and the CD.

Fig. 4b: Heading performance of the volunteer (as RMS error in degrees) for light turbulence
over the complete run, alternating between OZ and the CD.
----------------------------------------------
Figure 4 shows that the flight performance obtained with the two displays is different.
Among the most obvious differences is that flight performance with OZ has smaller RMS values
than that obtained with CD. This was true for altitude as well as heading control. With CD, this
participant frequently lost control of heading (over 135° error) and/or altitude (over 200 feet error) but never lost control of either with OZ. Furthermore, there was a difference in how performance changed as the run progressed. Performance with both displays improved at about the 40-minute mark. OZ’s superior flight performance improved slightly; CD’s relatively poor starting flight performance improved by over a factor of 8 for altitude and a factor of 4 for heading by the end of the run. Nonetheless, performance was still far worse with CD than with OZ.
To evaluate the effect of display, turbulence, and experience, data from each combination
of conditions were divided chronologically into a first and last half for each run, and each half
averaged to produce a mean RMS error. Thus, the data of Figure 4 produced four mean RMS
altitude values and four mean RMS heading values, two for OZ and two for CD. Panel (a) of
Figure 5 shows mean RMS altitude error; panel (b) shows mean RMS heading error. The
horizontal axis of each panel is divided into the first half and the second half of the run.
Turbulence condition is indicated for each of these halves. Error bars are standard error of the
mean.
---------------------------------------------

Fig. 5a: Altitude performance of the volunteer (as RMS error in feet) for each of the four
turbulence levels (none, light, moderate, and severe) over the first and second halves of the run,
for CD and for OZ display.

Fig. 5b: Heading performance of the volunteer (as RMS error in degrees) for each of the four
turbulence levels (none, light, moderate, and severe) over the first and second halves of the run,
for CD and for OZ display.

---------------------------------------------
To quantitatively evaluate the data shown in Figure 5, ANOVAs were performed (15).
The ANOVAs for altitude (F = 5.642, MSe = 9803.741, df = 3,746; p < 0.0008) and heading (F =
17.579, MSe = 5836.413, df = 3,746; p < 0.0001) showed a statistically significant three-way
interaction of display by turbulence by flight experience. Post hoc analysis (Newman-Keuls test procedures) revealed the general pattern of
differences among the test conditions. The important comparison was between OZ and CD. As
can be seen in Figure 5, with OZ neither experience nor turbulence had any impact on mean
RMS error for altitude or heading. On the other hand, with CD there was a relatively
complicated interaction between turbulence level and the experience of the participant. The
analysis of data obtained from the other participant yielded a similar pattern.
Discussion
The results showed that the participants with minimal training and no previous flight
experience executed a simple flying task using OZ with greater consistency and far greater
precision than when using CD. Turbulence had no impact on OZ performance although
turbulence radically degraded performance with CD. OZ was easier to learn than CD;
performance was still improving with CD whereas with OZ performance did not change but
remained consistently superior.
Experiment 2: Dual Task Performance
Experiment 1 suggested that OZ supports precise aircraft control and situational
awareness even under severe turbulence. Therefore, we put OZ to an extreme test, one that is operationally important but patently impossible with CD. We thought that reading text out loud
while flying would be a demanding task that would dramatically demonstrate that OZ enables the
pilot to attend to other mission relevant tasks, e.g., map reading, communications, targeting,
check lists, etc., while maintaining control of the aircraft in instrument flight.
Method
Participants: The same two medical students who volunteered for Experiment 1
volunteered for Experiment 2.
Equipment: The same instrumentation was used as was used in Experiment 1. However,
OZ software was modified so that text could be presented at the rate of one word per second in a
centered, blacked-out circular patch of approximately one-inch diameter. There were no other
changes made in the instrumentation.
Task: The task was to fly due south at 100 knots at 3000 feet under the same four levels
of turbulence that were used in Experiment 1. There were three display conditions. The first
two were essentially replications of Experiment 1, flying with OZ and CD. The third condition
was to fly with OZ while reading out loud the text presented in the middle of the screen.
Independent Variables: In a two-factor experiment, three levels of flight display (CD, OZ, and OZ with reading) were compared across the four levels of default turbulence that the Elite Simulator provided (none, low, moderate, and severe).
Training: The two medical student volunteers had learned during Experiment 1 to use
OZ, CD, and the data collection methodology and procedures. The participants practiced the
current task before initiating the data collection session.
Data Reduction: The data reduction methods were identical to those used in Experiment
1.
Results
Figure 6 shows one participant’s average flight performance under four levels of turbulence and three display conditions, plotted as mean RMS altitude error (feet) in panel (a) and mean RMS heading error (degrees) in panel (b). Error bars are standard error of the mean.
On the left side of each panel is flight performance with conventional instruments without the
secondary reading task. In the middle of each panel is flight performance with OZ without the
secondary reading task. On the right side of each panel is flight performance with OZ while
performing the secondary reading task.
---------------------------------------------

Fig. 6a: Altitude performance of the volunteer (as RMS error in feet) for each of the four
turbulence levels (none, light, moderate, and severe) for the CD (left group of histograms), OZ
(middle group of histograms) and OZ while reading (right group of histograms).

Fig. 6b: Heading performance of the volunteer (as RMS error in degrees) for each of the four
turbulence levels (none, light, moderate, and severe) for the CD (left group of histograms), OZ
(middle group of histograms) and OZ while reading (right group of histograms).
---------------------------------------------
With CD, performance degraded with increased turbulence, as shown by the RMS error
values on the y-axis; the larger the RMS, the poorer the performance. Note that for the task of flying
with CD, the participants did not perform the secondary reading task; their only responsibility
was to control the aircraft. The secondary task of reading was simply impossible for the
participants with CD. Performance with OZ without the secondary task replicated Experiment 1,
but the participants were much more experienced.
ANOVAs for altitude (F = 4.542, MSe = 2480.073, df = 11, 308; p < 0.0001) and heading
(F = 38.8159, MSe = 576.668, df = 11, 308; p < 0.0001) and Post Hoc tests (LSD) showed that
while turbulence did significantly degrade performance with CD, it did not degrade performance
with OZ with or without the additional reading task. This difference between OZ and CD for
turbulence is clearly shown in Figure 6. Particularly noteworthy is the absence of a statistically
significant difference between OZ flight performance with and without the secondary reading
task. This finding is also visually evident in Figure 6.

Discussion
The most important conclusion of this experiment is that the secondary reading task (with
OZ) did not impact flight performance in the least. OZ enabled control of the simulated aircraft
while reading, regardless of turbulence.
One of the design goals of OZ was to scale the display so that it would support precision
flight equal to the best obtainable with CD. Optimal precision flight with CD can be expected
when a trained operator’s sole task is flying straight and level in smooth air without the burden
of secondary tasks (radio, map reading, etc.). Data for these ideal conditions were collected in Experiments 1 and 2 and are shown in the no-turbulence conditions from the second half of Experiment 1 (Figure 5) and from Experiment 2 (Figure 6). Here it is evident that the design goal
of OZ to equal the best flight performance obtained with CD was achieved.
General Discussion
The two experiments reported here evaluated a revolutionary new approach to the display
of flight information. The goal was to find out whether performance with OZ was at least
comparable with that supported by conventional instrumentation. However, the results suggest
that far stronger claims can be made. OZ enabled the students to develop accuracy more quickly
than the CD. We reported only the RMS scores rather than the component variability and
displacement (bias) scores, a practice not uncommon in the literature (38). However, our
conclusions are completely consistent with the conclusions based on the variability and
displacement scores; OZ enabled significantly more precise flight than the conventional
instruments.
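For reference, the relation between the RMS score and its two components follows directly from the definition; with per-cycle values x_i, their mean x̄, and the target T (our notation, consistent with the Method description):

```latex
\mathrm{RMS}^2
  = \frac{1}{n}\sum_{i=1}^{n}(x_i - T)^2
  = \underbrace{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}_{\text{variability}}
  + \underbrace{(\bar{x} - T)^2}_{\text{displacement (bias) squared}}
```

so reporting RMS alone summarizes both components without separating them.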
It might be argued that precision of flight is not a meaningful performance parameter. After all, a difference in altitude RMS error between 5 feet and 15 feet at 3000 feet may not be important in a practical sense. However, the specific task we used, precise flight measured by RMS error, should actually favor conventional instruments composed of dials, gauges, arrows, and pointers. In other words, the experiments stacked the deck against OZ in favor of the conventional display.
To our knowledge, OZ is the only cockpit environment that, while avoiding the
conventional reliance on alphanumerics and the tools of dials, gauges, arrows, and pointers,
supports total flight performance that is as precise as, or more precise than, that obtained with the
conventional instruments. (In addition to altitude and heading, OZ also indicates attitude,
configuration, power, radio navigation, and location management.)
An important feature of OZ is its one-to-one mapping of 360° of horizontal and 60° of vertical airspace onto the two-dimensional CRT. The OZ star field accomplishes this mapping with a high degree of precision and without confusion, while preserving perspective and distance. Because of
the power of OZ’s unique representation of airspace, we expect OZ will demonstrate superiority
in supporting situational awareness and spatial orientation. These are aspects of flight
performance for which the ‘ecological’ displays that use graphic and iconic elements have
demonstrated superiority over the conventional pointers and arrows (6, 37).
OZ was specifically designed to reduce, if not eliminate, the need to scan separate flight
instruments. There are reports in the literature of previous efforts to accomplish this by using so-
called peripheral vision display strategies (3, 4, 8, 24, 36). All such efforts met with limited
success, at best. The approach OZ uses is different; it is not a peripheral vision display. OZ
presents the information in a graphical fashion. As such, the information is processed by the
human visual system at the speeds it uses to process images, speeds faster than those required to
foveate and read dials and gauges and integrate numerical data.
OZ is an invention with functional characteristics that we are discovering as we continue
to study it. For example, we have recently found that trained pilots can actually fly two
simulators simultaneously under severe turbulence conditions, executing different maneuvers
simultaneously, each simulator with its own OZ display. This demonstration strongly suggests
that OZ can be considered to be a single instrument that integrates all the information needed to
fly the aircraft. This observation and the results of the experiments reported here have strong
implications for the development of remotely-piloted vehicles, while also having theoretical
implications for the pilots' mental model of an aircraft’s situation. We anticipate further
discoveries about OZ and its human users, and we are exploring additional applications of OZ.
For example, OZ may be ideal for use in aircraft whose capabilities change in a dynamic fashion,
such as tilt-rotor aircraft. We are also exploring the possibility that OZ design principles may
find a role in the design of information display for use in contexts other than aviation.

References
1. Bain, L. J., and Engelhardt, M. (1992). Introduction to probability and mathematical
statistics (2nd. Ed.). Belmont, CA: Duxbury Press.
2. Bergen, J. R.: Theories of visual texture perception. In: Spatial Vision (Ed. David Regan), Vol. 10 of Vision and Visual Dysfunction (General Editor J. R. Cronly-Dillon). CRC Press, Inc., Boca Raton, 1991, 71-92.
3. Beringer, D. B. & Chrisman, S. E. (1991) Peripheral polar-graphic displays for
signal/failure detection. International Journal of Aviation Psychology, 1, 133-148.
4. Brown, I. D., Holmqvist, S. D., and Woodhouse, M. C. (1961). A laboratory comparison
of tracking with four flight-director displays. Ergonomics, 4, 229-251.
5. Caputo, G.: The role of the background: texture segregation and figure-ground
segmentation. Vision Res 1996 Sep; 36(18):2815-2826.
6. Flach, J. M., and Warren, R. (1995). Low altitude flight. In J. M. Flach, P. A. Hancock, J.
K. Caird, & K. J. Vicente (Eds.), An ecological approach to human machine systems.
Hillsdale, NJ: Erlbaum.
7. Flanagan, J. L., Huang, T. S., Jones, P., and Kasif, S.: Final Report of the National Science Foundation Workshop on Human-Centered Systems: Information, Interactivity, and Intelligence (HCS). Hosted by Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, July 1997, Arlington, VA.
8. Fenwick, C. A. (1963). Development of a peripheral vision command indicator for
instrument flight. Human Factors, 5, 117-128.
9. Goodenough, B., Gillam, B.: Gradients as visual primitives. J Exp Psychol Hum Percept
Perform 1997 Apr; 23(2):370-387.
10. Harris, L. R., Christhilf, D. M. (1980). What do pilots see in displays? Presented at the
Human Factors Society Meeting, Los Angeles, CA.
11. Harris, L. R., Glover, B. J., Spady, A. A. (1986, July). Analytic techniques of pilot
scanning behavior and their application. NASA Technical Paper 2525. Moffett Field, CA:
NASA-Ames Research Center.
12. Hoffman, R. R., Ford, K. M., and Coffey, J. W. (2000). "The Handbook of Human-
Centered Computing." Report, Institute for Human and Machine Cognition, University
of West Florida, Pensacola FL.
13. Hogervorst, M. A., Kappers, A. M., Koenderink, J. J.: Structure from motion: a
tolerance analysis. Percept Psychophys 1996 Apr; 58(3):449-459.
14. Hubbard, D. (1987). Inadequacy of root mean square error as a performance measure. In Proceedings of the International Symposium on Aviation Psychology (pp. 698-704). Columbus, OH: Ohio State University.
15. Johnson, R. A., Wichern, D. W.: Applied Multivariate Statistical Analysis. Prentice Hall,
Upper Saddle River, N.J.: 1998
16. Julesz, B.: Figure and ground perception in briefly presented isodipole textures. In:
Perceptual Organization, Eds.: M. Kubovy and J. Pomerantz, Lawrence Erlbaum
Associates, Hillsdale, NJ 1981.
17. Kastner, S., Nothdurft, H. C., Pigarev, I. N.: Neuronal correlates of pop-out in cat striate
cortex. Vision Res 1997 Feb; 37(4):371-376.
18. Kershner, W. K.: The Instrument Flight Manual (4th Edition) Iowa State University
Press/Ames. 1991.
19. Lamme, V.A.: The neurophysiology of figure-ground segregation in primary visual
cortex. J Neurosci 1995 Feb; 15(2):1605-1615.
20. Leibowitz, H., Post, R. B.: Two modes of processing concept and some implications. In: Organization and Representation in Perception. Ed.: J. Beck, Erlbaum, 1982.
21. Leibowitz, H., Shupert, C.L.: Low luminance and spatial orientation: In Proceedings of
the Tri-Service Aeromedical Research Panel Fall Technical Meeting, Naval Aerospace
Medical Research Laboratory, Pensacola, Fl; NAMRL Monograph-33, 1984, 97-104.
22. Leibowitz, H., Shupert, C. L., and Post, R. B. (1984). The two modes of visual processing:
Implications for spatial orientation. In Peripheral Vision Horizon Display (PVHD),
NASA Conference Publication 2306 (pp. 41-44). Dryden Flight Research Facility, NASA
Ames Research Center, Edwards Air Force Base, CA.
23. Lind, M.: Perceiving motion and rigid structure from optic flow: a combined weak-
perspective and polar-perspective approach. Percept Psychophys 1996 Oct; 58(7):1085-
1102.
24. Malcolm, R. (1984) Pilot disorientation and the use of a peripheral vision display.
Aviation, Space, and Environmental Medicine, 55, 231-238.
25. Naval Air Training Command, Flight Training Instruction TH-57, Helicopter Advanced
Phase, CNATRA P_457 New(08-93) PAT; NAS Corpus Christi, TX, 1993.
26. Ocker, W. C., and Crane, C. J.: Blind Flight in Theory and Practice. The Naylor Co, San
Antonio, TX 1932
27. Pollick, F. E.: The perception of motion and structure in structure-from-motion:
comparisons of affine and Euclidean formulations. Vision Res. 1997 Feb; 37(4): 447-
466.
28. Post, R. B., Leibowitz, H. W.: Two modes of processing visual information: implications
for assessing visual impairment. Am J Optom Physiol Opt 1986 Feb; 63(2): 94-96.
29. Siddiqi, K., Tresness, K. J., Kimia, B.B.: Parts of visual form: psychophysical aspects.
Perception 1996; 25(4): 399-424.
30. Simoneau, G. G., Leibowitz, H. W., Ulbrecht, J. S., Tyrrell, R. A., Cavanagh, P. R.: The
effects of visual factors and head orientation on postural steadiness in women 55 to 70
years of age. J Gerontol 1992 Sep; 47(5) M151-M158.
31. Temme, L. A., Chapman, F. A., Still, D. L., and Person, P. C. (1999). The performance of the standard vertical S-1 flight (VS-1) maneuver by student navy helicopter pilots in a training simulator. Abstract, Aviation, Space and Environmental Medicine, 70, p. 428.
32. Temme, L. A., Woodall, J., Still, D. L.: Calculating a helicopter pilot's instrument scan patterns from discrete 60 Hz measures of the line-of-sight: the evaluation of an algorithm. Paper in review.
33. Thibos, L. N., Still, D. L., Bradley, A.: Characterization of Spatial aliasing and contrast
sensitivity in peripheral vision. Vision Research, 36, 249-258, 1996.
34. Thibos, L. N., Bradley, A.: Modeling off-axis vision-II: The effects of spatial filtering
and sampling by retinal neurons. In Vision Models for Target Detection and Recognition:
World Scientific Press, Singapore, 1995.
35. Turano, K., Herdman S. J., Dagnelie, G.: Visual stabilization of posture in retinitis
pigmentosa and in artificially restricted visual fields. Invest Ophthalmol Vis Sci 1993
Sep; 34(10): 3004-3010.
36. Vallerie, L. L. (1966). Displays for seeing without looking. Human Factors, 8, 507-513.
37. Warren, R.: Preliminary questions for the study of egomotion. In R. Warren & A. H. Wertheim (Eds.), Perception and control of self motion (pp. 3-32). Hillsdale, NJ: Lawrence Erlbaum Associates.
38. Weinstein, L. F., & Wickens, C. D. (1992). Use of nontraditional flight displays for the reduction of central visual overload in the cockpit. International Journal of Aviation Psychology.
39. Williams, D. R., Coletta, N. J.: Cone spacing and the visual resolution limit. Journal of the Optical Society of America A, 4, 1514-1523, 1988.

PERCEPTUAL DRIFT RELATED TO SPATIAL DISORIENTATION
GENERATED BY MILITARY SYSTEMS:
POTENTIAL BENEFITS OF SELECTION AND TRAINING

Corinne Cian (a), Jérôme Carriot (b) & Christian Raphel (a)
(a) Département des facteurs humains, Centre de Recherches du Service de Santé des Armées, Grenoble, France.
(b) UPR-ES 597 Sport et Performance Motrice, Université Joseph Fourier, UFRAPS, Grenoble, France.

The development of new technologies in weapon systems generates sensory flows that can induce sensory conflicts and dysfunction. The consequences can be pathological disorders, but besides these extreme cases the most common consequence is spatial disorientation. Spatial disorientation is characterized by the failure of the operator to correctly sense the position or motion of an object, or of himself, within the fixed coordinate system provided by the surface of the earth and the gravitational vertical. For example, spatial disorientation may induce a misperception of the location of a visual target, resulting in a perceptual drift. This phenomenon may be related to the functional properties of the central nervous system. Locating an object requires information about the observer’s body orientation, so the perceived location depends on visual, vestibular, and somaesthetic information. Thus, when the relationship between an observer and the gravitational frame of reference is altered, as when the subject is tilted with respect to gravity or when the magnitude or direction of the gravito-inertial force changes, as often occurs in an accelerating vehicle, the apparent locations of seen objects are usually altered (Cohen, 2002).
Most of these problems have been studied in the field of aviation and concern large gravito-inertial forces. However, perceptual drifts have also been observed for the lower gravito-inertial forces generated by antiaircraft guns, in which the operator rotates and/or is tilted together with the system. These very low body rotations unconsciously affect the spatial perception of a target. This perceptual drift may be related to the oculogravic illusion phenomenon already observed in operational aviation environments.
To study this illusion in the laboratory, we generally ask the subject to determine whether a given target is above or below the level of his eyes. For an upright subject, a target is considered to be at eye level when an imaginary line connecting the target to the eyes is perpendicular to the direction of gravity. The angular deviation between the visual target set to appear at eye level and this horizontal plane defines the visually perceived eye level (Dizio et al., 1997; Matin et al., 1992; Li et al., 1993). The perceived eye level is strongly influenced by variations of the gravitational-inertial forces acting on the subject. This is the case when an upright subject faces toward the center of a centrifuge that rotates at a steady velocity for some time: a target that remains at true eye level appears to be above its true location. The oculogravic illusion thus induces a lowering of the visually perceived eye level (Cohen, 1973; Cohen et al., 2001; Graybiel, 1952; Whiteside et al., 1965). For large gravito-inertial force changes, this illusion is explained by an illusory perception of body tilt in pitch arising from mechanical action on the otolithic organs of the vestibular system as well as on the muscle and cutaneous proprioceptors (Cohen, 1973; Wade et al., 1971). For very limited variations of G, which affect the bodily senses only weakly, the lowering of the perceived eye level is probably due to stimulation of the otolithic system alone (Raphel et al., 1994, 1996).
In the range of very low gravitational-inertial stimulation, there were large individual differences: the gravitational-inertial disturbances did not induce the same negative consequences on the eye level for all subjects. Some were not subject to the illusion at all; others showed a lower sensitivity to the oculogravic illusion, that is, a smaller perceptual drift. These individual differences could be explained by the sensitivity with which subjects perceive a variation of their sensory state. This sensitivity would depend on the comparison of the signals related to body orientation with internal models that specify expected sensory configurations (Berthoz et al., 1999). These internal representations are elaborated from the subjects’ experience (Lackner, 1992; Young et al., 1996), so their fit to environmental reality would differ from one individual to another.
EXPERIMENTAL DESIGN
To investigate to what extent spatial experience modifies the oculogravic illusion, we studied the effect of limited variations of G on two populations with different skill in gymnastics: 20 subjects who practiced trampoline at a national or international level (expert group) and 20 subjects with no special sport expertise (control group). Acrobats were chosen because they are trained to face high postural constraints and their activity requires them to finely associate unusual sensory configurations with a precise body orientation. Unconsciously, they would have improved the functional characteristics of their sensory systems and learned perceptual strategies.
The subjects were seated in a centrifuge facing towards the axis of rotation, or lay supine on the rotating horizontal plane, which rotated around the chest-to-spine axis (figure 1). The supine position corresponds to the maximum pitch tilt of the subject on the antiaircraft gun; in this condition, the subjective eye level corresponds to the plane parallel to the direction of gravity (the zenith). Subjects were asked to set a luminous target at the place perceived as eye level (or zenith) while in total darkness and undergoing very low centrifugation of less than 1.01 G. For each gravitational-inertial condition, the visually perceived eye level was averaged over the trials. The mean value measured while the subject was motionless and in total darkness served as a reference value. The experimental data, expressed in degrees of visual angle, consisted of the algebraic difference between the perceived eye level measured under a gravitational-inertial condition and the reference value. When the difference was negative, the perceived eye level was below the reference value; when it was positive, it was above the reference value.

Figure 1: The apparatus consisted of a centrifuge in which the subject could take place in the upright position (left panel) or in the supine position (right panel). The illustration shows the initial position of the target, the horizontal and vertical planes through the eyes, and the vectorial sum GIF of G and the radial acceleration γr under centrifugation.
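For reference, the magnitude and tilt of the gravito-inertial force follow from the vector sum sketched in Figure 1 (standard mechanics; the notation is ours):

```latex
\lVert \mathrm{GIF} \rVert = \sqrt{g^{2} + \gamma_r^{2}},
\qquad
\theta = \arctan\!\left(\frac{\gamma_r}{g}\right)
```

For example, the largest radial acceleration used in the supine condition, γr = 2.98 m.s-2, gives ‖GIF‖ ≈ 10.25 m.s-2 ≈ 1.04 G and θ ≈ 17°, matching the values quoted in the Results.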
RESULTS
In the seated position (figure 2), and for the control subjects, increasing the radial acceleration lowered the perceived eye level, that is, the settings shifted toward the lower part of the body, as indicated by the negative values. The visual drift became lower than the reference value for a modification of G of less than 0.01 percent. For the expert subjects, the theoretical threshold of sensitivity to radial acceleration was higher: only the settings obtained under the highest value of radial acceleration, corresponding to 1.02 G, were lower than the reference. Moreover, the lowering of the eye level, when it occurred, was much smaller than for the control subjects.
Figure 2: Mean deviation (deg) of the visually perceived eye level (VPEL) settings relative to the reference (motionless) for the control and expert groups under radial accelerations γr of 0.0152, 0.0609, 0.38077, and 1.67 m.s-2.

In the supine position (figure 3), and for the control subjects, the zenith perceived under each centrifugation condition was lower than the reference (motionless), and increasing the radial acceleration lowered the eye level further. The expert group was not sensitive to low gravitational-inertial stimulation up to 1.04 G, which represents a tilt of the gravitoinertial force of 17 degrees relative to G: the zenith they perceived under each centrifugation condition was not different from the motionless reference.
Figure 3: Mean deviation (deg) of the zenith settings relative to the reference (motionless) for the control and expert groups under radial accelerations γr of 0.55, 0.97, 1.52, 2.19, and 2.98 m.s-2.

DISCUSSION
Thus the gravitational-inertial disturbances did not induce the same negative consequences on the eye level for the spatial experts. Whatever the orientation of the body in space, the control subjects set the target progressively lower as the magnitude of the gravito-inertial force increased. In the upright position, the spatial experts showed a lower sensitivity to the oculogravic illusion, both in the theoretical threshold of sensitivity to radial acceleration and in the magnitude of the illusion itself. In the supine position, the acrobats were not subject to the illusion at all.
The origin of these inter-individual differences may be a greater efficiency in the use of sensory information. When limited variations of G are used, only the vestibular system is stimulated. However, the otolithic sensors do not distinguish a linear acceleration from a head tilt, and the central nervous system prefers to interpret the gravitational modifications in terms of a physical tilt in pitch. A target that remains at true eye level then appears to be above its true location, and the oculogravic illusion induces a lowering of the visually perceived eye level. It can be suggested that the spatial experts were less sensitive to the oculogravic illusion because of their capacity to distinguish linear accelerations from a head-body tilt. Through learning, new configurations corresponding to postural states would be stored in the central nervous system, enriching the internal models of orientation; the experts could have developed internal models at the origin of this perceptive distinction (Merfeld et al., 1999). Besides this interpretation, the origin of the individual differences may be related to a sensory weighting that depends on the subjects’ experience. In the absence of vision, the perception of body orientation in the gravitational field is based on the coordination of the vestibular and somaesthetic sensory modalities. The somaesthetic system is assumed to provide information about body orientation, notably in response to the anti-gravitational forces. Most of the time, the information provided by these two systems is redundant. Errors of perceptive judgment would then depend on an inadequate resolution of a sensory conflict generated by the presence of non-redundant information, implying an inappropriate sensory dominance during the integration process (Young, 1984). In that context, the eye-level lowering observed under very low gravitational-inertial stimulation may be the result of a sensory conflict between the otolithic information, whose sensors do not distinguish a linear acceleration from a head tilt, and the somaesthetic information, which indicates no change in the postural state with regard to gravity. The perceptive shift would then be associated with a process of sensory integration that gives more weight to the otolithic information and interprets the gravitational modifications in terms of physical tilt. Conversely, the absence of the oculogravic illusion may be the result of somaesthetic sensory dominance.
In conclusion, the relations the subject maintains with the spatial environment and the knowledge acquired through experience modify the processing of sensory information and the perceptual construction resulting from it. The extensive practice of acrobatics, which requires finely associating sensory configurations with a precise physical orientation, allows these spatial experts to be less sensitive to the oculogravic illusion stemming from radial accelerations similar to those encountered in daily life.
REFERENCES
Berthoz, A. & Viaud-Delmon, I. (1999). Multisensory integration in spatial orientation. Current Opinion in Neurobiology, 9, 708-712.
Cohen, M. (1973). Elevator illusion: influences of otolith organ activity and neck proprioception. Perception and Psychophysics, 14, 401-406.
Cohen, M. (2002). Visual and vestibular determinants of perceived eye-level. RTO-MP-086: Spatial Disorientation in Military Vehicles: Causes, Consequences and Cures (pp. 37-1 to 37-8).
Cohen, M., Stopper, A., Welch, R. & DeRochia, C. (2001). Effects of gravitational and optical stimulation on the perception of target elevation. Perception and Psychophysics, 63, 29-35.
Dizio, P., Li, W., Lackner, J.R. & Matin, L. (1997). Combined influences of gravito-inertial force level and visual field pitch on visually perceived eye level. Journal of Vestibular Research, 7, 381-392.
Graybiel, A. (1952). Oculogravic illusion. Archives of Ophthalmology, 8, 605-615.
Lackner, J.R. (1992). Multimodal and motor influences on orientation: implications for adapting to weightless and virtual environments. Journal of Vestibular Research, 2, 307-322.
Li, W. & Matin, L. (1993). Eye & head position, visual pitch, and perceived eye level. Investigative Ophthalmology & Visual Science, 34, 1311.
Matin, L. & Li, W. (1992). Visually perceived eye level: changes induced by pitch-from-vertical 2-line visual field. Journal of Experimental Psychology, 18, 1, 257-289.
Merfeld, D.M., Zupan, L. & Peterka, R.J. (1999). Humans use internal models to estimate gravity and linear acceleration. Nature, 398, 615-618.
Raphel, C. & Barraud, P.A. (1994). Perceptual thresholds of radial acceleration as indicated by visually perceived eye level. Aviation, Space and Environmental Medicine, 65, 204-208.
Raphel, C., Cian, C., Barraud, P.A. & Micheyl, C. (2001). Effects of supine body position and low radial accelerations on the visually perceived apparent zenith. Perception & Psychophysics, 1, 36-46.
Wade, N.J. & Schöne, H. (1971). The influence of force magnitude on the perception of body position: I. Effects of head posture. British Journal of Psychology, 62, 157-163.
Whiteside, T.C.D., Graybiel, A., & Niven, J.I. (1965). Visual illusions of movement. Brain, 88, 193-210.
Young, L.R. (1984). Perception of the body in space: Mechanisms. In I. Smith (Ed.), Handbook of Physiology - The nervous system, vol. 3 (pp. 1023-1066). New York: Academic Press.
Young, L.R., Mendoza, J.C., Groleau, N. & Wojcik, P.W. (1996). Tactile influences on astronaut visual spatial orientation: human neurovestibular studies on SLS-2. Journal of Applied Physiology, 81, 44-49.

PERCEPTUAL DYSLEXIA:
ITS EFFECT ON THE MILITARY CADRE AND
BENEFITS OF TREATMENT
Susann L. Krouse
Naval Education and Training
Professional Development and Technology Center
Pensacola, FL, USA

James H. Irvine
Naval Air Warfare Center, Weapons Division
China Lake, CA, USA

Perceptual dyslexia—also known as Irlen Syndrome, Scotopic Sensitivity Syndrome,
SSS, scotopic sensitivity/Irlen syndrome, and, in the United Kingdom, Meares-Irlen
Syndrome—is a perceptual disorder that affects an estimated 46-50 percent of those with
learning disabilities or reading problems; 33 percent of those with dyslexia, attention deficit
(hyperactivity) disorder, and other behavior problems; and approximately 12-14 percent of
the general population (Irlen, 1999). It is not a dysfunction of the physical process of sight.
People with perceptual dyslexia can have 20/20 vision or they may wear corrective lenses.
Perceptual dyslexia is a problem with how one’s nervous system encodes and decodes visual
information and transmits it to the visual cortex of the brain (Warkentin & Morren, 1990).

SYMPTOMS OF PERCEPTUAL DYSLEXIA

People who are affected with perceptual dyslexia have problems accommodating
specific wavelengths of light, and each person’s troublesome frequency is unique. Factors
such as bright light, fluorescent light, high-gloss paper, and black-and-white contrast can
aggravate the disorder. The victim’s scope of focus may be restricted so that he or she may
only see very small bits of a line of text instead of the entire line. The text that the person
sees might blur, swirl, move, pulsate, vibrate, or even disappear. The white page is too
bright; or it may flicker or flash; or colors may appear. SSS victims rarely report these
symptoms to others because they think that everyone experiences the same problems (Irlen,
1991). Those with perceptual dyslexia often avoid reading at all costs, and, as a result, they
may be affected physically, academically, and psychologically (Irlen, 1991).

From a physical standpoint, because of the text distortions suffered, reading becomes
extremely difficult, often physically painful. Without intervention, victims of Irlen
Syndrome exhibit symptoms such as sensitivity to light, headaches, nausea, eyestrain,
sleepiness while reading, attention deficit, and distortions of text (Irlen, 1991).

Academically, everything derives from reading, and victims of Irlen Syndrome
invariably find it difficult to read. They may skip words or reverse or change letter order—
seeing the word “saw” as “was,” for instance. They may have poor penmanship, a result of
difficulty with spatial orientation: they misjudge how much space to leave between a pair of
letters or words. Because they frequently can’t envision an entire word, they find it difficult
to spell or work with large numbers (Irlen, 1991).

Psychologically, the victim of perceptual dyslexia is prone to exhibit problems with
behavior, motivation, and self-esteem. Those with SSS frequently exhibit symptoms of
attention deficit disorder, acting out, and behavior problems (Irlen, 1991). They are often
poorly motivated to succeed. Almost invariably they tried early on, when they were young.
But with few successes and many “failures,” their attitude became “why bother?” Their self-
esteem is low because, while everyone around them is reading and learning, they cannot—no
matter what they do or how hard they work, they just can’t seem to “get it.”

Identification of SSS

Helen Irlen, a literacy instructor in California, first identified this perceptual dyslexia
in the early 1980s and labeled it “scotopic sensitivity syndrome.” Irlen had received a grant
from California State University, Long Beach, in 1980, to set up a literacy program for
adults. She chose to work with adults because adults can communicate better than children
and are more accurate “reporters” of what they experience; they are less intimidated by
authority than children and are less likely to be swayed without some evidence; and adults
are more motivated to succeed. They have reached a point in their lives where they
recognize the importance of learning in general and reading in particular.

After three years of in-depth research, Irlen discovered that many problems appeared
after readers had been actively reading for a relatively short period of time (usually about 10
minutes or more). Those who had trouble reported that distortions began to appear on the
page, and those distortions prevented them from comprehending the words. All of their
energy was going into perceiving the words, holding them on the page, or even just finding
them! As a result, many stopped reading. It was just too difficult for them. As Irlen
explained in her speech at the Dyslexia Higher Education Conference, October 31-November
2, 1994, at Plymouth University, England, once she began asking the more definitive
question, “WHAT do you see?” instead of “DO you see?” the answers made it apparent to
her that these poor readers were victims of a unique syndrome that was not being adequately
addressed by the professional educational community. (Dyslexia in higher education:
strategies and the value of asking).

Serendipitous Discovery

One day, one of Irlen’s students discovered that when she placed a red overlay—left
over from previous eye-dominance exercises—on the page she was reading, the sensation of
movement that she had always experienced stopped! For the first time, she could actually
read without having the words constantly sway back and forth! (Irlen, 1991). The red didn’t
work for everybody, however. It made no difference to the rest of the students.

So, Irlen tried other colors and found that the vast majority of those who tried the
colored overlays were helped. Each person who was helped responded to one specific color.
Once that particular color was determined and used, the individual was able to read better
and longer and reported that the distortions previously experienced disappeared immediately.
Irlen didn’t know at that time why the overlays worked, just that they did.

RESEARCHING THE CAUSE

With the advent of magnetic resonance imaging, we’ve been able to determine that
the brains of all dyslexics—including perceptual dyslexics—work differently than those of
non-dyslexics. (Lewine, et al., in press) Dyslexics use a different part of the brain than non-
dyslexics when they read, and they use a larger portion of their brain when they read or
perform visual tasks.

Receptor Field Theory

In the 1980s, visual physiologists developed the receptor field theory of color vision.
This theory hypothesizes that the cones of the eyes are organized into eight sets of
concentric, counterbalancing fields. Cones, of course, help us distinguish things clearly and
distinctly. Because they contain photopigments that are sensitive to red, green, and blue light
wavelengths, we are able to see color. (Irvine, 2001)

Each type of field is determined by the field’s color region arrangement and the
balance of the output of each field’s energy or signal. The output should be equal—that is,
neither positive nor negative—as it passes through the optic nerve to enter the brain’s visual
processing center. (Irvine, 2001)

If the receptor fields are summed to a unity as they enter the brain’s processing
center, and each single receptor field is equal to the others (so that none is governing or
dominant), there will be no perceptual distortion, and the image formed will be accurate. On
the other hand, if any of the receptor fields does not sum to a unity or is, in fact, dominant
under a set of spectral input conditions, the visual control system will change, and the image
formed will overlap, swirl, jump about—generally be distorted. (Irvine, 2001)
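A toy sketch of this balance condition (entirely illustrative; the numbers and functions are invented and do not come from Irvine or any physiological model):

```python
def field_output(center, surround):
    """Opponent center-surround field: balanced input yields zero output,
    i.e., a signal that is neither positive nor negative."""
    return center - surround

def percept_is_undistorted(outputs, tol=1e-6):
    """Per the theory, the image is accurate only when every one of the
    eight field types is balanced and no single field dominates."""
    return all(abs(o) < tol for o in outputs)

balanced = [field_output(0.5, 0.5) for _ in range(8)]
print(percept_is_undistorted(balanced))               # True: stable image
print(percept_is_undistorted([0.2] + balanced[1:]))   # False: distortion
```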

The Pathways to the Visual Cortex

Over the years since Irlen’s discovery, numerous studies of this visuo-perceptual
disorder have been conducted, and the general consensus is that scotopic sensitivity
syndrome affects the way the visual pathways carry messages from the eye to the brain.

There are two pathways to the visual cortex:

1. the magnocellular, which does fast processing of information for perceiving
position, motion, shape, and low contrast; and

2. the parvocellular, which carries out slower processes for perceiving still images,
color, detail, and high contrast.

It is theorized that when the receptor fields do not sum to unity, the pathways are
affected, causing the magnocellular impulses to be slowed, so only partial perception occurs.
This results in words that blur, fuse, or seem to jump off the page (Newman, 1998).

Individualized colored filters seem to restore the balance between the two processing systems,
preventing this overlapping (Robinson, 1994). The colored overlays and filters reduce or
eliminate the perceptual problem by screening out the wavelengths of light troublesome to
the individual (Sims, 1999). Studies of both the long- and short-term efficacy of the
transparencies and filters have shown that they do, indeed, provide benefits to the individual
afflicted with SSS (Whiting, Robinson, & Parrott, 1990; Robinson & Conway, 1990).

THREE STUDIES

Although there have been numerous studies into perceptual dyslexia since its
recognition in 1983, we will look at just three in this paper: those of Irvine, Lewine, and Wilkins.

Irvine’s Experiment for the Navy

The Navy wanted to see if the visual performance of those afflicted with perceptual
dyslexia changed as the energy spectrum presented to them changed. Therefore, in 1995,
James Irvine conducted an experiment at China Lake, California, that showed that for certain
perceptual dyslexics the receptor fields do NOT sum to unity, so the image sent to the brain
is not crisp and clear. When this happens, the subject’s visual control system alters radically,
so the subject does not see the image properly. (Irvine & Irvine, 1997)

Lewine’s Study

In the late 1990s, Dr. Jeffrey Lewine, a neuroscientist then at the University of Utah
Center for Advanced Medical Technologies, discovered that modifying the light frequency
spectrum presented to a perceptual dyslexic’s visual system could shift the brain toward a
more normal activation pattern. He also noted that he could cause five to six percent of the
“normal” population to develop dyslexic-type dysfunction when they were exposed to
“abnormal” light frequency environments. (Lewine, et al., in press) This means that some
ordinarily non-dyslexic personnel can suffer gross inefficiency and degraded performance,
or become unable to perform at all, under certain lighting conditions such as red battle
lighting, blue precision operating bays, or foggy or hazy conditions.

Wilkins’ Studies

Professor Arnold Wilkins, while a research scientist at the Medical Research Council
Applied Psychology Unit of Cambridge University in the United Kingdom, studied the
neuropsychology of vision, reading and color, photosensitive epilepsy, and attention,
conducting double-blind experiments to validate the existence and potential treatment of
perceptual dyslexia. He did this using four different groups of readers, mostly children,
randomizing the presentation order of the overlays, and further randomizing the use of the
appropriate overlays versus placebo overlays. (Wilkins, 2003)

Wilkins’ studies determined that, when given the choice, about half the readers would
choose clear overlays, and the other half would choose the colored overlays. Given that only


approximately 15 percent of the population is afflicted with perceptual dyslexia, we can
assume from Wilkins’ experiments that, in addition to these people being helped by the
colored overlays or filters, some of those not-so-afflicted can also benefit from color!

TESTING AND TREATMENT

Generally speaking, before we can treat perceptual dyslexics, we have to identify them.

Types of Screening

There are generally three types of screening that would be used, two of which are
based on the Irlen Method:

1. In the field or at the recruiting site, a simple, 10- to 15-question inquiry of the
subject, and trial-and-error determination of the appropriate colored overlay.

2. At the Recruit Training Center or a major command, an in-depth inquiry
consisting of questions concerning the subject’s symptoms and related history.

The third test, the Wilkins Rate of Reading Test, is also easy to administer,
consisting of four one-minute reading passes. The entire screening process should take no
more than about half an hour.
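
The Python sketch below shows one plausible way to score such a screening: words read
correctly per minute, averaged over the timed passes, with and without an overlay. The
counts and the aided-versus-unaided comparison are illustrative assumptions, not the
published scoring rules of the Wilkins test.

def rate_of_reading(words_attempted, errors, seconds=60):
    """Words read correctly per minute for one timed reading pass."""
    return (words_attempted - errors) * 60.0 / seconds

# Hypothetical screening record: four one-minute passes without an
# overlay, then four with the chosen color (all counts invented).
without_overlay = [rate_of_reading(w, e)
                   for w, e in [(88, 6), (92, 5), (90, 7), (86, 4)]]
with_overlay = [rate_of_reading(w, e)
                for w, e in [(104, 3), (108, 2), (101, 4), (99, 3)]]

baseline = sum(without_overlay) / len(without_overlay)
aided = sum(with_overlay) / len(with_overlay)
gain = 100.0 * (aided - baseline) / baseline
print(f"Gain with overlay: {gain:.1f}%")  # a sustained gain suggests benefit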

Resources Required

There are many ways to improve the situation for perceptual dyslexics without having
to spend a penny. Such simple and cost-free actions as dimming the lights in a room, using
natural instead of fluorescent lighting, allowing students to use colored paper and to wear
caps or visors indoors, and avoiding the use of white boards will all help. (Irlen, undated;
Wilkins, 2003)

But alleviating the problem requires intervention in the form of screening and,
ultimately, selection of appropriate colored overlays or filters.

The outlay required to implement such a program would be minimal. Only basic
instruction would be required at the recruiting sites: enough training for the recruiter to be
able to administer the simple Irlen Type-1 test or the Wilkins Rate of Reading Test and to
assist the applicant in choosing the appropriate overlay. At the Recruit Training Center, it is
anticipated that one or two Educational Specialists who have backgrounds in education and
have been trained in the Irlen Method will be required to administer the screening and
perform the diagnostic analysis.

Supplies of overlays or transparencies for recruiting sites and the Recruit Training
Center will also be necessary. Overlays from Irlen Institute cost approximately $1.25 each,
although less expensive transparencies are available from other commercial sources. (It must


be remembered, however, that Irlen overlays are specifically designed and developed for the
purpose of alleviating SSS.) Tinting of lenses (whether corrective or not) currently adds
about $50 to $100 per pair. Under contract, however, the price would likely drop to a
more nominal figure (Irvine, 1997).

MILITARY APPLICATION

And what will we get back for this investment? The individual service members will
benefit, of course, with improved reading speed and comprehension. Because they will
experience less visual fatigue, their attention spans will increase. As they realize that they
can do what they thought they couldn’t, their self-confidence will improve, as will their
attitude toward training and the job itself. Just knowing that a solution is available will
often be enough to change an attitude and strengthen a resolve to succeed.

The military services will also reap the rewards of this program: in addition to an
increase in the qualified pool of applicants for enlistment, the young people affected will
be able to train more efficiently. Remediation, basic, and ongoing training will all be more
effective and, as a result, more efficient. With more effective training, the service member
will be more knowledgeable and capable in the field. It can further be anticipated that there
will be fewer behavioral problems—both during and after training—primarily due to the
change in attitude that has been shown to occur following screening and diagnosis for SSS.

All in all, we believe that a higher-quality service member will be delivered to the
field or fleet, both academically and attitudinally.


References

Borlik, SSG A. (1999). Services grapple with recruiting challenges. American Forces
Information Service.
Irlen, H. (1991). Reading by the colors. Garden City Park, NY: Avery Publishing
Company.
Irlen, H. (1999). Irlen Syndrome/Scotopic Sensitivity: Most Frequently Asked Questions.
Retrieved September 20, 1999, from the World Wide Web:
http://Irlen.com/sss.faq.htm
Irlen, H. (undated) Tips and Research from Helen Irlen. ACNOnline: Association for
Comprehensive Neuropathy. Retrieved October 31, 2003, from the World Wide
Web: http://www.latitudes.org/articles/irlen_tips_research.htm
Irvine, J. H. (1997). Dyslexic effect on the Navy training environment and operational
efficiency: A prognosis for improvement. (Briefing.)
Irvine, J. H. (2001). The cause of Irlen syndrome. (Briefing.)
Irvine, J. H., & Irvine, E. W. (1997). Isotopic sensitivity syndrome in a single individual (a
case study). Naval Air Warfare Center, Weapons Division, China Lake, California,
April.
Lewine, J. D., Davis, J. T., Provencal, S., Edgar, J. C., & Orrison, W. W. (in press). A
magnetoencephalographic investigation of visual information processing in Irlen’s
scotopic sensitivity syndrome. Perception.
Newman, R. M. (1998). Technology for dyslexia: Federal education & disability law
compliance.
Robinson, G. L., & Conway, R. N. (1990). Irlen filters and reading strategies: The effect of
coloured filters on reading achievement, specific reading strategies and perception of
ability. Retrieved on August 19, 1999, from the World Wide Web:
http://www.edfac.usyd.au/centres/children/Greg.html
Robinson, G. L. (1994). Coloured lenses and reading: A review of research into reading
achievement, reading strategies and causal mechanisms. Australasian Journal of
Special Education, 18(1), 3-14, citing M. C. Williams, K. Lecluyse, & A. Rock-
Facheux (1992), Effective interventions for reading disability, Journal of the
American Optometric Association, 63(6), 411-417.
Sims, P. (1999). Awakening brilliance. Retrieved August 19, 1999, from the World Wide
Web: http://www.onlineschoolyard.com
Warkentin, M., & Morren, R. (1990). A perceptual learning difference. Notes on Literacy,
1990-1994 (vol. 64, Oct. 1990).
Whiting, P. R., Robinson, G. L., & Parrott, C. F. (1990). Irlen coloured filters for reading: a
six-year follow-up. Retrieved on August 19, 1999, from the World Wide Web:
http://www.edfac.usyd.edu.au/centres/childrens/SixYr.html
Wilkins, A. J. (2003). Reading through colour. West Sussex: John Wiley & Sons Ltd.


Neurofeedback Training for Two Dimensions of Attention:
Concentration and Alertness
Jonathan D. Cowan, Ph.D.
President and Chief Technical Officer
NeuroTek, LLC d/b/a Peak Achievement Training,
1103 Hollendale Way
Goshen, KY 40026
jon@peakachievement.com

Dr. Louis Csoka founded the United States Military Academy’s Center for Enhanced
Performance at West Point in 1989, when he was a Colonel and Professor in the Department of
Behavioral Science and Leadership. It has grown to be the largest performance enhancement
center in the U.S. because the Army has found it to be so valuable. Dr. Csoka recently stated “In
today’s military, learning the cognitive skills is not enough. One must also learn the optimal
sequence of concentration, alertness and relaxation for each activity. At no time has the demand
for improved attention been greater than in today’s Army with its high tech weapons and the
extensive use of advanced information technology. The demand for attention has grown
exponentially while our basic attention skills lag far behind. We still basically learn attention as
a by-product of the education and skills training we receive. To be sure, many enhancements
have been made in the Army’s training programs. But still today, none directly target the
attention network in the brain. Attention control training with the Peak Achievement Trainer
does exactly that.”

BRAINWAVE BIOFEEDBACK

The Peak Achievement Trainer uses a simpler and clearer method of brainwave
biofeedback—also known as neurofeedback or EEG biofeedback—to measure and enhance
concentration (single-pointed focus), alertness or arousal, and relaxation. Biofeedback trains
people to better regulate their brains and bodies by telling them, in an understandable form,
what is happening there. Compared with other neurofeedback, the Trainer’s more accurate
method is easier to understand, and training is faster to complete. Using the older procedures
that he developed, Dr. Joel Lubar at the University of Tennessee (Lubar, 1994) found that
neurofeedback was 80% effective in more than 1000 children with Attention Deficit Disorder.
In his case series, the average increase in grade level on standardized tests was 2.5 years. The
typical increase in IQ test scores was 8-19 points, and the average Grade Point Average
improved 1.5 levels (C to B+). Sixty percent of his clients decreased or eliminated medication.
There were major improvements in behavior, with decreased hyperactivity and violence (see
Nash, 2000). As a result of instituting a school neurofeedback program in Yonkers, New York,
the number of students suspended at Enrico Fermi School decreased from 53 in 1996-97 to 17 in
1997-98 and 22 in 1998-99 (Sabo, M.J., personal communication).


PEAK ACHIEVEMENT TRAINING PROCEDURES

The Peak Achievement Trainer detects the brainwaves from the Executive Attention
Network and converts them to a measure of single pointed focus and interest. It then converts
the measure to audio and video outputs. Students can use these outputs to learn how to control
their concentration. They can improve through practice with the Trainer.

The improved procedures used in the Peak Achievement Trainer were developed by
combining research performed by Dr. Barry Sterman on Air Force B-2 bomber pilots with
neuroimaging studies that pinpointed the parts of the brain most associated with the executive
control of attention. In NASA and Air Force research (Sterman, DTIC reports) that was
classified at the time (the late 1980s) and is still largely unpublished, Dr. Sterman found that as
pilots focused on a particular aviation task, the alpha brainwave decreased. There was an alpha
burst as focus ended, and then suppression as the next task began. We call this alpha period the
“microbreak”. The more difficult the task, the greater the alpha suppression. The better pilots
in a B-2 instructor selection process suppressed parietal alpha more completely and needed a
shorter microbreak before starting to focus again. From these data, Dr. Sterman developed
EEG measurements that proved a very powerful differentiator between the 6 “Top Guns” who
became instructors and the other 12 pilots. The Air Force used a regression line with hundreds
of variables to make the determination; none of those variables correlated above 0.4, while his
EEG measures correlated above 0.6. He selected the same pilots with just his metrics that the
Air Force did with all of theirs.
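
As an illustration of how one strong predictor stands out in such a screen, here is a small
Python sketch of correlation-based screening against a binary selection outcome. All data are
synthetic; only the 18-pilot, 6-instructor split is taken from the account above.

import numpy as np

rng = np.random.default_rng(0)
n_pilots, n_vars = 18, 12  # 18 pilots screened; 12 of the many variables
predictors = rng.normal(size=(n_pilots, n_vars))
became_instructor = np.array([1] * 6 + [0] * 12)  # the 6 "Top Guns"

# Make column 0 behave like an alpha-suppression metric that tracks the
# outcome strongly; the remaining columns are pure noise.
predictors[:, 0] = became_instructor + 0.5 * rng.normal(size=n_pilots)

# Screen each candidate predictor by its correlation with the outcome.
for j in range(n_vars):
    r = np.corrcoef(predictors[:, j], became_instructor)[0, 1]
    tag = "  <-- EEG-like measure" if j == 0 else ""
    print(f"var {j:2d}: |r| = {abs(r):.2f}{tag}")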

From Sterman’s studies, we developed the idea that the healthy individual cycles
between concentration and relaxation by focusing on (part of) a task until it is done, and then
taking a brief rest. Even the best trained brain cannot concentrate forever. It’s not designed to.
Consistent, intense concentration has its costs in stress, emotional tension, and disease. The Peak
Achievement Trainer teaches relaxation through breathing instructions on an audiotape and then
integrates it into training for the cycle.

More recently, we developed a second measurement that is quite different—a way to
determine the degree of Alertness or Arousal of the Central Nervous System. By Concentration,
we mean the degree of single-pointed focus on a perception, thought, or image, like a camera
zooming in. You can be relaxed, very alert, or in between and still have single-pointed focus.
Many people see a parallel between this intense focus and the popular term for the state of an
athlete at his peak—“the Zone.” In fact, The New York Times Magazine did an article about
the Peak Achievement Trainer in a special issue called Tech 2010: A Catalog of the Near
Future, focusing on technology that will change all of our lives in this decade. They called the
article “The Coach Who Will Put You in the Zone.”

Increasing Alertness/Arousal creates more intense stimulation or excitement. It enhances
emotion. High Alertness is also associated with summoning resources to respond to a
challenging situation; many studies relate it to stimulation of the Reticular Activating System.
It has a high cost in energy, and there is a quick “burnout” if the energy is not


conserved. We can determine both Concentration and Alertness from the same brainwave at the
same location. We call this the ConAlert protocol. The Trainer’s Concentration measure
decreases as you focus more, while the Alertness measure increases with greater arousal and
effort.
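
The Python sketch below shows how two such indices might be derived from a single EEG
channel using standard spectral estimation. The band limits, the beta/alpha ratio, and the
synthetic signals are assumptions for illustration only; this is emphatically not the ConAlert
algorithm itself.

import numpy as np
from scipy.signal import welch

FS = 256  # Hz; assumed sampling rate

def band_power(eeg, lo, hi, fs=FS):
    """Mean spectral power of one EEG channel within a frequency band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].mean()

def attention_indices(eeg):
    """Toy Concentration and Alertness indices from one channel.

    Following the alpha-suppression findings above, alpha power serves
    as the Concentration trace (lower = deeper focus), and a beta/alpha
    ratio stands in for Alertness (higher = more aroused).
    """
    alpha = band_power(eeg, 8, 12)
    beta = band_power(eeg, 13, 30)
    return alpha, beta / alpha

# Two seconds of synthetic EEG: alpha-rich (resting) vs. alpha-suppressed.
t = np.arange(0, 2, 1 / FS)
resting = np.sin(2 * np.pi * 10 * t) + 0.2 * np.random.randn(t.size)
focused = (0.3 * np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)
           + 0.2 * np.random.randn(t.size))
print(attention_indices(resting))  # high alpha, low ratio
print(attention_indices(focused))  # suppressed alpha, higher ratio
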
One of our trainers is also a specialist in teaching a particular reading technique, the
Scholar’s Edge. In Figure 1, he is applying this to a book that he has never read. First, he
examines the Table of Contents in a particular way. From about 5 to 15 seconds, numbered at
the bottom, this tracing shows the second part of his effort, in which he reviews the schemata he
has constructed. He is shifting his attention from a narrow focus on particular entries in the
Table of Contents to a broader view of the synopsis of the book, back and forth. The
Concentration line reflects this, but he is not very intensely alert. From 15 to 18 seconds, he
takes a “microbreak”, resting and giving his brain time to recharge. He then reviews two points
of interest from 18 to 24 seconds. Both of them are absorbing, as can be seen by the dips in the
concentration line. The Alertness line shows that one is perhaps slightly arousing. Then, from
24 to 27 seconds, he asks himself to set an intent for this reading session. This requires both
high Alertness and Concentration. This is even more evident for the next five seconds, when he
sets his strategy for reading the book.

There are many activities, military and otherwise, that can be analyzed in this fashion.
With study and statistics, expert performance patterns can be differentiated, traced out, and
described in detail by reviewing the record. Audio and video can be recorded; our software will
soon provide for synchronized recording and playback. Additional measures such as the
direction of eye gaze can be added to the analysis. These very sophisticated measures may be
particularly helpful in designing new user interfaces and making training more interesting. Once
the problem areas can be delineated, appropriate corrective steps can be taken.

PEAK ACHIEVEMENT TRAINING METHODS

Our DARPA Project Officer suggested the idea that the Peak Achievement Trainer
develops the “Cognitive Athlete”. We believe there are at least eight different ways a
cognitive athlete can be trained:
• Strengthening the ability of the Executive Attention Network to focus attention.
• Strengthening the ability of the midbrain to intensify alertness/arousal.
• Focusing attention on parts of the body that the coach wishes to work with.
• Training the user to take brief, relaxing microbreaks which recharge the brain.
• Finding the best possible degree of alertness/arousal for performing particular activities
optimally.
• Performing arbitrary sequences of concentration, alertness, and microbreaks.
• Discovering and enhancing performance of the sequences that are optimal for particular
activities.
• Performing these sequences despite distractions such as self-talk and crowd noise.


THE HILLENBRAND PEAK PERFORMANCE PROGRAM

The Peak Achievement Trainer is an integral part of the Attention Control Training
program at the Center for Enhanced Performance. Dr. Csoka purchased his first Peak
Achievement Trainer at the recommendation of Dr. Nate Zinsser, the psychologist who currently
runs the Attention Control Training program. Dr. Csoka has recently created a very similar Peak
Performance Center at the executive offices of Hillenbrand Industries, a Fortune 1000 health care
company. He has created enough value there that all 22 of their top executives have completed
the 20-session training program, doing about one session a week. Response to the program was
overwhelmingly positive. These executives have returned voluntarily for many additional
sessions, primarily using the Peak Achievement Trainer’s ConAlert protocol, and the program is
being expanded to 39 more executives. They have ordered Trainers for their top executives. Dr.
Csoka states “Training attention control begins the entire process of developing peak
performance, in the military and elsewhere. The Peak Achievement Trainer provides an
excellent tool for developing enhanced skills in focus and concentration. We have been working
with business executives at a Fortune 1000 company on their attention skills as part of a broader
peak performance training program that is an integral component of their leadership development
program. By providing business situations and scenarios as input for the Peak Achievement
Trainer, we have been able to demonstrate measurable improvements in their ability to attend in
crucial meetings, engage in critical performance appraisals with employees, and deliver
exceptional presentations. The Peak Achievement Trainer provides the critical feedback during
attention control training needed to develop this skill to a level equal to what elite athletes have
been able to achieve.”

The change in the ability to concentrate of the group of Hillenbrand executives after five
sessions of Peak Achievement Training is shown in Figure 2. The two cones on the left show
how long they could hold the concentration line below 30 during the pre-test, while the last two
cones are the post-test data. They were given four trials during each test. The first trial is shown
on the left, and the best trial is shown on the right. During the pre-test, the first trial averaged 19
seconds, ranging from 10 to 40. The first trial in the post-test was more than twice as long,
averaging 44 seconds, with the range running from 25 to 65 seconds. The best-trial average
almost doubled, from 65 seconds at pre-test to 128 seconds at post-test. The ranges were 18 to
180 and 48 to 220, respectively.

THE MISSING COMPONENT OF MILITARY TRAINING

Dr. Csoka and I believe that there is a missing component of military training. Learning
the optimal sequences of concentration, alertness, and microbreaks is as integral to skill
development as having the correct cognitive information at the right time. A high quality
training experience should produce sequences of focus and alertness that are very similar to those
in the real battle.

Dr. Csoka further states “The applications for direct attention training in the Army are
almost endless. Both the individual soldier and the crew weapons systems involve highly


sophisticated equipment and technology requiring much higher levels of attention and
concentration. Take for example the tank crew. Target identification, acquisition and locking
while on the move and under fire require very fine-tuned concentration on the part of the crew.
West Point’s Center for Enhanced Performance staff conducted a study at the National Training
Center with Army M1 tank crews undergoing tank gunnery. The training group was given
performance enhancement training involving relaxation, imagery and attention training. They
significantly outperformed a control group on all the tank gunnery measures. This was without
the availability of the Peak Achievement Trainer. Had this attention technology been available,
the results would have been even more compelling and the training time could have been
reduced.”

POTENTIAL MILITARY USES

Our initial DARPA project produced several suggestions for military use of this new
technology. We proposed an advanced system designed to include a mental exerciser—a
separate program for taking the brain to the gym and enhancing the capacity for focusing and
maintaining alertness. The military spends billions of dollars each year to train physical fitness,
but very little to directly train mental fitness. Many recruits have problems paying attention.
Considering the importance of mental fitness for many of the decision making tasks in today’s
high-tech military, it will be very useful to provide tools that can be used like physical
conditioning equipment, to create a state of mental toughness via repetition of demanding mental
tasks. Training the “cognitive athlete” to enhance their capacity for focus and alertness also
increases the envelope within which task demands can be structured and reasonably met, thus
permitting increased efficiency of the use of human resources within the Armed Forces. We
believe there are specific ways to use this technology to enhance memory and decrease fatigue.
It can also be developed into a metric for monitoring workloads during a particular task, and it
can help the military design better Computer-Based Training by measuring the interest of the
student from moment to moment.

The original DARPA project description—the Student State Project—focused on developing a
way to monitor the state of a student being given Computer-Based Training, so that
the computer could improve the tutoring it administered to the student. A variety of
tutoring systems for attention control training can be created. One suggested approach
would include a semi-transparent overlay which can be placed in the lower right corner of the
screen. This display can show both instantaneous and time-averaged Concentration and
Alertness, and/or a plot of the last minute. Audio feedback employing particular sounds can
steer the user back to the optimal state. Alternatively, visual warning signals can be provided as
flashing text or interruptions. Later, a dynamically generated review of the sequence and events
could be presented, along with coaching generated by the computer or a coach.

There is an enormous potential for developing Concentration and Alertness databases that
can assist in training the optimal sequences for each task. Measurements of experts at the
particular task could be used to provide a library of optimal sequences for comparison.
Collecting information on those who perform similar tasks would produce a group of norms that


will allow the system to flag unusually poor performances and forward the information for
appropriate remedial action. Comparison of present performances with the data from past ones
by the same individual would provide useful information for teaching and evaluation. Our patent
#5,983,129 covers analyzing these brainwave signals in association with an event and also using
them to modify the subsequent presentation by a computer or simulator. The possibility of
integrating this approach with measurements of eye fixation could produce even more refined
measurements and descriptions of the sequences.

Due to a change in priorities associated with the development of the DARWARS project
and a funding crunch at DARPA, we are currently looking for additional funding to continue
research and development. Our collaborators have developed a new system small enough to fit
completely on a golf visor. It will soon be wirelessly connected to the PC. We have planned a
very sophisticated series of further validation studies for our ConAlert measures and a new, more
powerful measure still under development.

REFERENCES

Lubar, J.F. (1994). Neurofeedback for the management of attention deficit-hyperactivity
disorders. In M.S. Schwartz & Associates (Eds.), Biofeedback (2nd ed.). New York: Guilford
Press, pp. 493-525.

Nash, J.K. (2000). Treatment of Attention Deficit Hyperactivity Disorder with Neurotherapy.
Clinical Electroencephalography, 31(1), 30-37.

Sterman, M. B., et al. Defense Technical Information Center Reports ADA151901,
ADA171093, ADA186351, ADP006101, ADA142919.


Figure 1: Changes in Concentration and Alertness during “Scholar’s Edge” reading program.



Figure 2: Length of consistent concentration before and after the Hillenbrand Peak Performance
program for the first and best trial.


WHAT TODAY’S SOLDIERS TELL US ABOUT TRAINING FOR THE FUTURE
Brooke B. Schaab and J. Douglas Dressel
U.S. Army Research Institute for the Behavioral and Social Sciences
5001 Eisenhower Ave.
Alexandria, VA 22333
Dressel@ari.army.mil

How do we train mid- and junior-level Soldiers in the information technology (IT) skills
needed for operational units? How can we maximize the acquisition, transfer, adaptability, and
retention of these skills necessary for transformation to the future force? To help answer such
questions, scientists from the U.S. Army Research Institute (ARI) administered questionnaires
and conducted interviews with operators of Army Battlefield Command Systems (ABCS). Sixty-
two enlisted Soldiers from three Army posts participated.

OBJECTIVES

Soldiers who are currently using Army digital systems gave their perspective on current
and future training needs for the “digital Soldier.” Findings provided insight into the type of
training the Soldiers view as most productive in developing the expertise needed to take full
advantage of new technologies. This paper summarizes Soldier perspectives about:

• How current training practices prepare Soldiers to perform successfully in units
supported by digitization;
• Learning preferences for new technologies, noting opportunities presently available to
capitalize on training;
• How digital systems change the jobs or the tasks that Soldiers perform.

METHOD

Researchers met with groups of four to eight Soldiers to gather information on current
digital training practices on multiple systems. First, Soldiers were administered a questionnaire
requesting information on the Soldiers’ training background, training environment, training
preferences, computer experience, and digital team performance.

Next, each Soldier was given the first of a series of four questions concerning digital
training. The Soldiers were given 10 minutes to write their responses to these questions. Soldiers
then passed their question sheets counter-clockwise to the next Soldier who would answer this
new question. A Soldier could expand upon the previous answer or give a different response;
instructions were to write whatever seemed appropriate. This rotation of questions and additional
responses was continued until each Soldier answered each of the four questions.


Following this, a similar rotation approach was taken where each Soldier selected and
ranked what he or she saw as the best two responses to each question. This resulted in each of the
four questions having four sets of responses and four sets of rankings.

Finally, researchers conducted an interview with each group of Soldiers, which generally
took 45-60 minutes. The interviews were audio-recorded for later review. Soldiers were asked to
speak freely, and to give their full and complete impressions of digital training practices that
would be valuable to the Army. Comments were not for individual attribution. Tape recording of
the session did not seem to constrain or inhibit the Soldiers.

FINDINGS/IMPLICATIONS

“Our biggest problem is that we need more training.”

Current training on ABCS systems focuses on how to operate the system and is context-
free. Soldiers complete this training at a novice level with knowledge of facts, features, and rules
that they can verbalize and apply by rote. They have difficulty applying their knowledge in new
contexts. To move beyond this novice level, Soldiers want and need practical experience in
multiple situations to form a more sophisticated understanding of system uses. For example, they
need to learn to prioritize and organize information to achieve a variety of goals. Lengthening the
time Soldiers spend on their initial IT training is not the answer. When it comes to the most
effective training beyond basic “knobology,” field exercises are preferred by far, followed by
exploring the system on their own.

The most pervasive and consistent finding was that junior-level enlisted Soldiers needed
and wanted additional training to become proficient at their jobs.

Army digital systems are a never-ending work in progress: build one, try it out, make
modifications, and build it again. Soldiers must continuously update their knowledge and adapt
to new, changed, or absent functionality. More important, they must understand how these
changes influence their ability to do their Army job. This type of training goes beyond the
content in New Equipment Training (NET), which focuses on how to operate the system.

“A lot of hands on, that’s important for today’s up and coming Army.”

Soldiers said what kind of training they found successful. “Give us hands-on training,
using a full job flow sequence.” Here, they received the inputs they would normally receive in an
actual mission and produced and transmitted the required outputs. Soldiers said that field
experiences and interacting with their peers were the “best” ways to learn the systems. Field
exercises should include setting up and connecting the digital systems as well as operating the
systems in various situations. Soldiers complained that without connectivity, an all too common
occurrence, training does not happen. In short, Soldiers identified two problems: one
associated with training to use the system, and a second associated with training to set up and


troubleshoot interconnectivity problems. Some digital system training addressed both problems,
while other training addressed only system use problems.

Learning Preferences and Computer Experience

Soldier responses show they want to learn Army digital systems the same way they
acquire much of their non-military digital expertise: by exploring the software and equipment
to solve real problems (see Figure 1).

[Bar chart showing the percentage of Soldiers preferring each method of learning new
software: read the manual, watch someone use it, take a course, explore the program, have
someone help me.]

Figure 1. Preferred method of learning new software

Soldiers were queried about their familiarity with using technology aids to supplement or
support training. Most Soldiers reported a great deal of experience with the Internet and instant
messaging, but they had limited experience with distance learning (DL), web-based gaming, or
hardware installation (see Figure 2). This suggests that training delivered via distance learning or
web-based gaming might require added support to implement, at least until Soldiers become
familiar with these techniques.

Opportunities for Training

Soldiers indicated that they do have the training time and resources available to take
advantage of DL opportunities.

• Ninety-two percent (92%) had their ABCS digital systems available in their unit for
training.
• Fifty-eight percent (58%) had time to train during their work hours if training
resources were available (e.g., CD-ROM, manuals, on-line help, practice
vignettes/scenarios).


• Seventy-four percent (74%) would train on their own time if computer systems and
training resources were available.

[Bar chart showing the percentage of Soldiers reporting “a lot,” “some,” “a little,” or no
experience with the Internet, instant messaging, networks, web-based gaming, installing
hardware, and distance learning.]

Figure 2. Computer experience reported by Soldiers

“Technology Changes the Way We Fight”

Troops at all levels are beginning to understand how information technology changes
their duties and responsibilities.

• “The system is a good thing because you can give and receive messages instead of
walking far across the firing point or using a radio because the enemy may intercept
the channels.”
• “With the FBCB2, I can send messages to the commander when I am lost or in a
dangerous situation. I can tell what’s going on where I am, or set up an attack, or plan
where to go next.”
• “It is my belief that field training is the best training that an analyst can benefit from.
It is valuable because it gave me an understanding of what the other ABCS
components provided me within the Army.”

One commander enthusiastically recounted a recent field exercise where Soldiers left
from dispersed points to converge at a common location at the designated time. Soldiers did not
talk with each other, but used their digital system to track themselves and their allies. “This
would have been impossible without our digital systems,” reported the commander.


Soldiers rated the following advantages of their digital systems as applying to a moderate
or great extent:

• Digital systems make it much safer for troop movement in enemy territory.
• Once we understood the limitations and capabilities of the digital systems, we were
able to use them in new and better ways.
• Planning and preparation is much faster when we can collaborate using our digital
systems.

Summary

Soldiers expressed a desire and need for more training using digital technology. They
wanted training to be hands-on and scenario-based. In short, there were three major findings
from the research:

• Soldiers want more training to integrate their knowledge of their digital system with
their Army job.
• Soldiers see opportunities available now for additional training at their home station.
Although unfamiliar with distributed learning methods, they express a willingness to
use technology to advance their training.
• Soldiers recognize the value of technology to augment their military capabilities.


OBJECTIVE-BASED TRAINING: METHODS, TOOLS, AND TECHNOLOGIES
Dr. John J. Burns, Janice Giebenrath, and Paul Hession
Sonalysts, Inc.
12501 Research Parkway
Orlando, FL 32826
burns@sonalysts.com

INTRODUCTION

The United States Surface Navy has adopted a “train by objectives” approach as an essential
element in achieving and maintaining readiness. This approach is backed by years of R&D in
individual and team training. However, for this approach to be fully embraced at the deck-plate
level, methods and tools that are flexible and tailorable are needed. In addition, the software
and hardware associated with these methods and tools must be accessible and affordable.

The Coalition Readiness Management System (CReaMS) effort seeks to provide methods, tools,
and technology to support objective-based training for warfighters within a coalition
environment. In particular, the CReaMS effort has leveraged the Navy’s investment in the Battle
Force Tactical Training system and associated efforts. In this paper we first present an overview
of objective-based training. Then we turn to a description of the first two phases of the CReaMS
effort with an emphasis on our effort in Phase II. We conclude with a short discussion of
ongoing CReaMS Phase III efforts.

OBJECTIVE-BASED TRAINING

The US Navy has adopted an objective-based approach to training as a result of analysis of the
effectiveness of existing training approaches. This move was predicated on the fact that systems
such as the Battle Force Tactical Training (BFTT) system provided extremely capable shipboard
training. BFTT can in fact provide significant training opportunities for Sailors onboard
ship, using their own tactical systems – training the way they will fight. However, it was
recognized that in addition to the hardware and software manifest in systems such as BFTT,
there was also a requirement for methods and tools to support Sailors in making effective use
of these embedded training systems (Acton and Stevens, 2001).

What was needed was a way to exploit this capability and provide form and structure to the
Navy’s afloat training processes. The approach chosen would ultimately fulfill the Center for
Naval Analysis’ recommendation that “. . . an appropriate course is for the Navy training
establishment to focus on ensuring the minimum acceptable training requirements are both
defined and executed.” The challenge was to provide a process that would:

• Assist shipboard training teams;
• Help quantify training readiness;
• Attain a synergy with existing training systems;
• Be sufficiently robust to support both existing and emerging training systems.


This new training process viewed the ship as a system capable of a finite number of tasks.
Each task was used to generate a list of subordinate actions or steps required to ensure the
success of the higher-level task. These higher-level tasks could have a team focus, or they
could be an aggregation of individual watch stander tasks, or a combination of both. In all
cases, team and individual tasks were defined by measurable standards of performance. These
“standards” were criterion-based measures tied to absolute standards, using specific, objective
behavioral items. This follows the generally accepted premise that “The best information for
evaluation of training can be acquired by using objective, criterion-referenced measures.”

The Joint Mission Essential Task List (JMETL), Navy Mission Essential Task List (NMETL),
and the Required Operational Capability/Projected Operational Environment (ROC/POE)
documents were integral to this effort. The tasks, sub-tasks, and measures generated had to be
realistic, and they had to link to, and support, higher-level requirements.

Training Objective Construction

Training objectives were constructed using the same basic model shore-based schoolhouses and
academia have used for years: a hierarchy of terminal and enabling objectives supported by
measures of performance and effectiveness. As mentioned earlier, each measure of
effectiveness or performance (MOE/MOP) contained a standard – a data point observable and
quantifiable in terms of time, distance, and/or quality. This approach was critical to ensure we
removed or lessened trainer subjectivity.

The final structure included the terminal objective (the “what” to be accomplished), the
enabling objective (the “how” it would be accomplished), and the measure of performance or
standard (the “how well” it would be accomplished). This terminology is derived from a
task-based training paradigm. In a Personnel Performance Profile (PPP) based system, the
terminal objective would be a level-one objective and the enabling objective a level-two
objective.
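
One way to picture this hierarchy is as a nested data structure. The Python sketch below
encodes the terminal/enabling/measure levels; the example task and its standard are invented
and are not drawn from JMETL, NMETL, or ROC/POE documents.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Measure:
    """The "how well": a criterion-referenced standard of performance."""
    description: str
    standard: str  # observable in terms of time, distance, and/or quality

@dataclass
class EnablingObjective:
    """The "how": a subordinate action supporting the terminal objective."""
    description: str
    measures: List[Measure] = field(default_factory=list)

@dataclass
class TerminalObjective:
    """The "what": the task to be accomplished."""
    description: str
    enablers: List[EnablingObjective] = field(default_factory=list)

# Illustrative instance only; the task and standard are invented.
objective = TerminalObjective(
    "Detect and track a surface contact",
    [EnablingObjective(
        "Report the contact to the tactical action officer",
        [Measure("Time from detection to report", "within 60 seconds")])])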

OBJECTIVE-BASED TRAINING IN A COALITION ENVIRONMENT: CREAMS

The Coalition Readiness Management System (CReaMS) is a congressionally mandated effort
that evolved to address operational requirements with increased emphasis on combined or
coalition operations. The US Congress funded the CReaMS project to encourage international
collaboration in exploring areas of cooperation in training and readiness (Clark, Ryan, O’Neal,
Brewer, Beasley, & Zalcman, 2001). CReaMS builds upon several past Office of Naval
Research (ONR) sponsored research and development efforts in the areas of team performance
and decision-making, performance measurement, objective-based team training and various data
collection tools (Cannon-Bowers & Salas, 1998; Brewer, Baldwin-King, Beasley, & O’Neal,
2001). While the CReaMS effort involves a plethora of government and industry participants
whose expertise ranges from network topologies, to military training, to weapons and combat
systems, early on a Learning Methodology (LM) team was stood up and tasked with infusing the
effort with theory and methods that would lead to enhanced warfighter performance. This paper
focuses on the work of the LM team.


CReaMS Phase I

CReaMS Phase I was a collaborative effort between the United States Navy (USN) and the
Royal Australian Navy (RAN) to adapt, implement, and evaluate the BFTT learning model in
a coalition training environment. The BFTT learning model is presented below in Figure 1.

Figure 1: BFTT conceptual learning model.

The Phase I effort uncovered challenges for coalition training and measurement. The Afloat
Training, Exercise, and Management System (ATEAMS) was used to apply and manage the
team learning methodology during these exercises. ATEAMS contains an extensive database of
training objectives for each ship in the Afloat Training Group. Based on the chosen training
conditions, the user selects specific training objectives to be associated with various events
throughout the training session. This carries with it the measures of performance associated with
each objective (Hession, Burns, & Boulrice, 2000).

While this training management system has been successful, it presented challenges and
limitations when applied to the CReaMS effort (for example, management of such a large
database proved cumbersome). In addition, the focus of ATEAMS is on intra-ship training,
while the CReaMS effort involves inter-ship training. The results of the Phase I effort showed
the ATEAMS tool was not the best choice for CReaMS performance measurement. Lessons
learned were applied to the Phase II effort.

CReaMS Phase II

Using lessons learned from Phase I, CReaMS Phase II involved the development of a new
approach to coalition performance measurement, from developing training objectives to data


collection and analysis. Also included was the development of novel approaches for organizing
and representing data.

Following the objective-based training model, training objectives to be measured during the
exercises were identified and developed. In keeping with objective and mission-based
development, the focus remained on objective-based training oriented up to the ship level for the
USN and utilized NMETL-derived measures. Building upon this effort, the RAN objectives
were then aligned with the USN approach for performance measurement. Training objectives
were developed for both procedural and process measurement. Procedural measurement
involves the assessment of individual and team completion of training objectives usually within a
prescribed sequence. Process measurement examines team-level skills necessary for effective
performance (e.g., effective communication).

The development of this new approach for collecting performance data also involved front-end
data organization and analysis software, hand-held software for data collection, and methods for
transferring data between each tool. The front-end data management tool, stored on a PC,
housed the entire population of event scenarios, training objectives, and associated performance
measures. Here, a data collection plan is built by linking desired training objectives with event
scenarios. This plan then transfers to the hand-held device (HHD) for implementation. Once
data has been collected, it is transferred back into the data management software for analysis.
Output includes summary data regarding performance standards met for the training objectives
(see Giebenrath, Burns, Hockensmith, Hession, Brewer, and McDonald, 2003, for a detailed
description of the methods and tools developed and implemented during the CReaMS Phase II
effort).
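
A minimal Python sketch of that round trip, with invented event and objective names: link
objectives to scenario events on the PC, collect met/not-met judgments on the hand-held
device, and roll them up for the debrief.

# Data collection plan built on the PC: link training objectives to
# scenario events (all identifiers invented for illustration).
plan = {
    "event_air_raid": ["obj_detect_air_contact", "obj_issue_warning"],
    "event_sub_search": ["obj_localize_subsurface_contact"],
}

# Judgments entered on the hand-held device during the exercise:
# objective id -> True if the performance standard was met.
observations = {
    "obj_detect_air_contact": True,
    "obj_issue_warning": False,
    "obj_localize_subsurface_contact": True,
}

# Back on the PC, roll the data up into per-event summaries for debrief.
for event, objectives in plan.items():
    met = sum(observations.get(obj, False) for obj in objectives)
    print(f"{event}: {met}/{len(objectives)} standards met")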

Putting it all Together: CReaMS Phase III

In Phase II of the CReaMS effort, a novel approach to the development and implementation of
objective-based performance measurement was successfully employed in a coalition training
environment. The use of task sequences provided an effective method for organizing, storing,
and transferring procedurally oriented performance data. Although the actual data collection
tools differed across sites (laptop computers with Excel-based tools were used in Australia,
while the HHD was used with the USN participants), there was consensus among all CReaMS
participants that the methodology worked well, both in terms of being “user-friendly” for data
collection and in terms of providing performance data at the right level of detail for exercise
participants.

Building on lessons learned and the successes of Phases I and II, in CReaMS Phase III, a specific
goal was set to conduct a distributed, participative, and facilitated debrief. Once again the
primary participants in the “Virtual Coalition Readiness” (VCR) exercise were the USN
(including the USS HOWARD (DDG-83), and Tactical Training Group Pacific) and the RAN
(including RAN Ship Trainers at HMAS WATSON with watch standers from the ADELAIDE
and the ARUNTA). The VCR exercise was designed to incorporate all of the elements of
CReaMS Phase I and Phase II events with an additional emphasis on the integration of
procedural and process measurement and feedback. While a lengthy description of the CReaMS
Phase III effort is beyond the scope of the current paper, a discussion of key constructs and how


they were implemented will provide the reader with a big-picture understanding of this latest
phase.

Integrating Process and Procedural Measurement

In CReaMS Phase II the LM team brought together procedural and process measurement through
the integration of the previously discussed task sequences and the Team Dimensional Training
(TDT) methodology (Smith-Jentsch, Zeisig, Acton, and McPherson, 1998). This effort
addressed questions such as, “What to measure?”, “What level of detail to measure?”, and “What
resources are required?” It was determined that for coalition training, the focus of performance
measurement needed to be on the process and procedure within and between warfare areas (intra-
ship level), between ships and between warfare areas (inter-ship level), and within the task unit
level and between the task unit commander level and the ships in company.

The TDT process measurement methodology specifies four super-ordinate dimensions against
which team performance is to be evaluated: Information Exchange; Communication; Supporting
Behavior; and Initiative/Leadership. Each of these dimensions is further delineated by a set of
sub-items (e.g., phraseology, brevity, clarity, and completeness for Communication) that provide
behavioral anchors for raters to use in assessing performance. To this, the CReaMS effort added
a fifth dimension—Critical Thinking—defined by the sub-items: maintaining awareness of tasks
in progress; effective management of integrated tasks; appropriate allocation of tasks among
team members; recognizing unexpected/emergency situations; and appropriately implementing
pre-planned responses. In CReaMS Phase III, the LM team worked with the VCR scenario
development team to create events within the scenario that would stress both specific team tasks
(to be measured by the procedural measurement scheme) and team process (as measured by
TDT).
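
For concreteness, the dimension and sub-item structure that raters work from can be
represented as a simple mapping, as in the Python sketch below. Only the sub-items actually
named in this paper are filled in; the 1-to-5 scale assumed in the helper is an illustration, not a
documented TDT convention.

# The four TDT dimensions plus the CReaMS "Critical Thinking" addition.
TDT_DIMENSIONS = {
    "Information Exchange": [],
    "Communication": ["phraseology", "brevity", "clarity", "completeness"],
    "Supporting Behavior": [],
    "Initiative/Leadership": [],
    "Critical Thinking": [
        "maintaining awareness of tasks in progress",
        "effective management of integrated tasks",
        "appropriate allocation of tasks among team members",
        "recognizing unexpected/emergency situations",
        "appropriately implementing pre-planned responses",
    ],
}

def dimension_scores(ratings):
    """Average a rater's sub-item scores (assumed 1-5) per dimension."""
    return {dim: sum(vals) / len(vals)
            for dim, vals in ratings.items() if vals}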

Implementing a Participative and Facilitated Debrief

Previous research has demonstrated that debriefing skills and engaging all members of a team are
factors critical to realizing the benefits of objective-based training (Tannenbaum, Smith-Jentsch,
and Behson, 1998; Smith-Jentsch, Zeisig, Acton, and McPherson, 1998). Thus, in CReaMS
Phase III, Video Tele Conferencing (VTC) hook-ups were established at all participant sites and
the LM team developed a web-based debriefing tool that integrated performance data (process
and procedural) collected from all participant sites. Each day across the 4-day exercise, data
collectors at each site collected targeted (by event) performance data, met with their local data
collection teams to integrate assessment data, and then provided inputs to LM team members at
HMAS WATSON. Using the Distributed Debrief Product tool, LM team members at HMAS
WATSON integrated procedural and process data across multiple levels of performance and
within an hour of the end of the exercise, a comprehensive debrief product was available for
review by local facilitators. After their review, the facilitators then engaged the VCR
participants in a Task Unit debrief with the Distributed Debrief Product providing structure for
facilitators to provide context, specific performance feedback, replay (from BFTT Debrief
Product sets), and, most importantly, input from exercise participants. Figure 2 provides a
representative sample screen from the debrief tool—a typical debrief included hundreds of such
screens, allowing for multiple levels of debrief across multiple warfare areas.


Figure 2: Sample Distributed Debrief Product screen shot.

CONCLUSION

As events in the new millennium underscore, global forces are coming together that demand
the formation of an effective nation-to-nation coalition training capability. The CReaMS project
provides a case study in how coalition partners can work together, leveraging the technologies,
methods, and tools that individual partners bring to the endeavor in order to create a training
environment in which the whole is greater than the sum of the parts.

In particular, the CReaMS project has seen the successful adaptation of the BFTT system within
a virtual training environment. In addition, multiple Navies worked together to develop methods
for data collection, data reduction, and debrief that enabled successful coalition training in a
virtual environment. While quantitative analyses that are now in the planning stage will allow
for an empirical evaluation of the value of the CReaMS effort, the benefit to the warfighter is
best evidenced in their own words. As the Sea Combat Commander, CAPT Bill Hoker, USN,
COMDESRON SEVEN, debriefed his multi-national warfare commanders in a facilitated
after-action review in one of the eight intense Strike Force synthetic exercises, he stated, “I am
amazed at the level of intensity, interoperability and training value exhibited between the US
Navy and the Royal Australian Navy during this coalition event.”


REFERENCES

Acton, B., & Stevens, B.J. (2001). Objective-Based Training and the Battle Force Tactical
Training System: Focusing our Fleet Training Processes. 23rd Interservice/Industry, Training,
Simulation, and Education Conference Proceedings.

Brewer, J., Baldwin-King, V., Beasley, D., & O’Neal, M. (2001). Team Learning Model: A
Critical Enabler for Development of Effective and Efficient Learning Environments. 23rd
Interservice/Industry, Training, Simulation, and Education Conference Proceedings.

Cannon-Bowers, J.A., & Salas, E. (1998). Making Decisions Under Stress: Implications for
Individual and Team Training, Washington D.C.: American Psychological Association.

Clark, P., Ryan, P., O’Neal, M., Brewer, J., Beasley, D., & Zalcman, L. (2001). Building
Towards Coalition Warfighter Training. 23rd Interservice/Industry, Training, Simulation, and
Education Conference Proceedings.

Giebenrath J., Burns, J., Hockensmith, T., Hession P., Brewer J., & McDonald, D. (2003).
Extending the Team Learning Methodology to Coalition Training. Paper accepted for
presentation at the 25th Annual Interservice/Industry, Training, Simulation and Education
Conference.

Smith-Jentsch, K.A., Zeisig, R.L., Acton, B., & McPherson, J.A. (1998). Team dimensional
training: A strategy for guided team self-correction. In J.A. Cannon-Bowers & E. Salas
(Eds.), Making decisions under stress: Implications for individual and team training (pp. 271-
297). Washington D.C.: American Psychological Association.

Tannenbaum, S. I., Smith-Jentsch, K.A., & Behson, S. J. (1998). Training team leaders to
facilitate team learning and performance. In J.A. Cannon-Bowers & E. Salas (Eds.), Making
decisions under stress: Implications for individual and team training (pp. 247-270).
Washington D.C.: American Psychological Association.


RACE AND GENDER AS FACTORS IN FLIGHT TRAINING SUCCESS


Dr. Wade R. Helm and Jonathan D. Reid
Embry-Riddle Aeronautical University
P.O. Box 33360
NAS Pensacola, Fl. 32508
Pensacola.center@erau.edu

ABSTRACT

Success in flight training, in terms of attrition and performance scores, has differed
between male Caucasians and non-male Caucasians both before and after the implementation of
affirmative action programs. It has been suggested that institutional or instructor bias may account
for the lower success rates of non-male Caucasians. Two studies were conducted to examine
various predictor variables related to flight training success. It was hypothesized that both
minority status and gender would be significant variables in a multiple-regression equation
predicting flight-training success. Results indicate that for both pilot and Naval Flight Officer
candidates, minority status and gender were significant predictor variables. However, when
selection test scores were normalized to the beginning of flight training and then compared to
normalized completion scores, the differences for all groups but one were non-significant. Only
female pilots and Naval Flight Officers had lower normalized prediction scores than normalized
completed flight training scores. In short, the Aviation Selection Test Battery underpredicts
female success in flight training. All other groups, when adjusted for Aviation Selection Test
Battery prediction scores, performed as predicted.

INTRODUCTION

Roughly 12,000 individuals annually contact accession sources with an interest in Naval
aviation. Through an initial screening, the number is reduced to 10,000 who take the Aviation
Selection Test Battery (ASTB), a series of six tests used to select future Naval aviators (What,
n.d.). Those scores are combined with flight physicals, physical fitness scores, and officership
ratings to select aviation candidates. All Naval Aviator and Naval Flight Officer (NFO)
candidates then attend Aviation Pre-Indoctrination (API) at Naval Air Station Pensacola. After
taking courses in weather, engines, aerodynamics, navigation, and flight rules and regulations,
they head to their respective training wings to start primary training. Pilot students attend
primary training at Training Wing FIVE at Whiting Field in Milton, FL and Training Wing
FOUR at Corpus Christi NAS, TX. NFO students remain at Pensacola to start Joint
Undergraduate Navigator Training with Training Wing SIX (CTW-6).
CTW-6 conducts primary, intermediate, and advanced training for NFOs. Primary
training lasts 15 weeks and is conducted using the T-34C Mentor aircraft by Training Squadrons
FOUR (VT-4) and TEN (VT-10). Primary Undergraduate Student Naval Flight
Officer/Navigator Training is designed to provide officers in U.S. and international services the
skills and knowledge required to safely aviate, navigate, communicate, and manage aircraft
systems and aircraft in visual and instrument conditions (CNATRA Instruction 1542.54L, 2002).


After primary training, some students can select panel navigation training conducted at Randolph
AFB, TX for the P-3C Orion or E-6 Mercury. The remaining students continue in VT-4 and VT-
10 to start intermediate training (CNATRA Instruction 1542.54L, 2002). After intermediate
training, students receive assignments to the Airborne Early Warning pipeline (training at
Norfolk, VA) or Strike and Strike-Fighter pipeline (training at Training Squadron 86 in
Pensacola) (CNATRA Instruction 1542.131B, 2001). After students finish the different training
pipelines, they receive their NFO wings. It is a long process that not everyone successfully
completes.
It is extremely expensive to put an aviator through the years of training required to
prepare him or her for an operational aircraft. If students cannot complete the program, the
service cannot recover the sunk costs that vary from $500,000 to $1,000,000 depending on which
stage of training the student failed to complete (Fenrick, 2002). Historically, the attrition rate is
20-30% of students, with a majority of the losses occurring during the API and primary phases.
The Navy, therefore, has spent much time and money developing the ASTB as an economical
tool to predict a candidate’s performance and likelihood of attrition. If the ASTB predicts
positive performance, the student should theoretically succeed regardless of the student’s race or
gender. Past research, though, has detected a difference in performance based on an individual’s
race or gender.

STATEMENT OF THE PROBLEM

Since the ASTB has gone through an extensive process to remove racial, ethnic, and
gender bias, it would follow that the performance of aviator candidates should not vary among
racial, ethnic, or gender groups. It has been observed, however, that a large proportion of
minority students fail to complete primary training, and few finish at the top of their class.
These two studies explored the perceived differences in performance between minority and
female candidates and male Caucasians in aviator primary flight training.

STATEMENT OF HYPOTHESIS

The research hypothesis states that minority and female aviator candidates achieve
different primary Naval Standardized Scores (NSS) than male-Caucasian candidates. The null
hypothesis states that there will be no significant difference in NSS determined through multiple
regression analysis at the 95% confidence level.

METHODOLOGY

The names and social security numbers of the subjects who completed API were obtained
from the Naval Schools Command (NASC) database. ASTB scores and race codes were obtained
from the CTW-6 TMS 2 database. This database was also used to obtain the subjects’ primary
training phase academic, flight, and simulator grades, represented by their grade point averages.
These grades were converted into NSS and combined into an overall primary NSS, and a
regression equation was computed using ASTB academic and flight scores. The subjects were
then divided into majority and minority groups. This classification was added as a variable in a
multiple-regression equation. The process was repeated using minority status and ASTB
academic and flight scores as predictor variables and primary NSS as the criterion variable.
Race information on each student came from questionnaires completed when the
candidates began the aviation pipeline. Based upon student answers, a three-
digit alphanumeric code containing sex, race, and ethnic classifications was entered in the
database. Candidates for this study were divided according to their race codes: white or minority.
The performance databases contain grade point information on each subject. The grade
point represents a student’s performance in flight or simulator events. An academic average is
also stored in the system. This raw information is used to compute a Naval Standardized Score.
A NSS of 50 is assigned to the mean, and a score above or below 50 correspondingly describes
performance above or below the mean for a particular area, such as academic, simulator, or flight
grades. Grades are awarded according to criteria specified in CNATRA Instruction 1500.4F.
The data were compiled in a Microsoft Excel spreadsheet. Males were assigned a value of
0 and females a value of 1. Students identified as non-minority were given a 0; all minorities
received a 1. The population’s academic, flight, and simulator means were then computed. The
means and standard deviations were used to compute academic, flight, and simulator NSS using
the following formula: NSS = ((x - mean)/std dev) * 10 + 50. An overall NSS was finally
computed by weighting the individual NSS. Using the overall NSS as the criterion variable, the
study first determined how well ASTB scores predict primary performance using multiple
regression in Microsoft Excel. AQR scores were entered as the first variable, and FAR scores
were the second variable. The process was repeated adding minority status as a third variable to
determine whether minority status predicts beyond ASTB scores. Finally, a multiple regression
was run adding sex as a fourth variable. For these analyses, minority status was coded 1 and
non-minority status 0, and sex was coded 0 for male and 1 for female.
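To make these coding and scoring steps concrete, the following minimal Python sketch (our
illustration, not the authors' Excel workbook; all data values are hypothetical, and the overall
NSS is simplified to a single grade column) standardizes grades into NSS and fits the same
sequence of regressions, reporting the multiple correlation R at each step:

import numpy as np

def to_nss(x):
    """Convert raw scores to Naval Standardized Scores:
    the mean maps to 50 and each standard deviation to 10 points."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std() * 10 + 50

def multiple_r(y, *predictors):
    """Fit y on the given predictors by ordinary least squares
    and return the multiple correlation R."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(np.sqrt(1 - resid.var() / y.var()))

# Hypothetical data for eight students.
aqr      = [4, 5, 6, 5, 7, 6, 4, 8]                          # ASTB AQR raw scores
fofar    = [5, 5, 7, 6, 8, 6, 5, 7]                          # ASTB FOFAR raw scores
grades   = [88.0, 90.5, 94.0, 91.0, 96.0, 92.5, 89.0, 95.0]  # primary grade averages
minority = [0, 1, 0, 1, 0, 0, 1, 0]                          # 0 = non-minority, 1 = minority
sex      = [0, 0, 1, 0, 1, 1, 0, 0]                          # 0 = male, 1 = female

overall_nss = to_nss(grades)
print(multiple_r(overall_nss, aqr, fofar))                 # ASTB scores only
print(multiple_r(overall_nss, aqr, fofar, minority))       # adding minority status
print(multiple_r(overall_nss, aqr, fofar, minority, sex))  # adding sex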

RESULTS

Primary training phase simulator, academic, and flight grades were converted to NSS
and then combined into an overall NSS. For comparison, NSS were also calculated for
AQR and FAR. Tables 1 and 2 summarize the NSS scores for the groups in these two studies.

Table 1
NSS Summary for Pilot Candidates

Group             Measure                               NSS
Male Caucasians   Average of AQRnss                     50.90
                  Average of PFARnss                    51.00
                  Average of Overall Primary Training   50.50
Minority          Average of AQRnss                     43.33
                  Average of PFARnss                    43.96
                  Average of Overall Primary Training   45.60
Female            Average of AQRnss                     43.50
                  Average of PFARnss                    41.80
                  Average of Overall Primary Training   46.40


Table 2
NSS Summary for NFO Candidates

Group             Measure                               NSS
Male Caucasians   Average of AQRnss                     50.74
                  Average of FOFARnss                   50.37
                  Average of Overall Primary Training   50.53
Minority          Average of AQRnss                     47.78
                  Average of FOFARnss                   48.10
                  Average of Overall Primary Training   47.25
Female            Average of AQRnss                     46.44
                  Average of FOFARnss                   47.55
                  Average of Overall Primary Training   51.06

Due to the limited sample size, regression analysis was not conducted on the pilot sample.
For the NFO sample, a multiple regression using AQR and FOFAR raw scores yielded the
equation y = 38.120 + 0.81666x1 + 1.5348x2, with an R of .3451. With 737 degrees of freedom,
the multiple correlation was significant at α = .05. Adding race yielded the equation
y = 38.875 + 0.71272x1 + 1.5700x2 − 2.5652x3, with an R of .3632. Significance was again
achieved, and the addition of race improved the equation. The final regression, adding sex as a
fourth predictor, yielded the equation y = 38.26 + 0.87x1 + 1.48x2 − 2.61x3 + 2.35x4. Sex
added predictive value, though to a lesser extent than minority status. The strength of the
relationships among the variables considered also supports this. Table 3 summarizes these
strengths.

Table 3
Correlation Matrix

              Race   Sex    AQR    FOFAR  Acad NSS  A/C NSS  Sim NSS  Overall NSS
Race          1.00
Sex           0.04   1.00
AQR          -0.11  -0.15   1.00
FOFAR        -0.08  -0.09   0.87   1.00
Acad NSS     -0.13   0.01   0.35   0.33   1.00
A/C NSS      -0.12   0.04   0.23   0.25   0.40      1.00
Sim NSS      -0.12   0.05   0.31   0.32   0.55      0.45     1.00
Overall NSS  -0.15   0.05   0.33   0.34   0.61      0.90     0.78     1.00
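
To make the final four-predictor equation concrete, consider a hypothetical candidate (values
illustrative only) with AQR = 6 and FOFAR = 6 who is a female minority, so that x1 = 6,
x2 = 6, x3 = 1, and x4 = 1:

# Predicted primary NSS from the published four-predictor equation.
aqr, fofar, race, sex = 6, 6, 1, 1
nss_hat = 38.26 + 0.87 * aqr + 1.48 * fofar - 2.61 * race + 2.35 * sex
print(nss_hat)  # approximately 52.10, slightly above the NSS mean of 50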


DISCUSSION

These studies found that minority status added predictive value for both pilot and NFO
primary flight training performance; correspondingly, the null hypothesis was rejected. It was
expected that the regression equation using AQR and FOFAR scores would be statistically
significant, since the ASTB was designed to help select those most likely to succeed in flight
training. For the groups in this study, the Navy used minimum scores of 3 and 3 on the AQR and
FAR, respectively, to increase the numbers entering the program.
When minority status was added to the regression equation, the R-value increased,
supporting the claim that such status adds predictive value to a selection equation. Even though
the ASTB underwent analysis to ensure the lack of racial bias, race influenced performance:
minorities typically had lower NSS than non-minorities. When AQR and FOFAR scores
were converted to NSS, minorities had a lower average NSS than non-minorities. Using ASTB
NSS as beginning NSS and primary training overall NSS as ending NSS, the only significant
difference was noted among females. The ASTB underestimated the performance of females in
primary training. For males, both non-minority and minority, the ASTB was accurate. In all
phases of primary training, females, on average, performed better than the test projected.
For females, then, ASTB scores underestimated performance. This disparity could be blamed
on generous grading by instructors; when academic performance is examined, however, the same
underestimation is seen. Academic scores act as a control, since those grades are reached
objectively. A gender-biased ASTB could explain the phenomenon. Alternatively, the
difference may have been exaggerated by the relatively short time period covered in this study.

RECOMMENDATIONS

For female students, the Navy should investigate whether gender bias exists in the ASTB.
Hundreds of capable candidates may have been turned away because the test incorrectly
anticipated their performance.
This study excluded Marine Corps and Air Force students who attended primary training.
Similar studies could evaluate performance disparities among minorities in those services.

REFERENCES

Baisden, A. G. (1980). A comparison of college background, pipeline assignment, and
performance in aviation training for black student naval flight officers and white student
naval flight officers (NAMRL-SR-80-2). NAS Pensacola, FL: Naval Aerospace Medical
Research Laboratory.

CNATRA Instruction 1542.54L (2002). Primary student naval flight officer/navigator training
curriculum. NAS Corpus Christi: Naval Air Training Command.


CNATRA Instruction 1542.131B (2001). Intermediate naval flight officer (NFO)/Air Force
navigator (AF NAV) training curriculum. NAS Corpus Christi: Naval Air Training
Command.

Fenrick, R. (2002). The effect of Navy and Air Force commissioning sources on performance in
naval flight officer/Air Force navigator intermediate training. Graduate Research Project
presented to Embry-Riddle Aeronautical University, NAS Pensacola, FL.

Hopson, J. A., Griffin, G. R., Lane, N. E., & Ambler, R. K. Development and evaluation of a
naval flight officer scoring key for the naval aviation biographical inventory (NAMRL-
1256). NAS Pensacola, FL: Naval Aerospace Medical Research Laboratory.

Miller, S. A. (1994). Perceptions of racial and gender bias in naval aviation flight training.
Master’s Thesis submitted to Naval Postgraduate School, Monterey, CA.

What is the aviation selection test battery (ASTB)? (n.d.). Retrieved November 3, 2002 from
http://navyrotc.mit.edu/www/aviation/astb.htm


Validation of an Unmanned Aerial Vehicle Operator Selection System


LT Henry L. Phillips
LT Richard D. Arnold
Naval Aerospace Medical Institute

LT Philip Fatolitis
Naval Aerospace Medical Research Laboratory

Abstract

The purpose of this study was to validate selection performance standards for the screening of
candidates for entrance into the US Navy and Marine Corps Unmanned Aerial Vehicle (UAV)
Pioneer Pilot training program. A minimum Pioneer crew consists of an external pilot (EP),
internal pilot (IP), and a mission commander/payload specialist (MC). The EP is responsible for
take-offs, landings, and control of the vehicle when it is within visual range. The IP is
responsible for control of the aircraft when it is beyond visual range. The MC is responsible for
planning and execution of the mission, operation of the payload, and for information gathering
during the mission. In the development and initial validation phases of this system, a task
analysis was completed in training and fleet squadrons to identify both tasks that are critical for
safe flight and skills required to perform piloting tasks. Specific computer-based psychomotor
tests were chosen as predictor variables based on the task analysis and initial validation. In the
present study subjects consisted of 39 students: 5 IPs and 34 Ground Control Station Operators
(who received combined IP and MC training) for whom both psychomotor test battery scores and
training outcome data were available. A single, four-component, unit-weighted, composite
scoring algorithm was generated to indicate performance on the computerized test battery. This
composite score was found to be a significant predictor of final average in primary UAV training
(r = .59, p < .001). Mean composite scores also differed significantly between students who
ultimately qualified as operators in their operational fleet units and those who failed to qualify
(t(37) = -2.92, p < .01).

Uninhabited aerial vehicles (UAVs) have been the source of great interest in recent years
for military organizations (Biggerstaff, Blower, Portman, & Chapman, 1998). In experimental
settings (NASA, 2002), during training, and on the battlefield, these vehicles have demonstrated
their utility for military operations. UAVs have been used operationally for missions such as
surveillance and reconnaissance, and more recently, for precision strikes (Washington Post, Feb.
11, 2002). Military planners envision the continued integration of UAVs into military operations
for logistical efforts, suppression of enemy air defenses, and as an extension of existing manned
combat operations. Extensive work has been conducted on UAV design, capability, and operator
interface (Yelland, 2001). The question of how to select the most qualified UAV operators,
however, has received relatively little attention (Dolgin, Kay, Wasel, Langlier, & Hoffman, 2002).
Such research is essential to our understanding of the human contribution to mission
effectiveness in complex UAV systems. Selection systems can afford tremendous savings in
reduced training costs – the Navy’s Aviation Selection Test Battery (ASTB) yields annual
estimated savings of $38.1 million by improving the quality of training accessions, reducing the
flight hours needed to meet winging requirements, and lowering the number of trainees who
fail or withdraw (NAMI, 2001). A similarly effective selection procedure for UAV operators
could yield monumental savings to the Naval services by ensuring that the individuals most
likely to succeed in training are selected.
The present study represents an initial step toward that goal, and is a
follow-up to a preliminary validation study conducted by Biggerstaff et al. (1998). It evaluates
the validity of the Naval Aerospace Medical Research Laboratory (NAMRL) selection system
for Internal Pilot (IP) operators of the Pioneer UAV. Biggerstaff et al. also conducted a task
analysis to inform subtest selection as well as physical requirements for different positions
within the Pioneer crew.
More extensive descriptions of the Pioneer, its capabilities, and its crew requirements are
provided in Biggerstaff et al. (1998), but some basic facts are presented here. The Pioneer is a
relatively small UAV (wingspan 16.9 ft., length 14.0 ft., height 3.3 ft.) used for real-time
surveillance. It has a service ceiling of 12,000 ft., maximum range of 185 km, and cruising
speed of 65 kts. The Pioneer may be launched shipboard or from an airfield, and requires only
21 m for pneumatic launches and 70 m for recovery. It has been used successfully in both the
first Gulf War and Operation Enduring Freedom. Each Pioneer costs over $800,000.
A minimum Pioneer crew consists of an external pilot (EP), internal pilot (IP), and a
mission commander/payload specialist (MC). The EP is responsible for take-offs, landings, and
control of the vehicle when it is within visual range. The IP is responsible for control of the
aircraft when it is beyond visual range. The MC is responsible for planning and execution of the
mission, operation of the payload, and for information gathering during the mission. An
additional training curriculum offered for Pioneer trainees is Ground Control Station Operator
(GCSO), which combines IP and MC training.
Method
Participants
Participants were 39 students trained at OLF Choctaw between 1995 and 1997 for whom
both selection test battery and training outcome data were available. Of the 39 students, 5 were
IPs and 34 were GCSOs who received combined IP and MC training. While race and ethnicity
data were unavailable, 2 of the 39 were female and 2 were left-handed. Participant ages ranged
from 18 to 30, with mean age 22.08 years (SD = 3.26). This sample did not include the 14
individuals described in the Biggerstaff et al. (1998) preliminary validation.
Procedures
Participants were administered a battery of 5 different types of tasks, combined into the
11 distinct subtests listed below (more information is provided in Table 1). The entire battery
took approximately two hours to administer. Computer requirements included a PC with a
processor of at least 25 MHz. Peripherals attached to the machine included a monitor,
two joysticks, rudder pedals, a numeric keypad, and headphones.
Measures
Tasks
Psychomotor tasks. Three psychomotor tasks, titled stick, rudder, and throttle, were
administered in additive conjunction with each other.
Stick. The stick test required subjects to use a joystick to keep a crosshair cursor (‘+’)
centered at the intersection of a row and a column of dots. The stick could be moved in any
direction, and stick movement produced cursor movement in the opposite direction. The cursor
movement algorithm introduced a constant rate of drift to the upper right. The 3-minute test was
preceded by 3 minutes of practice.
Rudder. The rudder task required subjects to operate a set of rudder pedals with their feet
to keep a crosshair cursor centered along a horizontal axis of dots. This axis and cursor appeared
on the screen simultaneously with, and beneath, the stick cursor and dot axes. Depressing the
left pedal caused the cursor to move to the right, and depressing the right caused cursor motion to
the left. This test was also preceded by a 3-minute practice session.
Throttle. The throttle task added a second joystick, operated with the left hand, to the
controls used to operate the stick and rudder tasks. The goal of the throttle task was to keep a
third cursor centered along a vertical axis of dots, to the left of the stick cursor and dot axes. The
throttle joystick moved in two directions only, toward the examinee and toward the screen.
Movement of the throttle produced movement of the vertical cursor in the opposite direction.
This task was also preceded by a 3-minute practice session.
Figure 1 from Biggerstaff et al. (1998) displays the apparatus used for all three
psychomotor tasks. Each task was scored as the total number of pixels its cursor was off-
center, sampled at random intervals throughout the test. The stick task was administered alone
over 3 minutes and in conjunction with rudder twice, for 3 minutes per trial. Throttle was
administered only in conjunction with stick and rudder, once for 4 minutes. The stick and rudder
tasks were also administered in conjunction with the dichotic listening task over 4 minutes.
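
The scoring rule just described, total off-center deviation sampled at random moments, can be
sketched as follows (a minimal illustration under our own assumptions; the function, trace
format, and sampling count are hypothetical, not the actual test software):

import random

def tracking_error_score(cursor_positions, n_samples=50, seed=None):
    """Score a tracking trial as the summed off-center deviation, in pixels,
    sampled at random tick indices across the recorded cursor trace.
    cursor_positions: list of (x, y) offsets from center, one per tick.
    Lower scores indicate better tracking."""
    rng = random.Random(seed)
    ticks = rng.sample(range(len(cursor_positions)), n_samples)
    return sum(abs(cursor_positions[t][0]) + abs(cursor_positions[t][1])
               for t in ticks)

# Hypothetical trace: a cursor drifting and being partially corrected.
trace = [(i % 7 - 3, (i * 2) % 9 - 4) for i in range(1000)]
print(tracking_error_score(trace, seed=1))
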
Dichotic listening task. This task required subjects to focus on one string of audio
information in the presence of competing audio stimuli. Over 12 trials in 4 minutes, subjects
were presented simultaneous but different strings of alphanumeric characters to each ear. A cue
indicated which ear participants were to attend to during a given trial. Subjects were instructed
to key in the numbers they heard in the designated ear using a numeric keypad, but ignore the
letters. The task was preceded by 4 practice sessions over 3 minutes.
Horizontal tracking. This task required subjects to keep a square cursor centered on a
horizontal axis using a joystick, depicted in Figure 2 from Biggerstaff et al. (1998). The cursor
algorithm made the cursor accelerate as its distance from center increased, forcing participants to
attempt to ‘balance’ the cursor over the center point through small corrective adjustments. The
direction of joystick input matched the direction of cursor movement for this task. The
horizontal tracking task was administered in 7 sessions over 15 minutes.
Digit cancellation task. This task required subjects to enter randomly generated numbers
ranging from 1 to 4 on a keypad with their left hands as the numbers appeared on the screen. It
was administered alone for 2 minutes and in conjunction with the horizontal tracking task for 8
minutes.
Manikin task. This test assessed ability to perform mental rotations and reversals. It
consisted of drawings of a human figure holding a square in one hand and a circle in the other.
The figure was oriented in four ways over 48 trials: facing forward or backward and upside-
down or upright. The task was to determine whether the square was held in the right or left hand
in a given trial. This task was not timed, and was not administered in conjunction with any other
tasks.
Score components
Due to the relatively small sample size, a priori unit-weighted combinations of task scores
from various subtests were used to generate 4 broad component scores: psychomotor ability,
multitasking calculation, multitasking psychomotor, and visuospatial ability. Specific details of
score component calculation are provided in Table 1. The mean of the four components is
reported as an index score.
Psychomotor ability. This component assessed eye-hand coordination. It did not assess
multitasking or divided attention.
Multitasking calculation. This component assessed ability to focus on audio and visual
numeric inputs and information under conditions of divided attention. Distracting activities
performed simultaneously with calculations included dichotic listening and digit cancellation.
Multitasking psychomotor. This component assessed psychomotor performance under
conditions of divided attention introduced by competing psychomotor activities as well as
dichotic listening.
Visuospatial ability. This component captured the ability to perform mental rotations and
reversals. It was based solely on trials of the manikin task, and did not assess multitasking or
divided attention.
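
A minimal sketch of this unit-weighted scheme, following the derivations in Table 1 and the note
to Table 2 (our illustration; the task-score arrays are hypothetical, and error-based scores are
negated so that higher always means better):

import numpy as np

def z(scores):
    """Standardize raw task scores across examinees (mean 0, SD 1)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

# Hypothetical scores for three examinees.
stick_alone  = -z([210.0, 180.0, 260.0])   # tracking error (negated)
ht_trials    = -z([400.0, 350.0, 500.0])   # tracking error (negated)
stick_rudder = -z([320.0, 300.0, 390.0])   # tracking error (negated)
dl_correct   =  z([10.0, 11.0, 8.0])       # dichotic listening, correct responses
dc_correct   =  z([55.0, 60.0, 48.0])      # digit cancellation, correct responses
manikin      =  z([40.0, 44.0, 36.0])      # manikin trials, correct responses

# Unit weighting: each component is the plain mean of its standardized inputs.
psychomotor  = np.mean([stick_alone, ht_trials], axis=0)
multi_calc   = np.mean([dl_correct, dc_correct], axis=0)
multi_psycho = np.mean([stick_rudder, ht_trials], axis=0)
visuospatial = manikin

# Index score: the mean of the four components (cf. note to Table 2).
index = np.mean([psychomotor, multi_calc, multi_psycho, visuospatial], axis=0)
print(index)
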
Criterion variables
Two outcome variables were predicted in this study: final average of test performance
and flight evaluations in the UAV operator training curriculum, scored on a continuous scale,
and post-primary-phase attrition from training, scored dichotomously.
Results
Variable minimums, maximums, means, and standard deviations are presented in Table 2
(Index M = -.01, SD = .77; Visuospatial M = -.03, SD = .99; Multitasking psychomotor M = -.01,
SD = 1.10; Multitasking calculation M = .04, SD = .81; Psychomotor M = -.02, SD = .90; Training
performance M = 93.67, SD = 3.18). The same table also displays correlations among all variables.
All correlations were significant at p < .01, including correlations of score components with
training performance (Index r = .59; Visuospatial r = .54; Multitasking psychomotor r = .51;
Multitasking calculation r = .42; Psychomotor r = .43).
Differences between students attriting from training and those completing training were
significant for all score components (Visuospatial t(12.4) = 2.38, p < .05; Multitasking
psychomotor t(37) = 2.47, p < .05; Multitasking calculation t(37) = 2.01, p < .05; Psychomotor
t(37) = 3.06, p < .01), including the index score (t(37) = 2.91, p < .01), but not for training
performance (t(37) = 1.15, ns) (see Table 3).
Discussion
Results were impressive. Index score and all score components correlated strongly with
training performance and reliably differentiated between attriting and non-attriting students.
Additionally, because unit-weighted score component computations were determined a priori
based on content validity alone (Cascio, 1991), it was not necessary to cross-validate results
using a separate sample for purposes of the present study.
Performance on this test battery appears to be an excellent predictor of both training
performance and attrition. Adoption of this or a similar selection procedure incorporating
reasonable minimum performance standards should serve to both improve mean trainee
performance and reduce training attrition, likely resulting in substantial savings to the Marine
Corps (NAMI, 2001).
Subsequent exploratory work on this and future samples will investigate the joint roles of
accuracy and reaction time in prediction of UAV training performance and attrition. Due to the
relatively small sample size available for this study, the number of a priori relationships of
predictor combinations with criterion variables tested was kept small to avoid inflation of alpha
error. Even so, results were extremely promising.


References
Biggerstaff, S., Blower, D. J., Portman, C. A., & Chapman, A. D. (1998). The
development and initial validation of the unmanned aerial vehicle (UAV) external pilot selection
system (NAMRL-1398). Pensacola, FL: Naval Aerospace Medical Research Laboratory,
Selection Division.
Cascio, W. F. (1991). Applied psychology in personnel management. (4th ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Dolgin, D., Kay, G., Wasel, B., Langlier, M., & Hoffman, C. (2002). Identification of
the cognitive, psychomotor, and psychosocial skill demands of uninhabited combat aerial
vehicles (UCAV) operators. Survival and Flight Equipment Journal, 30, 219-225.
Helton, K.T., Nontasak, T., & Dolgin, D.L. (1992). Landing craft air cushion crew
selection system manual (Tech. Rep. No. 92-4). Pensacola, FL: Naval Aerospace Medical
Research Laboratory, Selection Division.
Hilton, T. F., & Dolgin, D. L. (1991). Pilot selection in the military of the free world. In
R. Gal & A. D. Mangelsdorff (Eds.), Handbook of military psychology (pp. 81-101). Sussex,
England: John Wiley and Sons.
McHenry, J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project
A validity results: The relationship between predictor and criterion domains. Personnel
Psychology, 43, 335-354.
NASA mission demonstrates practical use of UAV technology. (2002, October 17). Retrieved
March 25, 2003, from http://www.uavforum.com/library/news.htm
Street, D. R., & Dolgin, D. L. (1994). Computer-based psychomotor tests in optimal
training track assignment of student naval aviators (NAMRL-1391). Pensacola, FL: Naval
Aerospace Medical Research Laboratory, Selection Division.
Yelland, B. (2001). UAV technology development: A node within a system. Flight
International's UAV Australia Conference Proceedings, 8-9 February 2001. Melbourne,
Australia.


Table 1.
Test Combinations and Score Component Derivation

Test Administration Order and Combinations

Test 1: stick
Test 2: DL
Test 3: DL & stick
Test 4: stick & rudder
Test 5: DL, stick, & rudder
Test 6: stick, rudder, & throttle
Test 7: not used
Test 8: HT (4 trials)
Test 9: DC (correct, RT, & RT SD tracked)
Test 10: HT & DC (3 joint trials)
Test 11: Manikin (correct, RT & RT SD tracked; four trials)

Score Component Derivation

Psychomotor:
Test 1 stick
Test 8 4 trials of HT

Multitasking-calculation:
Test 3 DL
Test 10 3 trials of DC

Multitasking-psychomotor:
Test 3 stick
Test 4 stick & rudder
Test 5 stick & rudder
Test 6 stick, rudder, & throttle
Test 10 HT

Visuospatial:
Test 11 4 Manikin trials

Note: DC: Digit cancellation; DL: Dichotic listening; HT: horizontal tracking; RT: Reaction
time; SD: Standard deviation


Table 2.
Correlations and descriptive statistics (N = 39).

Descriptives Min Max Mean SD
1. Index Score -1.86 1.39 -.01 .77
2. Visuospatial -1.64 2.23 -.03 .99
3. Multitasking – psychomotor -2.40 1.67 -.01 1.10
4. Multitasking – calculation -1.73 1.72 .04 .81
5. Psychomotor -2.51 1.40 -.02 .90
6. Training Performance 87.71 99.09 93.67 3.18

Correlations 1 2 3 4 5
1. Index Score
2. Visuospatial 75
3. Multitasking – psychomotor 90 52
4. Multitasking – calculation 73 48 53
5. Psychomotor 82 39 81 43
6. Training Performance 59 54 51 42 43

Note: Variables 2-5 computed as means of contributing standardized variables. Index score
computed as the mean of variables 2-5. All correlations significant at p < .01. Decimals
omitted.


Table 3.
Variable means by attrite status (N = 39)

Attrite (n = 6) Mean SD SE 95% CI
1. Index Score ** -.77 .65 .26 -1.30 to -.24
2. Visuospatial *** -.61 .55 .23 -1.06 to -.16
3. Multitasking – psychomotor * -.97 1.02 .42 -1.81 to -.14
4. Multitasking – calculation * -.55 .95 .39 -1.32 to .22
5. Psychomotor ** -.96 .73 .30 -1.55 to -.36
6. Training performance 92.30 3.66 1.49 89.31 to 95.29

Complete (n = 33) Mean SD SE 95% CI
1. Index Score ** .13 .71 .12 -.11 to .38
2. Visuospatial *** .07 1.02 .18 -.28 to .43
3. Multitasking – psychomotor * .16 1.03 .18 -.20 to .52
4. Multitasking – calculation * .14 .75 .13 -.12 to .41
5. Psychomotor ** .16 .83 .14 -.13 to .45
6. Training performance 93.92 3.08 .54 92.85 to 94.99

Note. ** Complete-attrite difference significant at p < .01; * at p < .05; *** significant at p < .05
assuming unequal group variances.
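
The comparison flagged *** above assumes unequal group variances; in current Python tooling
this corresponds to Welch's t-test, sketched here with hypothetical score vectors matched to the
reported group sizes:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
attrite  = rng.normal(-0.61, 0.55, size=6)    # hypothetical visuospatial scores, n = 6
complete = rng.normal(0.07, 1.02, size=33)    # hypothetical visuospatial scores, n = 33

# equal_var=False requests Welch's t-test, which does not pool variances and
# adjusts the degrees of freedom (cf. the fractional df of 12.4 reported in text).
t, p = stats.ttest_ind(complete, attrite, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")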


Figure 1.

Stick, rudder, and throttle task apparatus and display for psychomotor test (from Biggerstaff et
al., 1998).

[Image: screen display showing the throttle, stick, and rudder cursors; controls comprise the
throttle and stick joysticks and the rudder pedals.]


Figure 2.

Horizontal tracking task controls and display from Biggerstaff et al. (1998).

[Image: stick control and horizontal-axis display, with the cursor at the zero position.]


Figure 3.

Scatterplot of index score against training performance by training completion status (N = 39).

[Image: scatterplot of Training Performance (y-axis, 86-100) against Index Score (x-axis, -2.0 to
1.5), with separate markers for Complete and Attrite students.]
Index Score – Training r = .59, p < .01.


SUCCESS AT COLLABORATION
AS A FUNCTION OF KNOWLEDGE DEPTH*
Mark A. Sabol, Brooke B. Schaab, and J. Douglas Dressel
U. S. Army Research Institute for the Behavioral and Social Sciences, Alexandria, Virginia

Andrea L. Rittman
George Mason University, Fairfax, Virginia, Consortium Research Fellows Program**

Abstract

Pairs of college students played a computer game, SCUDHunt***, that requires
collaboration. Separated but communicating players deployed information-gathering “assets” to
locate 3 missile launchers on a 5x5 grid. Since each player controlled only half the assets (either
the "ground" or "air" set), pairs had to coordinate asset deployment to maximize the value of
information collected. Before the game, each player in 13 pairs received Deep-but-Narrow
(DbN) training, i.e., two identical sessions on the attributes (possible movements and reliability)
of assets controlled by that player; 13 other pairs received Broad-but-Shallow (BbS) training, a
session on one's own assets, followed by equivalent training on one's partner's assets. A quiz on
training content followed each session. Pairs then played two 5-turn games, each turn requiring
each player to guess the fixed launcher locations.

Results suggest that knowledge of one set of assets – those of the "ground controller" –
was more crucial to game-playing success than knowledge of the other set – those of the "air
controller." But knowledge of that more crucial set proved more complex and difficult to
acquire. During the first game, players assigned the more crucial set needed DbN training to
succeed. However, players given BbS training appeared to gain knowledge of their partners'
assets while playing the first game, leading to improvement in later performance. Players given
DbN training on the less crucial assets did poorly throughout. We interpret these preliminary
results as addressing the question of whether training for collaborative tasks should include system-
wide aspects or concentrate on a single role.

____________
* The views expressed in this paper are those of the authors and do not necessarily represent an
official position of the U. S. Army or Department of Defense.

** Farrasha L. Jones made important contributions to the data collection phase of this research
while a student at George Mason University and a participant in the Consortium Research
Fellows Program.

*** Reference to and use of this product does not constitute endorsement by the
U. S. government.


The application of new information technologies to the battlefield allows widely
dispersed combatants to work together in new ways, altering the traditional conduct of warfare.
Such network-centric operations have been described as “based upon a new model of command
and control, one that features sharing information (synchronization in the information domain)
and the collaborative processes to achieve a high degree of shared situational awareness” (Alberts,
2002, p. 60). Successful employment of these new technologies, which join personnel from
diverse military jobs into interactive networks, relies on the Soldiers' tendency to engage in --
and their skill at accomplishing -- collaborative interaction. But Alberts cautions that, without
appropriate training and practice, the network-centric environment might actually increase the
fog of war rather than provide superior situational understanding. To insure the latter result and
avoid the former, the training side of the Army, in particular, needs to understand the dynamics
of this new environment, where Soldiers interact with their peers and leaders electronically. The
purpose of this paper is to describe the preliminary results of our research team's attempts to
identify training issues that arise when unacquainted Soldiers must collaborate at a distance,
rather than face-to-face.

Research on collaboration. The Army defines collaboration as “people actively sharing
information, knowledge, perceptions, or concepts when working together toward a common
purpose.” It is well established that the basis for collaboration is a shared understanding of the
situation (Clark & Brennan, 1991). But this understanding is more than shared information or
even what is sometimes called a Common Relevant Operating Picture (CROP). Establishing a
CROP should be seen as the beginning, not the endpoint, in establishing situational awareness.
As Hevel (2002) has said, each person’s interpretation of the CROP depends on that individual's
training, experience, and values.

To gain further insights into such issues involving collaboration and training, we first
conducted observations and interviews of Army personnel in units that were in the process of
incorporating digital systems (Schaab & Dressel, 2003). It soon became clear that classroom
training on how to use digital systems is not enough. Even inexperienced Soldiers know that
their digital jobs require an understanding of how the system they are learning to operate
interacts with other systems. But they may need to experience multiple training exercises,
incorporating numerous scenarios, in order to develop both a clear sense of how to collaborate
with the people operating those other systems and an appreciation of how important such
collaboration is in achieving and maintaining situational understanding. In one command center,
we saw Soldiers actually place two different systems side-by-side and cross-train each other in
order to promote face-to-face collaboration. They already grasped the need to understand the
interrelationship between their roles. Such opportunities to foster mutual understanding become
more difficult, of course, when members are dispersed.

Successful collaboration in distributed environments requires the same abilities as
collaboration when co-located, but the means of training must differ when groups are distributed
(Klein, Pliske, Wiggins, Thordsen, Green, Klinger, & Serfaty, 1999). Challenges with
distributed groups include the loss of visual/verbal cues, added effort in working together, and
difficulty in knowing when goals need to be adjusted. In short, good communication is an
antecedent of effective team performance, and communication becomes more difficult when
teams are dispersed.

Indeed, previous research has suggested that the most important aspects of collaboration
may be these intertwined issues of communication and shared mental models of the combat
situation (Schrage, 1990). Collaboration helps create shared meaning about a situation, and this
shared meaning is important for effective decision-making performance. At the same time,
some prior shared situational awareness is essential for effective communication, and
communication is crucial in maintaining and refining that shared awareness. We designed our
research program with these complexities in mind.

We selected the game SCUDHunt for our research on collaboration precisely because it
provides a simplified model of this interplay of shared awareness and communication, while
permitting independent manipulation of variables thought to affect them. SCUDHunt requires
participants to (1) collaborate from distributed locations and (2) share unique information from
their intelligence assets for optimal game performance. The goal of the game is simple: to locate
three SCUD missile launchers on a map. To accomplish this, separated but communicating
players use a computer mouse to deploy information gathering "assets" across the map they share
on their computer screens. Players get five "turns" during which they can gather and accumulate
that information regarding launcher locations. The game thus requires players to execute digital
tasks in order to achieve a shared goal, while performing their different tasks in geographically
separate locations.

Our research began with a cognitive task analysis of this game to identify critical points
where collaboration would be beneficial.* These are points where players need to communicate
planning strategies and to share gathered information in order to perform effectively. The general
collaboration areas identified were:

Coordinating deployment: This is the discussion among players of where best to place their
assets on the map grid, with the goals of (1) maximizing coverage of the area remaining to
be searched, and (2) using certain assets to verify the results of earlier searches;

Interpreting results: This is the discussion among players of the reliability of reports from
different intelligence-gathering assets, leading to a determination of the likelihood that a
SCUD launcher is at any particular location. This involves interpretation of results from the
current turn, as well as integration of findings from previous searches (a computational
sketch follows the footnote below).

____________
* Ross, K.G. (September, 2003). Perspectives on Studying Collaboration in Distributed
Networks, Contractor Report prepared for the U. S. Army Research Institute for the
Behavioral and Social Sciences by Klein Associates Inc., Fairborn, Ohio, under
Contract PO DASW01-02-P-0526.
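
As a computational reading of the "interpreting results" area above, one hedged illustration
(entirely our own; SCUDHunt's actual reporting model is not documented here) treats each asset
report as evidence whose weight depends on the asset's assumed reliability, accumulated per grid
square across turns:

import math

def update_log_odds(log_odds, report_positive, reliability):
    """Accumulate evidence that a launcher occupies one grid square.
    reliability: assumed probability that the asset reports correctly,
    under a symmetric error model."""
    weight = math.log(reliability / (1 - reliability))
    return log_odds + (weight if report_positive else -weight)

# Prior: 3 launchers among 25 squares.
prior = 3 / 25
log_odds = math.log(prior / (1 - prior))
log_odds = update_log_odds(log_odds, True, 0.9)   # reliable asset reports a launcher
log_odds = update_log_odds(log_odds, False, 0.6)  # weaker asset reports nothing
posterior = 1 / (1 + math.exp(-log_odds))
print(round(posterior, 3))  # updated chance this square holds a launcher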


Communication. Recent research has shown that communication content (Urban, Weaver,
Bowers, & Rhodenizer, 1996), statement patterns (Kanki, Lozito, & Foushee, 1989; Achille,
Schulze, & Schmidt-Nielsen, 1995), and statement sequences (Bowers et al., 1998) influence team
coordination and performance. To investigate these relationships, we included in our research
design a manipulation of communication ease: All participating pairs wore headsets that allowed
direct oral communication during one of the two games they played; during the other game,
participants were permitted to communicate only by sending typed messages via an on-screen
"chat" box. For a random half of the pairs, the "chat" game came first. The data we collected
included measures of the types and frequency of communication between participants in the
"chat" game, during which each transmitted message was recorded. The analysis of these data is
not yet complete. Effects of that manipulation and results of that analysis will be presented in a
later paper.

Knowledge. The notion of networked individuals who have shared goals but unique roles raises
a new question for research on collaboration at a distance: To what extent does knowledge about
a partner’s role matter to an individual’s performance effectiveness? Knowledge about others’
roles, responsibilities, and job requirements has been termed "interpositional knowledge." One
training strategy effective for increasing interpositional knowledge among team members is
cross-training (Blickensderfer, Cannon-Bowers, & Salas, 1998).

Volpe, Cannon-Bowers, Salas, and Spector (1996) defined cross-training as a strategy
where “each team member is trained on the tasks, duties, and responsibilities of his or her fellow
team members” (p. 87). This involves having team members understand and sometimes practice
each other’s skills. Volpe et al.’s (1996) initial research on cross-training, as well as an
extension (Cannon-Bowers, Salas, & Blickensderfer, 1998), showed that those who received
cross-training were better able to anticipate each other’s needs, shared more information, and
were more successful in task performance. Additional research has found that cross-training and
a common understanding of roles contributes to shared mental model development, effective
communication, and improved coordination (McCann, Baranski, Thompson, & Pigeau, 2000;
Marks, Sabella, Burke, & Zaccaro, 2002).

Some researchers have even suggested that "implicit coordination" may be an important
mechanism. Here, cross-trained teams may be better able to coordinate without depending on
overt communication (Blickensderfer et al., 1998). This phenomenon has been suggested as an
intervening factor that explains the benefit that cross-training imparts to a team task. Implicit
coordination may only be possible given the reduced interpositional uncertainty regarding other
team members' roles that comes with cross-training. However, distributed environments may
limit the extent to which this implicit coordination can operate.

To investigate such issues, we included cross-training versus intensive training in one
role as the primary independent variable in the portion of our research reported here. Players
received either a double dose of training on the tasks (deployment and interpretation of "their"
assets) they would be expected to perform later or a single dose of training on their own tasks
and on those expected of their partners. The question being asked by the inclusion of this
variable may be stated in terms of training depth and breadth. That is, those participants who
receive a double exposure to training on their own set of assets are receiving deep but narrow
training. They should be turned into fairly good "experts," with confidence in their
understanding of their own role in finding the missing SCUD launchers, but with little or no
understanding of the contribution that could be made by their partners. On the other hand, those
participants who receive training on both their own assets and those of their partners may be said
to receive broad but shallow training. They may have some understanding of how they and their
partners can work together toward successfully finding the launchers, but they may not be very
confident in their ability to apply any of the specific knowledge they were taught. We are
asking: Which type of training, deep but narrow (DbN) or broad but shallow (BbS) leads to
better performance?

Two other independent variables were automatically included in the experiment by the
nature of the task. The first is position, whether a participant was assigned the "ground
controller" position, in charge of the information gathering assets that are "on the ground" (a
single spy, a team of Navy Seals, a Joint Special Operations team, and a communications
analysis unit), or the "air controller" position, in charge of the gathering assets that are "airborne"
(a reconnaissance satellite, a manned aircraft, and an unoccupied aerial vehicle). The second is
the sequence of turns in the two games all participants played, a variable that may be thought of
as representing increasing on-the-job experience. Our experimental questions are, therefore, how
the first two variables, training and position, affect game performance across the five turns of the
two games, separately and in interaction.

Method

Participants. This experiment employed undergraduate students at a large university in Virginia.
A total of 52 students received course credit for two hours of participation, for which they were
scheduled as 26 pairs.

Instruments. In addition to an Informed Consent form and various questionnaires used in other
aspects of this research, the following measurement instruments were used in the course of this
experiment: 1) the asset quizzes on the knowledge the participants acquired during training
prior to playing the game, and 2) the SCUDHunt game itself, described below:

The SCUDHunt game presents players with the mission of determining where – on a
five-by-five board representing the map of a hostile country – the launchers for SCUD missiles
are located. The players are told that there are three such launchers, each in a different fixed
location on one square among the 25 squares on the board. On each of five turns, the players
deploy intelligence-gathering assets (for example, a reconnaissance satellite or a team of Navy
Seals), receive reports from those assets, and create a “strike plan” (to be sent to their fictional
commander) indicating their best guess at that point as to the launcher locations. They are told
that only the final strike plan – after the fifth turn – will actually be used by their commander to
direct an attack on the launchers, and they are given the results of this final strike plan in terms of
which bombed location held a now-destroyed launcher. This game is a representation of the kind
of situation in which Soldiers would use digital systems to execute tasks requiring collaboration.
The primary measures generated as the game is played are 1) the number of launchers that would
have been hit by each strike plan submitted and 2) the degree to which the two players on a team
chose to include the same grid squares in their independent strike plans. Only the first of these
will be discussed here.
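
The first measure reduces to a set intersection: how many of the three hidden launcher squares
appear in a submitted strike plan. A minimal sketch (hypothetical coordinates; not the actual
game code):

# Grid squares are (row, col) pairs on the 5x5 board.
launchers = {(0, 3), (2, 2), (4, 1)}           # fixed, hidden launcher locations

def launchers_hit(strike_plan, launchers):
    """Count how many launcher squares a strike plan would hit."""
    return len(set(strike_plan) & launchers)

turn5_plan = [(0, 3), (2, 2), (3, 4)]          # a player's final three guesses
print(launchers_hit(turn5_plan, launchers))    # -> 2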

Design. The primary independent variable for this experiment (ALL versus OWN) involved
training on the characteristics of the information-gathering assets used in the SCUDHunt game.
All participants received, as their first training module, an explanation of the characteristics of
the assets they would be controlling. Half of the pairs (the OWN condition) received a second
exposure to the same asset training; the other half (the ALL condition) received training in which
each participant learned the characteristics of the assets to be controlled by that participant's
partner. A secondary independent variable is the position to which participants were randomly
assigned, either "air" or "ground" controller. This position determined the particular set of
information gathering assets that were under the participant's control. The main dependent
variables in this experiment are 1) each participant's performance on the asset-knowledge quizzes
administered after each asset training module, and 2) success at playing the SCUDHunt game, as
measured by the number of missile launcher locations correctly identified in a strike plan.

Procedure. Upon arrival at the laboratory, participants completed a preliminary questionnaire on
their experience with computers and computer games. The experimenter then explained that the
experiment would involve the participants playing such a computer game with a partner in
another room. First, they would watch a training video giving an overview of how the game is
played and explaining the concept of information-gathering "assets." They would then see a
video providing details on their assets, after which they would be asked a few questions about
what they had just learned.

Several computer-based training modules were then presented on 1) the overall aspects of
playing the SCUDHunt game and 2) the characteristics of the information-gathering assets used
in playing the game. Participants took paper and pencil quizzes on the material just presented
following each training module. Immediately after this training, the pair played a one-turn
practice game, to insure that the mechanics of playing the game were understood. After the
experimenters answered any questions the participants might have, the pair played two complete
five-turn games of SCUDHunt. During these games, data were automatically collected on 1) the
messages participants sent to each other, 2) the degree to which grid squares chosen as targets in
the "strike plans" (submitted at the end of each turn) were identical for the two members of the
pair, and 3) the number of those chosen target squares that actually contained missile launchers.

Results and Discussion

The primary results are presented in Figures 1 and 2. Figure 1 depicts results for those
participants in the "air controller" position; it presents, for them, the main measure of success at
playing the SCUDHunt game – the number of SCUD launcher positions correctly identified –
on each of the five turns of both games played. Figure 2 presents the same data from the
participants in the "ground controller" position. The following pattern can be seen: Regardless
of position, during the second game, participants given the "all" training (cross-training) seem to
be at least as successful as their counterparts given the "own" training, that is, double training on
one set of assets. That advantage for cross-training is not evident in the first game. In fact,
cross-trained individuals in the "ground controller" position seem to have been at a disadvantage
during the first game.
[Image: line graph of Mean # of Launchers Found (y-axis, 0 to 3) across the ten turns of the two
games (x-axis), with separate lines for the Own and All training conditions.]

Figure 1. Success by "air controllers" at finding launcher positions, separately for 13 given
cross-training ("All") and 13 given double training on one set of assets ("Own").
[Image: line graph of Mean # of Launchers Found (y-axis, 0 to 3) across the ten turns of the two
games (x-axis), with separate lines for the Own and All training conditions.]

Figure 2. "Ground controllers'" success at finding launcher positions, separately for 13 given
cross-training ("All") and 13 given double training on one set of assets ("Own").


An analysis of variance was performed on the data in these two figures, taking into
account the matching of the "air" and "ground" controllers in each pair. In order to simplify the
analysis, only the data from turn 5 (on each of the 2 games) was entered into this analysis. This
transforms the multi-level variable of "Turns" in the figures into the binary variable of games,
first versus second. These simplified data are depicted below (Figure 3).

[Figure omitted: mean number of launchers found on the 5th turn (0-3) for "Air 1st," "Air 2nd," "Grd 1st," and "Grd 2nd" (position and game), with separate lines for the "Own" and "All" training groups.]
Figure 3. Success at finding launcher positions on the fifth (last) turn of each game,
separately for players in the "Air Controller" and "Ground Controller" positions and
separately for those given cross training ("All") and those given double training on
one set of assets ("Own"). Each point represents 13 players.

None of the three main effects ("all" versus "own" training, "ground" versus "air"
position, and first versus second game) was significant. However, a highly significant result was
found for the two-way interaction between position and games (F(1,24)=10.29, p<.005), as well
as a marginally significant result for the two-way interaction between position and the training
variable (F(1,24)=3.34, p<.10). The three-way interaction among position, training, and games
was also marginally significant (F(1,24)=2.98, p<.10).
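The factorial structure of this analysis can be illustrated with a short sketch (simulated scores only; for simplicity the sketch treats all three factors as between-subjects, whereas the reported analysis also modeled the air/ground pairing):

```python
# Sketch of a 2 (training) x 2 (position) x 2 (game) factorial ANOVA on
# fifth-turn scores, using simulated data with 13 players per cell.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
rows = [dict(training=t, position=p, game=g, launchers=rng.poisson(1.5))
        for t in ("own", "all") for p in ("air", "ground")
        for g in (1, 2) for _ in range(13)]
df = pd.DataFrame(rows)

model = smf.ols("launchers ~ C(training) * C(position) * C(game)", df).fit()
print(anova_lm(model, typ=2))  # F tests for main effects and interactions
```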

We consider the three-way interaction most important, suggesting as it does that the
combined effect of training and game experience was different, depending upon the position the
player controlled. One interpretation of these results is as follows: Knowledge of one set of
assets – those of the "ground controller" – was more crucial to game-playing success than
knowledge of the other set – those of the "air controller." But knowledge of that more crucial set
was more complex and more difficult to acquire, so that it was acquired to a useful degree only
through the double dose of training received in the DbN training condition. Thus, during the first
game, players assigned the "ground controller" position, the one whose assets are more crucial
but more complex, needed DbN training to succeed. However, players given BbS training
performed better on the second game. It may be that, while playing the first game, these broadly
trained participants gained additional knowledge of the role of their partners. This learning
could have been facilitated by their exposure to training on all assets, and that learning could
have led to improvement in later performance. However, players given DbN training on the less
crucial assets of the "air controller" position seem to have been at an initial disadvantage, one
that only increased in the second game.

The results presented here support the view that collaborative tasks benefit when the
collaborating participants have a broader, system-wide view of the entire situation. At least in
this experiment, training that concentrated on a single role was only beneficial if that role was
complex enough to require considerable training depth. It should be emphasized, however, that
these results are only preliminary and this research is continuing. In particular, further iterations
of this experiment will include a final measure of asset knowledge, in order to provide a direct
test of the hypothesis that cross-training facilitates on-the-job learning.

References

Achille, L.B., Schulze, K.G. & Schmidt-Nielsen, A. (1995). An analysis of communication and
the use of military terms in navy team training. Military Psychology, 7(2), 95-107.

Alberts, D. S. (2002). Information age transformation: Getting to a 21st century military.
Washington, DC: DoD Command and Control Research Program.

Blickensderfer, E., Cannon-Bowers, J. A., & Salas, E. (1998). Cross-training and team
performance. In J. A. Cannon-Bowers & E. Salas (Eds.), Decision making under stress:
Implications for training and simulation (pp. 299-312). Washington, DC: American
Psychological Association.

Bowers, C.A., Jentsch, F., Salas, E. & Braun, C.C. (1998). Analyzing communication sequences
for team training needs assessment. Human Factors, 40(4), 672-679.

Brannick, M.T., Roach, R.M. & Salas, E. (1993). Understanding team performance: a
multimethod study. Human Performance, 6(4), 287-308.

Cannon-Bowers, J. A., Salas, E., & Blickensderfer, E. (1998). The impact of cross-training and
workload on team functioning: A replication and extension of initial findings. Human
Factors, 40, 92-101.

Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M.
Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition. Washington,
DC: American Psychological Association.


Hevel, J. R. (2002). The Objective Force Battle Staff? Monograph, School of Advanced Military
Studies, U. S. Army Command and General Staff College, Fort Leavenworth, Kansas.

Kanki, B.G., Lozito, S. & Foushee, H.C. (1989). Communication indices of crew coordination.
Aviation, Space, and Environmental Medicine, 60, 56-60.

Klein, G., Pliske, R., Wiggins, S., Thordsen, M., Green, S., Klinger, D., & Serfaty, D. (1999). A
model of distributed team performance (SBIR N613399-98-C-0062). Orlando, FL:
NAWCTSD.

Marks, M. A., Sabella, M. J., Burke, C. S., & Zaccaro, S. J. (2002). The impact of cross-training
on team effectiveness. Journal of Applied Psychology, 87, 3-13.

McCann, C., Baranski, J. V., Thompson, M. M., & Pigeau, R. A. (2000). On the utility of
experiential cross-training for team decision-making under time stress. Ergonomics, 43,
1095-1110.

Schaab, B.B., & Dressel, J.D. (2003). Training the troops: What today's Soldiers tell us about
training for information age digital competency. (Research Report 1805). Alexandria,
VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Schrage, M. (1990). Shared minds: The new technologies of collaboration. New York:
Random House.

Volpe, C. E., Cannon-Bowers, J. A., Salas, E., & Spector, P. E. (1996). The impact of cross-
training on team functioning: An empirical investigation. Human Factors, 38, 87-100.


U.S. NAVY SAILOR RETENTION: A PROPOSED MODEL OF
CONTINUATION BEHAVIOR²

Jessica B. Janega and Murrey G. Olmsted


Navy Personnel Research, Studies and Technology Department
jessica.janega@persnet.navy.mil

Sailor turnover reduces the effectiveness of the Navy. Turnover has declined
significantly since the late 1990s due to the implementation of a variety of retention
programs, including selective re-enlistment bonuses, increased sea pay, changes to the
Basic Allowance for Housing (BAH), and other incentives. In many cases the Navy now
retains adequate numbers of Sailors; however, it faces the problem of retaining the
best and brightest Sailors in active-duty service (Visser, 2001).
values will require that organizations, such as the Navy, make necessary changes in their
strategy to retain the most qualified personnel (Withers, 2001). Attention to quality of life
issues is one way in which the military has addressed the changing needs of its members
(Kerce, 1995). One of the most effective ways to assess quality of life in the workplace is
to look at the issue of job satisfaction. Job satisfaction represents the culmination of
feelings the Sailor has toward the Navy. Job satisfaction in combination with variables
like organizational commitment can be used to predict employee (i.e., Sailor) retention
(for a general overview see George & Jones, 2002). The purpose of this paper is to
explore the relationship of job satisfaction, organizational commitment, career intentions,
and continuation behavior in the U.S. Navy.

Job Satisfaction
According to Locke (1976), job satisfaction is predicted by satisfaction with
rewards, satisfaction with work, satisfaction with work context (or working conditions),
and satisfaction with other agents. Elements directly related to job satisfaction include
direct satisfaction with the job, action tendencies, career intentions, and organizational
commitment (Locke, 1976). Olmsted & Farmer (2002) replicated a version of Locke’s
(1976) model of job satisfaction proposed by Staples & Higgins (1998) by applying it to
a Navy sample. Staples and Higgins (1998) proposed that job satisfaction is both a factor
predicted by other factors, as well as an outcome in and of itself. Olmsted & Farmer
(2002) applied the model of Staples and Higgins (1998) directly to Navy data using the
Navy-wide Personnel Survey 2000. The paper evaluated two parallel models, which
provided equivalent results indicating that a similar version of Locke’s model could be
successfully applied to Navy personnel.

Organizational Commitment
Organizational commitment involves feelings and beliefs about entire
organizations (George & Jones, 2002). Typically, organizational commitment can be
viewed as a combination of two to three components (Allen & Meyer, 1990). The

² The opinions expressed are those of the authors. They are not official and do not represent the views of
the U.S. Department of the Navy.


affective (or attitudinal) component of organizational commitment involves positive
emotional attachment to the organization, while continuance commitment is based on the
potential losses associated with leaving the organization, and normative commitment
involves a commitment to the organization based on a feeling of obligation (Allen &
Meyer, 1990). Commonalities across the affective, normative, and continuance forms of
commitment indicate that each component should affect an employee’s intentions and final
decision to continue as a member of the organization (Jaros, 1997). The accuracy of these
proposed relationships has implications for turnover reduction because “turnover
intentions is the strongest, most direct precursor of turnover behavior, and mediates the
relationship between attitudes like job satisfaction and organizational commitment and
turnover behavior,” (Jaros, 1997, p. 321). This paper primarily addresses affective
commitment, since it has a significantly stronger correlation with turnover intentions than
either continuance or normative commitment (Jaros, 1997).

Career Intentions
Career intentions represent an individual’s intended course of action with respect
to continuation in their current employment. While a person’s intentions are not always
the same as their actual behavior, an important assumption is that these intentions
represent the basic motivational force or direction of the individual’s behavior (Jaros,
1997). In general, Jaros (1997) suggests that the combination of organizational
commitment and career intentions appears to be a good approximation of what is likely to
occur in future career behavioral decisions (i.e., to stay or leave the organization).

Purpose
This paper looks at job satisfaction, organizational commitment, career intentions,
and continuation behavior using structural equation modeling. It was hypothesized that
increased job satisfaction would be associated with increased organizational commitment,
which in turn would be positively related to career intentions and increased continuation
behavior (i.e., retention) in the Navy. A direct relationship was also hypothesized to exist
between career intentions and continuation behavior.

METHODS

Participants
The sample used in this study was drawn from a larger Navy quality of work life
study using the Navy-wide Personnel Survey (NPS) from the year 2000. The NPS 2000
was mailed to a stratified random sample of 20,000 active-duty officers and enlisted
Sailors in October 2000. A total of 6,111 useable surveys were returned to the Navy
Personnel Research, Studies, & Technology (NPRST) department of Navy Personnel
Command, a return rate of 33 percent. The current sample consists of a sub-sample of
700 Sailors who provided social security numbers for tracking purposes. Sailors whose
employee records contained a loss code 12 months after the survey were flagged as
having left the Navy (10.4%). Those Sailors who remained on active duty in the
Navy (i.e., those who could be tracked with social security number and did not have a
loss code in their records) were coded as still being present in the Navy (87.8%). Sailors
whose status was not clear from their employment records (i.e., those who could not be
tracked by social security number) were retained in the analysis with “unknown status”
(1.8%).
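
This status coding amounts to a simple record-linkage rule. A minimal sketch follows; the column names (ssn, loss_code) are hypothetical stand-ins for the personnel-file fields:

```python
# Sketch of coding continuation behavior by linking survey respondents to
# personnel records; 'ssn' and 'loss_code' are hypothetical column names.
import numpy as np
import pandas as pd

def code_status(respondents: pd.DataFrame, records: pd.DataFrame) -> pd.Series:
    merged = respondents.merge(records[["ssn", "loss_code"]],
                               on="ssn", how="left", indicator=True)
    status = pd.Series("unknown", index=merged.index)  # could not be tracked
    tracked = merged["_merge"] == "both"
    status[tracked & merged["loss_code"].notna()] = "left"   # loss code found
    status[tracked & merged["loss_code"].isna()] = "stayed"  # no loss code
    return status

respondents = pd.DataFrame({"ssn": ["A", "B", "C"]})
records = pd.DataFrame({"ssn": ["A", "B"], "loss_code": [np.nan, "12"]})
print(code_status(respondents, records).tolist())  # ['stayed', 'left', 'unknown']
```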

Materials
The NPS 2000 primarily focuses on issues related to work-life and career
development for active-duty personnel in the U.S. Navy. The survey contains 99
questions, many of which include sub-questions. Formats for most of the 99 questions
follow a five-point Likert-type scale.

Analysis Procedures
This sample contained missing data. Of those who returned the NPS 2000, not
every Sailor filled it out completely. For this reason, Amos 4.0 was chosen as the
statistical program to perform the structural equation models for this sample. Amos 4.0 is
better equipped to handle issues with missing data than most other structural equation
modeling programs (Byrne, 2001). Once acceptable factors were found via data reduction
with SPSS 10, the factors and observed variables were input into Amos 4.0 for structural
equation modeling via maximum likelihood estimation with an EM algorithm (Arbuckle
& Wothke, 1999).
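
Amos handles incomplete records through full-information maximum likelihood; the core idea can be illustrated with a small from-scratch EM routine for the mean and covariance of a multivariate normal with missing values (a sketch of the general technique, not Amos's implementation):

```python
# From-scratch EM sketch for ML estimation of a multivariate-normal mean and
# covariance with missing values (NaN). Illustrative only; Amos's FIML fits
# the structural model directly rather than via this two-step route.
import numpy as np

def em_mvn(X, n_iter=200, tol=1e-8):
    X = np.asarray(X, float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                    # start from mean imputation
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True)
    for _ in range(n_iter):
        corr = np.zeros((p, p))                   # conditional-covariance term
        Xe = Xf.copy()
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            Xe[i, m] = mu[m] + coef @ (Xf[i, o] - mu[o])   # E-step: fill in
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
        mu_new = Xe.mean(axis=0)                  # M-step: update parameters
        dev = Xe - mu_new
        sigma_new = (dev.T @ dev + corr) / n
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new, sigma_new
        mu, sigma, Xf = mu_new, sigma_new, Xe
    return mu, sigma
```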

RESULTS

Overall, the proposed model ran successfully and fit the data adequately. A
significant chi-square test was obtained for the model, indicating more variance remains
to be accounted for in the model, χ²(938) = 7637.94, p < .001. However, according to
Byrne (2001), the chi-square test is now largely regarded as sample size dependent. The
Normed fit index (NFI) and the comparative fit index (CFI) were estimated as fit indices
(Byrne, 2001). For adequate fit, the NFI and CFI should be greater than .90 (Bentler &
Bonnet, 1980). By this criterion, the initial model was adequate, with a NFI of .90 and a
CFI of .91. Finally, the root mean square error of approximation (RMSEA) was also used
as a fit index for the initial model. An RMSEA of .10 here indicated borderline fit
(Browne & Cudeck, 1993). A representation of the model is presented below in Figure 1.
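
These fit indices are simple functions of the model and null-model chi-square statistics. The following sketch gives the standard formulas; the null-model values in the example are hypothetical, chosen only to be consistent with the indices reported above:

```python
# Standard formulas for NFI, CFI, and RMSEA. The model chi-square and N come
# from the paper; the null-model values below are hypothetical placeholders.
import math

def fit_indices(chi2_m, df_m, chi2_0, df_0, n):
    nfi = (chi2_0 - chi2_m) / chi2_0
    d_m, d_0 = max(chi2_m - df_m, 0.0), max(chi2_0 - df_0, 0.0)
    denom = max(d_m, d_0)
    cfi = 1.0 - (d_m / denom if denom else 0.0)
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    return nfi, cfi, rmsea

nfi, cfi, rmsea = fit_indices(chi2_m=7637.94, df_m=938,
                              chi2_0=76000.0, df_0=990, n=700)
print(round(nfi, 2), round(cfi, 2), round(rmsea, 2))  # approx. 0.9 0.91 0.1
```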


[Figure omitted: structural equation model path diagram. Latent factors Satisfaction with Rewards, Satisfaction with Work, Satisfaction with Working Conditions, and Satisfaction with Other Agents predict Global Job Satisfaction; Global Job Satisfaction predicts Organizational Commitment, which in turn predicts Career Intentions and Continuation Behavior.]
Figure 1. Exploratory Model


DISCUSSION

This model provides an adequate fit to Navy data for use in relating job
satisfaction and organizational commitment to career intentions and continuation
behavior. Advantages over previously tested models include the use of structural equation
modeling over regression path analysis, and the treatment of job satisfaction and
organizational commitment as separate factors. Several points of interest are apparent in
evaluating the results of the model. First, several factors and observed variables
contributed to global job satisfaction. Satisfaction with work predicted the most variance
in global job satisfaction of any of the factors (path weight = .80). Satisfaction with other
agents was the next largest predictor of global job satisfaction, followed by working
conditions and satisfaction with rewards. Interestingly, the contribution of satisfaction
with rewards to global job satisfaction was negligible (path weight = -.02).
This suggests that the rewards listed on this survey are not as important to job satisfaction
as being generally satisfied with the job itself, or else these rewards do not adequately
capture what Sailors value when considering satisfaction with their job. Perhaps these
results may also indicate the differences between intrinsic and extrinsic rewards as
predictors of job satisfaction. The relationships between variables relating to intrinsic and
extrinsic motivation should be explored further in this model as they pertain to job
satisfaction.

Job satisfaction as it is modeled here is a good predictor of affective
organizational commitment. The path weight from job satisfaction to organizational
commitment is .70 for the exploratory model. Adding a path from global job satisfaction
to career intentions did not add any predictive value to the structural equation model.
Here, organizational commitment mediates the relationship between job satisfaction and
career intentions/continuation behaviors. Organizational commitment predicted both
career intentions and continuation behaviors adequately in the model. Since the model
did not explain all of the variation present (as evidenced by the significant chi-square
statistic), this difference could be the result of an unknown third variable that is
influencing this relationship. This problem should be explored more in the future.

The more the Navy understands about Sailor behavior, the more effectively it can
implement change to improve the Navy. The results of this study suggest that job
satisfaction is a primary predictor of organizational commitment and that both play an
important role in predicting both career intentions and actual continuation behavior. In
addition, the results of this paper suggest that career intentions are actually stronger in
predicting continuation behavior than organizational commitment when evaluating them
in the context of all of the other variables in the model. More research is needed to fully
understand these relationships, and the specific contributions to job satisfaction that can
be implemented in the Navy. A validation of this model should be conducted in the future
to verify these relationships. However, it is clear at this point that understanding Sailor
continuation behavior would be incomplete without measurement of job satisfaction,
organizational commitment, and career intentions.


REFERENCES

Allen, N. J., & Meyer, J. P. (1990). The measurement and antecedents of affective,
continuance, and normative commitment to the organization. Journal of
Occupational Psychology, 63, 1-18.
Arbuckle, J. L., & Wothke, W. (1999). Amos 4.0 user’s guide. Chicago, IL: SmallWaters
Corporation.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the
analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A.
Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162).
Newbury Park, CA: Sage.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts,
applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates.
George, J. M., & Jones, G. R. (2002). Organizational behavior (3rd ed.). New Jersey:
Prentice Hall.
Jaros, S. (1997). An assessment of Meyer and Allen’s (1991) three-component model of
organizational commitment and turnover intentions. Journal of Vocational
Behavior, 51, 319-337.
Kerce, E. W. (1995). Quality of life in the U.S. Marine Corps. (NPRDC TR-95-4). San
Diego: Navy Personnel Research and Development Center.
Locke, E. A. (1976). The nature and causes of job satisfaction. In M. D. Dunnette (Ed.),
Handbook of Industrial and Organizational Psychology (pp. 1297-1349). New
York: John Wiley & Sons.
Olmsted, M. G., & Farmer, W. L. (2002, April). A non-multiplicative model of Sailor job
satisfaction. Paper presented at the annual meeting of the Society for Industrial &
Organizational Psychology, Toronto, Canada.
SPSS, Inc. (1999). SPSS 10.0 syntax reference guide. Chicago, IL: SPSS, Inc.
Staples, D. S., & Higgins, C. A. (1998). A study of the impact of factor importance
weightings on job satisfaction measures. Journal of Business and Psychology,
13(2), 211-232.
Visser, D. (2001, January 1-2). Navy battling to retain sailors in face of private sector’s
allure. Stars and Stripes. Retrieved March 3, 2003, from http://www.pstripes.com/
jan01/ed010101a.html
Withers, P. (2001, July). Retention strategies that respond to worker values. Workforce.
Retrieved September 24, 2003, from http://www.findarticles.com/cf_0/m0FXS/7_80/
76938893/print.jhtml


Does Military Personnel Job Performance in a Digitized Future Force Require
Changes in the ASVAB: A comparison of a dynamic/interactive computerized
test battery with the ASVAB in predicting training and job performance among
airmen and sailors
Ray Morath, Brian Cronin, & Mike Heil
Caliber Associates

In the late 1990s a team of researchers developed, as part of an award-winning project (SIOP Scott-Meyers
Professional Practices Award, 1999), a battery of dynamic, interactive, computerized tests that is currently
being used to select air traffic controllers; this battery of tests is known as the Air Traffic Selection and
Training (AT-SAT) battery. An empirical validation study using a concurrent sample of over 1,000 job incumbents
found the AT-SAT to possess high predictive validity as well as high face validity. The AT-SAT
was designed to measure the cognitive, perceptual, and psychomotor abilities critical to the air traffic control
job; however, many of these same cognitive, perceptual, and psychomotor abilities
have also been identified as important for military officer, non-commissioned officer, and enlisted personnel
performance (Campbell, Knapp, & Heffner, 2002; Horey, Cronin, Morath, Franks, Cassella, & Fallesen, in
press; Noble & Fallesen, 2000; Rumsey, 1995).

Our team of researchers has recently created a parallel form of the original AT-SAT and is conducting an
equating study to ensure that the new form is of equivalent difficulty and complexity and measures the same
performance domains as the original battery. This equating study involves collecting data from
approximately 1,500 Air Force and Navy personnel who have recently completed boot camp and are about to
enter technical training programs for their assigned MOS. Our study will compare airmen and sailor scores
on the AT-SAT battery to their scores on the ASVAB and will also investigate the ability of the AT-SAT to
predict variability in training and job performance that is unique from that already predicted by the ASVAB.
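
While the abstract does not describe the equating method, a common first step for such form-equivalence work is linear equating on a common population; the following is a generic textbook sketch, not necessarily the authors' procedure, and the scores shown are invented:

```python
# Generic linear-equating sketch: map scores on the new form (X) onto the
# scale of the original form (Y) by matching means and standard deviations.
# This illustrates a standard method, not the authors' actual procedure.
import numpy as np

def linear_equate(x_scores, y_scores):
    x, y = np.asarray(x_scores, float), np.asarray(y_scores, float)
    slope = y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return lambda score: slope * score + intercept

# Example: convert a new-form score of 52 onto the original form's scale.
to_original = linear_equate(x_scores=[48, 52, 55, 60, 47],
                            y_scores=[50, 54, 58, 63, 49])
print(round(to_original(52), 2))
```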

Our paper will present the research methodology for creating and validating the AT-SAT and will present
data regarding the correlations of the various sub-tests with multiple dimensions of air traffic controller
performance. We will also present data linking the AT-SAT to individual cognitive, perceptual, and
psychomotor abilities required by air traffic controllers and discuss how many of these abilities are required
not only within various technical jobs, but also across officers, NCOs, and enlisted personnel. Finally, we
will discuss the influence of digitization on military performance requirements—specifically within the areas
of technical training performance and tactical and technical performance, and the need for new, dynamic and
interactive methods, such as the AT-SAT, of measuring the abilities associated with these changing
requirements.

Campbell, R. C., Knapp, D. J., & Heffner, S. T. (2002). Selection for leadership: Transforming NCO
promotion. (ARI Special Report 52). Alexandria, VA: US Army Research Institute for the Behavioral and
Social Sciences.

Horey, J., Cronin, B., Morath, R., Franks, W., Cassella, R., & Fallesen, J. (in press). Army Training and
Leader Development Panel Consolidation Phase: U.S. Army Future Leadership Requirement Study.
Prepared for the U.S. Army Research Institute under contract (DASW01-98D0049). Caliber Associates:
Fairfax, VA.

Noble, S. A., & Fallesen, J. J. (2000). Identifying conceptual skills of future battle commanders. (ARI
Technical Report 1099) Alexandria, VA: US Army Research Institute for the Behavioral and Social
Sciences.


Rumsey, M.G. (1995). The best they can be: Tomorrow’s Soldiers. In R.L. Phillips & M.R. Thurman (Eds.),
Future Soldiers and the quality imperative: The Army 2010 conference (pp. 123-157). Fort Knox, KY: US
Army Recruiting Command.


COMPARISONS OF SATISFACTION AND RETENTION MEASURES
FROM 1999-2003
Brian M. Lappin, Regan M. Klein, Lee M. Howell, and Rachel N. Lipari
Defense Manpower Data Center
1600 Wilson Blvd., Suite 400
Arlington, VA 22209-2953
lappinbm@osd.pentagon.mil

Introduction

Like many organizations, the U.S. Department of Defense (DoD) is interested in
employee satisfaction and turnover intentions. Although satisfaction and retention are distinct
concepts, they are closely related in that higher satisfaction with characteristics of work life is
associated with stronger organizational commitment (Ting, 1997; Mathieu, 1991). Therefore,
maintaining a high level of satisfaction with the military way of life can be a key element in the
retention of Service members.

Observing the fluctuation in levels of satisfaction and retention over time provides the
military with insight to how its current programs, services, and policies affect the quality of life
of its personnel. Using data from surveys conducted by Defense Manpower Data Center
(DMDC) from 1999 to 2003, this paper will analyze Service members’ levels of satisfaction and
retention intentions. The 1999 Active Duty Survey (ADS) continued a line of research begun in
1969 with a series of small-scale surveys administered approximately every 2 years. These
surveys were expanded in 1978 to provide policymakers with information about the total
population directly involved with active duty military life (Doering, Grissmer, Hawes, and
Hutzler, 1981). Like its predecessors, the 1999 ADS provided timely, policy-relevant
information about the military life cycle (Wright, Williams, & Willis, 2000). The 1999 ADS
was a large-scale, paper-and-pencil survey which included items on attitudes toward service and
retention intentions. Since the 1999 ADS administration, Web-based surveys (Status of Forces
Surveys (SOFS) of Active-Duty Members) were conducted in July 2002 and March 2003. These
surveys also included measures of satisfaction and the likelihood to remain in service.

This paper will analyze differences in satisfaction levels and retention intentions by
military Service. Aldridge, Sturdivant, Smith, Lago, and Maxfield (1997) found differences
among Service groups in levels of satisfaction with specific components of military life. For
example, Navy, Marine Corps, and Air Force officers reported higher levels of satisfaction with
military life than did Army officers. Furthermore, Air Force enlisted members were more
satisfied with military life than were Army enlisted members (Aldridge et al., 1997).

In addition, this paper will analyze differences in satisfaction levels and retention
intentions by military paygrade. Paygrade has accounted for some differences in levels of
member satisfaction with military life (GAO, 1999; Norris, Lockman, and Maxfield, 1997). In
general, higher paygrade groups reported higher levels of satisfaction with military life. GAO
(1999) found that among members in retention critical specialties, more enlisted members than
officers were dissatisfied with the military.

Although most demographic characteristics were not strong predictors of retention
intention in 1985, and were not significant in 1992, Norris et al. (1997) found that, in 1985,
paygrade was the major demographic predictor of retention intention. Specifically, higher
paygrades were associated with lower reenlistment intentions among enlisted
personnel and higher retention intentions among officers. However, Wong (2000) notes the
possibility of generational differences in attitudes towards serving in the military that could
impact retention intentions. Loosely defined, one might expect paygrade groups E7-E9 and O4-
O6 to represent Baby Boomers, and paygrade groups below these to represent Generation X
members. Differences between collective experiences of the Baby Boom Generation and those
of Generation X have resulted in different attitudes toward work ethics (Wong, 2000).

In presenting survey results from 1999 to 2003, this paper will also discuss the relevance
of the economy and September 11th terrorist attacks on satisfaction and retention in the military.
As the United States transitioned from peacetime to wartime during this period, the military’s
role shifted from an already accelerated schedule of training and peacekeeping operations to
heavy involvement in world affairs and very high rates of activity. Assessing changes in
satisfaction and retention during this timeframe may provide insight into the stability of military
personnel’s intentions during this changing historical context.

Methods

In the analysis of satisfaction and retention across this 4-year period, three different
surveys are utilized. The 2002-2003 surveys are part of the Human Resources Strategic
Assessment Program (HRSAP) which consists of both Web-based and traditional paper-and-
pencil surveys that assess the attitudes and opinions of the entire DoD community—active,
Reserve, DoD civilian, and family members. Whereas the 1999 ADS employed a paper-and-
pencil administration method, both the July 2002 and March 2003 SOFS were Web-only
surveys.

Each of the three surveys targeted similar populations. The population of interest for the
1999 ADS consisted of all Army, Navy, Marine Corps, Air Force, and Coast Guard active-duty
members (including Reservists on active duty) below the rank of admiral or general, with at least
6 months of service when surveys were first mailed. Similarly, the population of inferential
interest for the July 2002 and March 2003 SOFS consisted of active-duty members of the Army,
Navy, Marine Corps, and Air Force, who had at least 6 months of service and were below flag
rank when the sample was drawn, excluding National Guard and Reserve members
in active-duty programs. Coast Guard members and Reserve component members in full-time
active duty programs were excluded from the 1999 ADS data prior to analyses for this report in
order to maximize comparability between the surveys.


For all three surveys, single-stage, nonproportional stratified random-sampling
procedures were used to ensure adequate sample sizes for the reporting categories. The initial
sample for the 1999 ADS consisted of 66,040 individuals drawn from the sample frame
constructed from DMDC’s May 1999 Active-Duty Master Edit File. The survey was distributed
from August 1999 to January 2000. Completed surveys were received from 33,189 eligible
military members. The overall weighted response rate for eligible members, corrected for
nonproportional sampling, was 51%.

The initial sample for the July 2002 SOFS consisted of 37,918 individuals drawn from
the sample frame constructed from DMDC’s December 2001 Active-Duty Master Edit File. The
July 2002 SOFS was conducted July 8 to August 13, 2002. Completed surveys were received
from 11,060 eligible members, yielding an overall weighted response rate, corrected for
nonproportional sampling, of 32%.

The initial sample for the March 2003 SOFS consisted of 34,929 individuals drawn from
the sample frame constructed from DMDC’s August 2002 Active-Duty Master Edit File. The
March 2003 SOFS was conducted March 10 to April 21, 2003. Completed surveys were
received from 10,828 eligible respondents. The overall weighted response rate for eligible
members, corrected for nonproportional sampling, was 35%.

Data from all three surveys were weighted to reflect the population of interest. These
weights reflect (1) the probability of selection, (2) a nonresponse adjustment factor to minimize
bias arising from differential response rates among demographic subgroups, and (3) a
poststratification factor to force the response-adjusted weights to sum to the counts of the target
population as of the month the sample was drawn and to provide additional nonresponse
adjustments.
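
In code, the three-step logic of such weights might be sketched as follows (a hedged illustration; the DataFrame and column names p_select, stratum, responded, and poststratum are hypothetical, not DMDC's actual variables):

```python
# Minimal sketch of the three-step weighting logic described above.
# Column names (p_select, stratum, responded, poststratum) are hypothetical.
import pandas as pd

def survey_weights(sample: pd.DataFrame, pop_counts: pd.Series) -> pd.DataFrame:
    df = sample.copy()
    df["w_base"] = 1.0 / df["p_select"]            # (1) inverse selection prob.
    # (2) inflate respondents' weights by the inverse weighted response rate
    # within each sampling stratum
    rr = df.groupby("stratum").apply(
        lambda g: g.loc[g["responded"], "w_base"].sum() / g["w_base"].sum())
    df["w_nr"] = df["w_base"] / df["stratum"].map(rr)
    resp = df[df["responded"]].copy()
    # (3) poststratify: force weights to sum to known population counts
    factor = pop_counts / resp.groupby("poststratum")["w_nr"].sum()
    resp["weight"] = resp["w_nr"] * resp["poststratum"].map(factor)
    return resp
```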

The 1999 ADS was an omnibus personnel survey covering such topics as military
assignments, retention issues, personal and military background, preparedness, mobilizations and
deployments, family composition, use of military programs and services, housing, perceptions of
military life, family and childcare concerns, spouse employment, financial information, and other
quality of life issues. In comparison, the July 2002 and March 2003 SOFS were somewhat
shorter surveys. Although the content of the three surveys was not identical, each included
questions pertaining to attitudes and behaviors, and all three surveys included questions
concerning Service members’ overall satisfaction with the military way of life and retention
intentions.


Results

Satisfaction with the Military Way of Life

All three surveys asked Service members how satisfied they were with the military way
of life. For the purposes of this paper, the five response categories were collapsed into three
categories: satisfied, neither satisfied nor dissatisfied, and dissatisfied.
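
As a concrete illustration of this collapsing, assuming a 1-5 response coding (the surveys' actual codings are not reported here):

```python
# Illustrative collapsing of a 5-point satisfaction item into 3 categories;
# the 1-5 coding here is an assumption, not the surveys' documented coding.
import pandas as pd

collapse = {1: "Dissatisfied", 2: "Dissatisfied",
            3: "Neither satisfied nor dissatisfied",
            4: "Satisfied", 5: "Satisfied"}
item = pd.Series([5, 4, 4, 3, 2, 1, 4, 5, 2, 4])
print(item.map(collapse).value_counts(normalize=True))
```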

From 1999 to 2003, satisfaction with the military way of life increased 18 percentage
points, with the largest increase occurring between 1999 and 2002. In 1999, more Service
members indicated they were satisfied with the military way of life (49%) than indicated they
were dissatisfied (29%). In 2002, the percentage of members that indicated they were satisfied
increased to 61% and the percentage of members that were dissatisfied decreased to 20%.
Similarly, in 2003, the percentage of Service members reporting they were satisfied (67%)
increased, while the percentage of members that were dissatisfied (16%) decreased.

Across the three surveys, a higher proportion of Air Force members indicated they were
satisfied with the military way of life than did members of other Services. The percentage of Air
Force members that were satisfied increased from 1999 (56% vs. 45-49%) to 2002 (68% vs. 54-
61%), and again in 2003 (74% vs. 61-69%). From 1999 to 2003, the percentage of members
who were satisfied increased in all of the Services, with the percentages for Navy members
increasing (from 45% to 61% to 69%) such that Navy members were almost as satisfied as Air
Force members in 2003 (69% vs. 74%).

Satisfaction with the military way of life tended to increase with rank across the three
surveys. In 1999, fewer junior enlisted members (E1-E4) were satisfied than members in other
paygrade groups (36% vs. 54-72%). This finding held true in 2002 (46% vs. 68-85%) and in
2003 (53% vs. 74-87%). Similarly, across the three surveys, senior officers (O4-O6) were the
most satisfied with the military way of life and the percentage reporting satisfied increased
across the three years (72% to 85% to 87%).

Retention Intentions

Three measures of retention intentions were included in each of the three surveys. First,
members were asked to indicate their willingness to stay on active duty. Next, members were
asked if they intended to remain in the military for 20 years, a full career. For the purposes of
this paper, the five response categories for likelihood were collapsed into three categories: likely,
neither likely nor unlikely, and unlikely. Finally, members were asked if their spouse, girlfriend,
or boyfriend favored their remaining in the military. The five response categories for this
question were also collapsed into three categories: favors staying, has no opinion one way or the
other, and favors leaving.

From 1999 to 2003, likelihood to stay on active duty increased 11 percentage points. In
1999, more Service members indicated they were likely to stay on active duty (50%) than said
they were unlikely to stay (36%). In 2002, the percentage of members that indicated they were
likely to stay increased to 58%. Meanwhile, the percentage of members that were unlikely to
stay decreased to 26%. In 2003, the percentage of Service members that were likely to stay
(61%) increased from the previous year. However, the percentage of members that were
unlikely to stay (27%) remained roughly the same.

Across the three surveys, fewer Marine Corps members indicated they were likely to stay
on active duty. However, percentages for Marine Corps members indicating that they were
likely to stay did increase from 1999 (42% vs. 48-56%) to 2002 (46% vs. 58-63%), and again in
2003 (53% vs. 59-65%). More Air Force members indicated they were likely to stay than
members of other Services across the three surveys, but this difference was significant only in
1999 (56% vs. 42-50%).

In each of the three surveys, fewer junior enlisted members responded that they were
likely to stay on active duty than other paygrade groups. Although fewer junior enlisted
members indicated they were likely to stay, percentages did increase from 1999 (32% vs. 53-
72%) to 2002 (41% vs. 67-78%), and again in 2003 (46% vs. 63-80%). In 1999, more senior
officers were likely to stay on active duty than members of other paygrade groups (72% vs. 32-
66%). Likewise, more senior officers were likely to stay than other paygrade groups in 2002
(78% vs. 41-69%) and 2003 (80% vs. 46-72%), with the exception of warrant officers (W1-W5)
(73% and 79%, respectively).

From 1999 to 2003, likelihood to stay for 20 years increased 10 percentage points. In
1999, more Service members indicated they were likely to stay on active duty for at least 20
years (51%) than said they were unlikely to stay (36%). In 2002, the percentage of members that
indicated they were likely to stay for 20 years increased to 59% and the percentage of members
that were unlikely to stay decreased to 28%. In 2003, the percentage of Service members that
were likely to stay for 20 years (61%) increased and the percentage of members that were
unlikely to stay (28%) remained constant.

Across the three surveys, a lower proportion of Marine Corps members indicated they
were likely to remain in the military for at least 20 years. However, the percentage of Marine
Corps members responding that they were likely to stay for 20 years did increase from 1999
(43% vs. 49-58%) to 2002 (49% vs. 59-65%), and again in 2003 (52% vs. 59-66%). More Air
Force members indicated they were likely to stay in the military for 20 years than members of
other Services across the three surveys, but this difference was significant only in 1999 (58% vs.
43-51%).

Fewer junior enlisted members responded that they were likely to stay for at least 20
years than other paygrade groups across the three surveys. Although fewer junior enlisted
indicated they were likely to stay for 20 years, percentages did increase from 1999 (26% vs. 51-
87%) to 2002 (37% vs. 64-92%), and again in 2003 (40% vs. 60-91%). In 1999, more senior
officers (87%) were likely to stay for 20 years than members of other paygrade groups (26-83%).
Furthermore, more senior officers were likely to stay for 20 years than other paygrade groups in
2002 (92% vs. 37-76%) and 2003 (91% vs. 40-79%), with the exception of warrant officers
(91% and 89%, respectively).


Spouse/significant other support to stay on active duty increased 8 percentage points
from 1999 to 2002 (44% vs. 52%), but decreased 6 percentage points from 2002 to 2003 (52%
vs. 46%). Meanwhile, the percentage of spouses/significant others that favored leaving active
duty decreased 7 percentage points from 1999 to 2002 (40% vs. 33%), but increased 3
percentage points from 2002 to 2003 (33% vs. 36%).

In 1999, fewer Marine Corps members (37% vs. 43-48%) indicated that their
spouse/significant other favored staying on active duty, whereas Air Force members (48% vs.
37-43%) were the most likely to indicate they had spouse/significant other support for staying.
Again, in 2002, fewer Marine Corps members indicated that their spouse/significant other was
supportive of their remaining on active duty (44% vs. 52-56%). However, in 2003, there were
no significant differences among the Services.

Across the three surveys, fewer junior enlisted members indicated that their
spouse/significant other favored staying on active duty than other paygrade groups. The
percentage indicating that their spouse/significant other favored staying increased from 1999
(27% vs. 42-57%) to 2002 (35% vs. 57-67%), but decreased from 2002 to 2003 (30% vs. 47-
60%). In both 1999 (42%) and 2003 (47%), fewer junior officers indicated that their
spouse/significant other favored staying on active duty than other paygrade groups in that year,
excluding junior enlisted.

Correlations of Satisfaction and Retention Measures

As Table 1 shows, strong correlations were found across the three surveys between
member satisfaction with military life and intention to remain in the military. Correlations of the
greatest magnitude were between likelihood of staying, staying for 20 years, and
spouse/significant other support.

Table 1. Correlation Matrix: Satisfaction and Retention Indices from 1999-2003

                                     Likelihood of Staying   Likelihood of Staying   Spouse/Significant
                                                             for 20 Years            Other Support
                                     1999  2002  2003        1999  2002  2003        1999  2002  2003
Overall Satisfaction                  .55   .55   .53         .51   .54   .51         .44   .46   .39
Likelihood of Staying                  —     —     —          .75   .75   .80         .66   .61   .56
Likelihood of Staying for 20 Years     —     —     —           —     —     —          .57   .53   .51

Note: Correlations significant at the p < .0001 level.


However, not everyone who is satisfied intends to stay and not everyone who is
dissatisfied intends to leave. While 70% of members who were satisfied with the military way of
life indicated they were likely to remain in the military in 1999, 11% of dissatisfied members
also indicated an intention to stay. By 2002, the percentage of satisfied
members that intended to stay in the military rose to 81%, while the percentage of dissatisfied
members that intended to stay decreased to 5%. Percentages remained roughly the same in 2003,
with 83% of satisfied members, and 5% of dissatisfied members, indicating an intention to stay.

A similar pattern was seen across the three surveys when members were asked if they
intended to stay for at least 20 years. In 1999, 66% of satisfied members indicated an intention
to remain in the military for 20 years, whereas 14% of dissatisfied members intended to stay for
20 years. The gap widened in 2002, with 78% of satisfied members, and 8% of dissatisfied
members, wanting to stay for a full career. In 2003, 81% of satisfied members, and 6% of
dissatisfied members, indicated an intention to stay for 20 years.

Lastly, when asked about support in 1999, 68% of satisfied members indicated that their
spouse/significant other favored staying in the military, and 13% of dissatisfied members
indicated that their spouse/significant other favored staying in the military. As seen in the other
retention measures, the gap widened in 2002, with 79% of satisfied members, and 7% of
dissatisfied members, indicating that their spouse/significant other favored staying in the
military. In 2003, 83% of satisfied members, and 6% of dissatisfied members, indicated that
their spouse/significant other favored staying.

Conclusion

Between 1999 and 2003, the historical landscape was marked by the events of September
11, 2001, and the ensuing global war on terrorism. During this period, Service members’ overall
satisfaction with the military way of life, their likelihood to stay on active duty, and their
likelihood to remain in the military for a full career increased. Furthermore, the largest increases
in satisfaction and retention intentions occurred between 1999 and 2002. Increases in
satisfaction and retention indices may have been the result of a renewed sense of patriotism
following the terrorist attacks of September 11th. The downturn in the economy preceding
September 11th is another possible explanation for the increases in satisfaction and retention. It
is possible that, as the economy took a turn for the worse, the military became a more attractive
employment option. It is also interesting to note that, of the three points in time,
spouse/significant other support to stay on active duty peaked in 2002. The slight decrease in
support in 2003 may have been the result of spouses and significant others growing weary of
members’ time away from home.

Consistent with previous research (Aldridge et al., 1997), there were notable differences
amongst the Services. These differences largely reflect varying organizational structures across
the Services that are needed to support the Service-unique roles and missions (OASD, 2002). For
example, across the three surveys, Air Force members were more satisfied and more likely to
remain in the military than members of the other Services. This trend reflects the emphasis on
quality of life in the Air Force, and their organizational structure, which is characterized by
higher retention rates creating an “older” force. In contrast, the Marine Corps does not have the
same organizational need (OASD, 2002). The Marine Corps is a smaller force that retains fewer
members, particularly among enlisted members—the majority of its force. Although percentages
did improve over the 4-year period, fewer Marine Corps members indicated that they were likely
to stay on active duty than members of the other Services.

Fewer junior enlisted members were satisfied than members in other paygrade groups.
Although percentages did improve from 1999 to 2003, fewer junior enlisted members responded
that they were likely to stay on active duty than other paygrade groups. Also, fewer junior
enlisted members indicated that their spouse/significant other favored staying on active duty than
other paygrade groups. Senior officers were the most satisfied with the military way of life. In
addition, senior officers were more likely to stay on active duty than members of other paygrade
groups, with the exception of warrant officers in 2002 and 2003. These results are not surprising
as junior enlisted members are newer to their respective Services, and therefore, may have lower
levels of organizational commitment. Senior officers, in contrast, have been in service longer
and have more invested in the military as a career.

Satisfaction and retention remain important factors in sustaining a military organization.
As the military organization in the United States does not accommodate lateral entry into mid-
and senior-level paygrades, it is essential to retain the appropriate number of personnel at each
paygrade to ensure manpower and readiness requirements are met. In the post-September 11th
period of heightened personnel tempo, the survey results from 1999 to March 2003 indicate that
satisfaction and retention are stable, if not improving. However, given the current military
involvement in Iraq, it will be essential to continuously monitor the fluctuations in satisfaction
and retention intentions of military personnel.

References
Aldridge, D., Sturdivant, T., Smith, C., Lago, J., & Maxfield, B. (1997). The military as a
career: 1992 DoD Surveys of Officers and Enlisted Personnel and Their Spouses (Report No.
1997-006). Arlington, VA: DMDC.

Doering, Z. D., Grissmer, D. W., Hawes, J. A., & Hutzler, W. P. (1981). 1978 DoD Survey of
Officers and Enlisted Personnel: User’s manual and codebook (Rand Note N-1604-MRAL).
Santa Monica, CA: Rand.

General Accounting Office. (1999). Military Personnel: Perspectives of surveyed Service
members in retention critical specialties (GAO Report No. NSIAD-99-197BR). Washington,
DC: United States General Accounting Office.
DC: United States General Accounting Office.

Mathieu, J. E. (1991). A cross-level nonrecursive model of the antecedents of organizational
commitment and satisfaction. Journal of Applied Psychology, 76, 607-618.


Norris, D. G., Lockman, R. F., & Maxfield, B. D. (1997). Understanding attitudes about the
military way of life: Analysis of longitudinal data from the 1985 and 1992 DOD surveys of
officers and enlisted personnel and military spouses (DMDC report No. 97-008). Arlington,
VA: DMDC.

Office of the Assistant Secretary of Defense (Force Management Policy) (2002). Population
representation in the military services fiscal year 2000. Washington, D.C.

Ting, Y. (1997). Determinants of job satisfaction of federal government employees. Public
Personnel Management, 26(3), 313-334.

Wong, L. (2000). Generations apart: Xers and boomers in the officer corps. Carlisle, PA:
Strategic Studies Institute, U.S. Army War College.

Wright, L. C., Williams, K., & Willis, E. J. (2000). 1999 Survey of Active Duty Personnel:
Administration, datasets, and codebook (Report No. 2000-005). Arlington, VA: DMDC.


BRITISH ARMY LEAVERS SURVEY:
AN INVESTIGATION OF RETENTION FACTORS
Johanna Richardson
Ministry of Defence: British Army,
Directorate Army Personnel Strategy (Science), Building 396a,
Trenchard Lines, Upavon, Wiltshire, UK.

The British Army has been operating a rolling programme of Continuous Attitude
Surveys (CASs) for the last twenty years. The surveys are a management information
tool to facilitate effective planning. The aim of the CAS programme is to obtain
information from a representative sample on aspects of Army life. Surveys are
administered to Serving Personnel (Regulars and the Territorial Army), Families of
Serving Personnel and to those who leave the Army. They aim to monitor aspects of
duty of care, factors affecting retention, satisfaction with the main areas of Army life,
and the impact of personnel policies. Data from the surveys provide information which
assists with personnel planning and manning strategy. The results are also used within
Army wide Performance Indicators (PIs) for the Adjutant General and the Executive
Committee of the Army Board. Analyses of the data are made available to the Armed
Forces Pay Review Board, and in the future may also be made available for use by the
Defence Management Board.

Retention is a key issue for the British Army, for both financial and operational reasons.
Factors influencing retention can be conceptualised as being either retention positive or
retention negative. Retention positive factors are those which impact upon an
individual’s intention to stay within the Army, while retention negative factors are those
that impact upon an individual’s intention to leave. Previous research has shown that if
attitudes can reliably be related to behavioural intentions, a more reliable prediction of
actual behaviour may be obtained. Data on factors influencing retention are obtained
primarily from two surveys, the Serving Personnel survey and also the Leavers survey.

The Serving Personnel survey is administered twice annually to a random sample of 4%
of all regular soldiers and 10% of all regular officers. The sample is stratified by rank to
ensure representation of relatively small groups of interest. Overall response rates are
typically between 40% and 55%, although the response rates vary
between officers and soldiers. Typically, around 65% of officers respond, compared to
approximately 40% of soldiers. The results in this paper include some items from a
recent wave of this survey (SP4).
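
Such a rank-stratified draw with different fractions for officers and soldiers can be sketched as follows (an illustration only; the frame columns rank and rank_group are hypothetical):

```python
# Sketch of a rank-stratified random sample: 10% of officers, 4% of soldiers.
# The columns 'rank' and 'rank_group' are hypothetical frame fields.
import pandas as pd

FRACTIONS = {"officer": 0.10, "soldier": 0.04}

def draw_sample(frame: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    return (frame.groupby("rank", group_keys=False)
                 .apply(lambda g: g.sample(
                     frac=FRACTIONS[g["rank_group"].iloc[0]],
                     random_state=seed)))
```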

The Leavers survey is administered on an ongoing basis to all regular serving personnel
who are leaving the Army. This includes cases where the individual has reached the end
of a term of engagement or commission, has applied for premature voluntary release, or
is being discharged for medical, administrative or disciplinary reasons. The Leavers
survey began with a pilot study that involved questionnaires being administered
between October 2001 and January 2002. Following this trial, administration continued
on an ongoing basis, and questionnaires in their current form have been issued to leavers
at their exit point since that time. Hence, this paper includes analyses from leavers’
questionnaires administered between January 2002 and October 2003.


Of all personnel who leave the Army, those who apply for premature voluntary release
probably provide the most insight into retention issues. Army officers are able to PVR
with a six-month notice period, and soldiers are currently required to give twelve
months’ notice. In some situations it is possible for an individual to leave the Army with
a lesser notice period, although this is determined on a case by case basis, and is
dependent on factors such as the operational commitments of the Army and any return
of service required to repay training investment. Statistics indicate that approximately
40% of those who apply for PVR later withdraw their application. Because the leavers’
questionnaires are anonymous, there is currently no way to determine whether any are
completed and returned by personnel who apply for PVR and subsequently change their
minds.

PVR levels in the Army are slightly higher than those in the other Armed Forces.
Annual PVR rates from the trained strength for the Army are 3.3% of officers, and 5.2%
of soldiers. Within the Royal Navy/Royal Marines, the annual PVR rate for officers is
2.5% and 5.4% for ratings/other ranks, while within the RAF the figures are 2.1% and
4.0% respectively. These rates exclude PVR from training establishments, and are not
an indication of overall attrition. Of the 590 completed leavers’ questionnaires received
by DAPS Science between January 2002 and October 2003, 282 were from personnel
who had applied for PVR. 130 were from personnel who had reached the end of their
engagement/commission, and 44 were from personnel discharged for medical,
administrative or disciplinary reasons. The reason for leaving was missing data in 134
cases.

Of the 282 PVR personnel who returned a questionnaire during the period, 8.5% were
officers and 91.4% were soldiers. Of the total, 83.9% were male and 16.1% were
female. 76.2% were aged 30 or under at the time they completed the questionnaire. The
majority of soldiers were from the lower ranks: privates, lance corporal/bombardiers or
corporals/bombardiers. Unless soldiers buy themselves out, or are discharged for
medical reasons, soldiers serve a minimum of four years. Unsurprisingly therefore, most
of those who applied for PVR had served for between 4 and 7 years, which may explain
the large proportion of soldiers from the lower ranks. Overall, the majority of PVR
personnel had applied for PVR within six months of deciding to leave (37.2%),
compared to 29.4% who waited for between 7 and 12 months and 30.9% who had
waited for more than twelve months before handing in their notice.

In terms of the reasons given for applying to leave the British Army, the majority of
PVR cases said that the decision was related entirely or in part to the impact of the
Army on personal and/or domestic life (81.6%). 80.1% said that the decision was
related entirely, or in part, to general aspects of Army life. 79.1% said that the decision
to leave was related entirely, or in part, to job satisfaction, and 76.2% said that it was
related entirely, or in part, to factors outside the Army. These categories are not
mutually exclusive, and it therefore appears that the reasons why people apply for PVR
are many and varied, even for a single individual.

A number of statements were included within each of these four categories, and PVR
respondents were asked to state which had contributed to, or been critical in, their
decision to leave the Army. Over all four categories, PVR personnel said that the two
most important reasons for leaving the British Army were a feeling that civilian life
would be better, and that there would be better (civilian) employment opportunities
outside the Army. In third place was the belief that if the individual stayed in the British
Army any longer, it would be difficult to start a civilian career. This statement is
particularly interesting in light of the fact that the majority of PVR questionnaires were
returned by personnel aged 30 or under.

These top three reasons for taking PVR from the British Army are fairly similar to those
given in surveys conducted by the Royal Navy/Royal Marines and the Royal Air Force.
In the RN/RM, the top three reasons cited are the desire to live at home, the wish to take
up another career, and to marry and/or raise a family. In the RAF, the reasons are given
as a lack of family stability, career prospects outside the RAF, and the difficulty in
starting a second career if the individual stays in the service for any longer.

For the British Army, the fourth most frequently endorsed contributory or critical factor
in the decision to PVR was the statement there was too much separation from a spouse
or partner. Interestingly, this statement was cited as contributing or being critical to a
decision to leave among married and single personnel, and personnel in long-term
relationships. When responses related to the impact of the Army on personal and/or
domestic life were analysed in greater detail, an interesting pattern emerged. The top
two factors in this category were the same for married personnel, single personnel, and
those in a long-term partnership. These were the degree of separation from a
spouse/partner, and the detrimental effects of Army life on the relationship. The third
most important factor for single personnel and those in a long-term relationship was the
poor standard of single living accommodation (SLA). For married personnel, it was the
detrimental effect of Army life upon children.

Personnel expectations will certainly have a role in retention. For example, the Serving
Personnel survey asks, for those who joined the British Army within the last five years,
about the factors that most influenced their decision to join. Recruitment positive factors
include the opportunities for sport and an active life, and the opportunities for adventure
training. However, 59% of PVR leavers stated that a lack of opportunity for sporting
activities or adventurous training had contributed, or was critical, to their decision to
leave the British Army. Similarly, 43% of PVR leavers stated that an insufficient
amount/quality of training had contributed, or was critical, to their decision to leave the
Army. Clearly, providing a positive image of Army life is a key recruitment factor.
However, expectations need to be managed throughout an Army career to avoid
disappointment and enhance retention.

The surveys administered to Serving Personnel also provide valuable information on
retention factors. These data can be used to compare intentions to leave with Leavers
survey data on actual exit behaviour. Table 1 shows the factors that increase intention to
leave among serving personnel, and those that are important in actually deciding to
leave for those who PVR.

Table 1: Factors influencing intentions to leave (among serving personnel) and exit
behaviour (among PVR leavers).

Ranked   Serving Personnel: factors          Leavers (PVR): reasons for
reason   increasing intention to leave       leaving the British Army
1        Impact of Army lifestyle on         Opportunities outside the Army
         personal/domestic life
2        Operational commitments and         Impact of Army lifestyle on
         over-stretch                        personal/domestic life
3        Amount of extra duties              Your own morale
4        Frequency of operational tours      Management in your workplace
5        Accommodation                       Job satisfaction

The impact of the Army lifestyle on personal/domestic life is a key retention negative
factor for both groups. For the PVR leavers, job satisfaction, own morale and
management issues are key retention negative factors. However, for the serving
personnel, these are retention positive factors – these personnel identify other irritants as
key influences on their intentions to leave. Opportunities outside the Army appear to be
a key issue for PVR leavers, but it is not known whether this is a cause or an effect. The
policy implications of these differences are another issue. Does one concentrate on
alleviating the factors influencing intention to leave for serving personnel? Or is it
preferable to focus on remedying factors which are known to be associated with exit
behaviour? Perhaps the debate is academic, since the British Army would like to
provide a satisfying job and retention positive service conditions to all.

The analyses reported here are the first available from the British Army Leavers survey
since the pilot study was completed. Future plans for the survey include refinement of
the instrument, and achieving some consistency with the questions asked of serving
personnel. In addition, it must be acknowledged that there were administration issues
associated with the questionnaires included in the current analyses. Unfortunately, these
precluded calculation of an accurate and reliable response rate. Also, given that
approximately 40% of PVR applications are later withdrawn, it would be preferable to
be able to account for this within the data. DAPS Science is now addressing these
issues, in order that the exit data from leavers’ questionnaires can provide enhanced
information to military policy makers. This will assist in manpower planning and
personnel policy development.

PREDICTORS OF U.S. ARMY CAPTAIN RETENTION DECISIONS

Debora Mitchell, Ph.D., Heidi Keller-Glaze, Ph.D., and Annalisa Gramlich
Caliber Associates
Jon Fallesen, Ph.D.
Army Research Institute

ABSTRACT

In June 2000, at the direction of the U.S. Army Chief of Staff, the Army began the largest
assessment it has ever conducted on training and leader development. The research assessed
organizational culture, training and leader development, perceptions of advancement
opportunity, and the effect of these factors on retention. In all, approximately 13,500 leaders and
spouses provided their input during surveys, focus groups, and interviews. Data were collected
from lieutenants, captains, majors, lieutenant colonels, colonels, and NCOs.

To identify the variables that impact captains’ decision to leave the Army before
retirement, logistic regression was conducted with intent to stay or leave as the dependent
variable and demographic data and factors related to benefits, pay, self-development, mentoring,
performance evaluation, and training as independent variables. Results showed that length of
time as an officer, source of commissioning, gender of the respondent, benefits, mentoring, and
counseling were significant predictors of intent to leave. These results provide evidence of the
importance of professional development to retention.

INTRODUCTION

The Army Training and Leader Development Panel was initiated in June 2000, at the
direction of the U.S. Army Chief of Staff. The Panel’s charter was to review, assess, and provide
recommendations for the development and training of 21st Century leaders. The panel was made
up of Army researchers, subject matter experts, and officers who collected data on satisfaction
with training and leader development. They also collected data on a wide variety of related
topics, such as institutional and unit training, self-development, performance appraisal,
mentoring, selection and retention, satisfaction, commitment, and Army culture.

The panel’s major emphasis was on training and leader development. A meta-analysis by
Hom and Griffeth (1995) suggests that the quality of one’s management positively affects
satisfaction and retention. By targeting leader development, the Army
should be able to improve retention as well as improve Soldiers’ ability to meet mission
requirements.

The captain rank is a decision point for many Soldiers. Because of incentives provided
by the retirement system, if a Soldier decides to stay beyond the rank of captain, he or she
typically plans to stay until retirement. To better understand the relationship of training and
development and the retention of captains, a logistic regression analysis was conducted. This
paper describes the analysis and results.

METHOD

Officer and NCO data collectors were trained in conducting focus groups and
administering surveys. They were provided with a sampling plan, and asked to gather data from
a representative group of officers, warrant officers, and NCOs. They collected data from Army
personnel and spouses worldwide. Approximately 13,500 soldiers and spouses provided data
through focus groups, interviews, or surveys.

Participants in the Officer Comprehensive survey were 5,525 officers, warrant officers,
and NCOs, from both active and reserve components. Data collectors distributed the instrument
in June 2000 to groups of participants. Of the 1,548 captains who responded to the officer
survey, 1,296 provided complete information, and comprised the sample used in the logistic
regression described below.

Logistic regression is a type of regression analysis in which the dependent variable is
dichotomous. In this case, logistic regression estimates the odds of intent to stay in the Army
and the odds of intent to leave before retirement, based on a set of predictor variables. The result
of this analysis provides insight into the characteristics of captains who have indicated that they
are planning to leave prior to retirement.

Dependent variable. The dependent variable for this analysis was career intention. This
variable is intended to reflect whether the captain plans to leave the Army before retirement.
Career intention was determined by creating a dichotomous variable with 1=leaving and
0=staying, which was computed based on officers’ responses to two survey items.

Independent Variables. Both demographic and survey items were investigated as
possible predictors of officers’ intent to leave. The following demographic items were included:
months in current position, number of deployments, number of PCS moves, type of unit, gender,
rank, source of commission, ethnicity, career field, branch, functional area, highest echelon of
command, Combat Training Center (CTC) experience, and length of time as an officer.

A collinearity analysis suggested that rank, length of time as an officer, and number of
PCS moves were strongly correlated with one another. The model including the variable “length
of time as an officer” was selected as having the best fit of the three.
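
A collinearity screen of this kind can be illustrated with variance inflation factors. The sketch below is hypothetical (it reuses the invented `df` from the earlier sketch, and the column names are assumptions); the paper does not state which collinearity diagnostic was actually used.

    # Sketch: VIFs for the three correlated predictors (names hypothetical).
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X = sm.add_constant(df[["rank_grade", "years_as_officer", "num_pcs_moves"]])
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns)}
    print(vifs)   # VIFs well above ~5-10 flag problematic collinearity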

Due to the long list of potentially important demographic variables, a step-wise logistic
regression was conducted to determine which demographic variables were better predictors of
intent to leave than others. Those demographic variables that contributed the most to the model
were retained for the subsequent analyses. Retained demographic variables include: CTC
experience, years as an officer, source of commission, gender, and months in current position.
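
Classical step-wise logistic regression is typically run in a statistics package; a rough modern analogue, shown below purely as a sketch, is forward sequential selection in scikit-learn. The column names are hypothetical (assumed numeric or dummy-coded), and the cross-validated selection criterion is not necessarily what the authors used.

    # Sketch: forward selection of demographic predictors of intent to leave.
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    demog = ["ctc_experience", "years_as_officer", "commission_source_dummy",
             "gender", "months_in_position", "num_deployments"]  # hypothetical
    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=5,          # keep the strongest demographics
        direction="forward")
    selector.fit(df[demog], df["career_intention"])
    print([c for c, kept in zip(demog, selector.get_support()) if kept])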

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
173

Survey items were then aggregated into components based on principal components
analysis. The resulting components were used as predictor variables. The components included
in the logistic regression are as follows: satisfaction with leader development, expectations met
with regard to benefits, retention factors (service and job aspects), retention factors (pay,
benefits, quality of life), service ethic, performance orientation and understanding of service,
obstacles to self-development, individuals that aid in self-development, importance and
effectiveness of mentors, usefulness of counseling by rater, effectiveness of leadership skills
training, and quality of home station training. All variables, demographic and components, were
entered into the model simultaneously.
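
The aggregation step could look like the following sketch, which standardizes hypothetical survey items and extracts twelve components, matching the number of components listed above; the item naming convention is an assumption.

    # Sketch: survey items reduced to component scores via PCA.
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    items = df.filter(like="item_")            # hypothetical item columns
    components = PCA(n_components=12).fit_transform(
        StandardScaler().fit_transform(items))
    # components[:, k] would then enter the logistic model with the demographics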

RESULTS

The Hosmer and Lemeshow test assesses the goodness of fit of the model. The result is
non-significant (χ²(8) = 12.884, p = .12), indicating good fit.
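
The Hosmer and Lemeshow statistic has no single standard routine in common Python libraries, but it can be computed by hand from the fitted probabilities, as in this sketch (grouping by deciles of risk is the conventional choice):

    # Sketch: Hosmer-Lemeshow goodness-of-fit statistic.
    import numpy as np
    import pandas as pd
    from scipy.stats import chi2

    def hosmer_lemeshow(y, p, g=10):
        """y: 0/1 outcomes; p: fitted probabilities (1-D numpy arrays)."""
        groups = pd.qcut(p, g, labels=False, duplicates="drop")
        stat = 0.0
        for k in np.unique(groups):
            idx = groups == k
            obs, exp, n = y[idx].sum(), p[idx].sum(), idx.sum()
            stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
        dof = len(np.unique(groups)) - 2
        return stat, chi2.sf(stat, dof)   # non-significant p = adequate fit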

According to the Cox & Snell and Nagelkerke pseudo-R² statistics, the set of independent
variables accounts for 23% and 31% of the variance in intent to leave, respectively (see Table 1).

Table 1: Model Summary

-2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1449.56              0.23                    0.31
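
Both pseudo-R² values are simple functions of the null and fitted log-likelihoods. The sketch below shows the standard definitions; with statsmodels, `result.llf`, `result.llnull`, and `result.nobs` supply the inputs.

    # Sketch: Cox & Snell and Nagelkerke pseudo-R-squared.
    import numpy as np

    def pseudo_r2(ll_model, ll_null, n):
        cox_snell = 1.0 - np.exp((2.0 / n) * (ll_null - ll_model))
        nagelkerke = cox_snell / (1.0 - np.exp((2.0 / n) * ll_null))
        return cox_snell, nagelkerke

    # e.g. pseudo_r2(result.llf, result.llnull, result.nobs)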

Classification analysis summarizes the fit between actual and predicted group
memberships (Table 2). These statistics can be used to measure the accuracy of the logistic
regression analysis. The positive and negative predictive values denote what percentage of the
time the model’s predictions are likely to be true. The model correctly classifies 71% (449/630)
of those planning to leave and 71% (477/666) of those planning to stay.

Table 2: Classification Table

                              Predicted
Observed                      Staying    Leaving    % Correct
Considering Staying           477        189        71.6
Considering Leaving           181        449        71.3
Overall %                                           71.5
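
A classification table like Table 2 can be reproduced from the fitted model by thresholding the predicted probabilities. The sketch below assumes the `result`, `X`, and `y` objects from the earlier sketch and a 0.5 cut-off, which is our assumption since the paper does not state one.

    # Sketch: classification table from fitted probabilities.
    import pandas as pd

    predicted = (result.predict(X) >= 0.5).astype(int)   # 1 = predicted leaving
    print(pd.crosstab(y, predicted, rownames=["Observed"],
                      colnames=["Predicted"]))
    print("Overall % correct:", round(100 * (predicted == y).mean(), 1))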

Table 3 shows the variables in the equation. The significant predictors of intent to leave
(p < .05) are marked with an asterisk.

Table 3: Variables in the Equation

Variable Name                                        B        S.E.    Wald     df   Sig.    Exp(B)
CTC experience                                       0.132    0.14     0.884    1   0.347   1.141
Length of time as an officer*                       -0.25     0.03    69.309    1   0.000   0.779
Source of commission (other)*                                         80.426    3   0.000
Source of commission (ROTC)                          0.462    0.317    2.126    1   0.145   1.588
Source of commission (Military Academy)*             1.456    0.354   16.94     1   0.000   4.288
Source of commission (officer candidate)*           -1.024    0.378    7.356    1   0.007   0.359
Sex of respondent (male)*                           -0.766    0.213   12.998    1   0.000   0.465
Months in current position                           0.012    0.008    2.162    1   0.141   1.013
Leader development                                  -0.107    0.112    0.903    1   0.342   0.899
Expectations met with benefits*                     -0.803    0.131   37.664    1   0.000   0.448
Retention factors (service and job aspects)         -0.107    0.13     0.683    1   0.408   0.898
Retention factors (pay, benefits, quality of life)  -0.091    0.096    0.898    1   0.343   0.913
Service ethic                                        0.058    0.089    0.416    1   0.519   1.059
Performance orientation and understanding
  of service                                         0.097    0.097    0.997    1   0.318   1.102
Obstacles to self-development                        0.064    0.092    0.495    1   0.482   1.067
Individuals that aid in self-development            -0.018    0.108    0.028    1   0.868   0.982
Importance and effectiveness of mentors*            -0.229    0.095    5.78     1   0.016   0.795
Rater provides useful counseling*                   -0.198    0.082    5.893    1   0.015   0.82
Effectiveness of leadership skills training          0.118    0.075    2.501    1   0.114   1.126
Quality of home station training                    -0.031    0.094    0.109    1   0.741   0.969
Constant*                                            5.172    0.774   44.697    1   0.000   176.26

Note: * significant at p < .05.
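
For interpretation, Exp(B) is simply e^B, the multiplicative change in the odds of planning to leave per unit increase in the predictor; for example, e^-0.25 is about 0.78, so each additional year as an officer is associated with roughly a 22% reduction in those odds. The sketch below recomputes Exp(B) for two rows of Table 3 and adds 95% confidence intervals, which are a standard add-on and are not reported in the paper.

    # Sketch: odds ratios and 95% CIs from the B and S.E. values in Table 3.
    import numpy as np

    for name, b, se in [("Length of time as an officer", -0.25, 0.03),
                        ("Source of commission (Military Academy)", 1.456, 0.354)]:
        lo, hi = np.exp(b - 1.96 * se), np.exp(b + 1.96 * se)
        print(f"{name}: Exp(B) = {np.exp(b):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")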

The results indicate that captains who have been an officer longer are less likely to
consider leaving than those who have been an officer for a shorter period of time. Captains
whose source of commission or appointment was the military academy are 4.3 times more likely
to be planning to leave than those who were commissioned or appointed some other way (e.g.,
ROTC and Officer Candidate School). In addition, men are less likely to plan to leave than
women. Those who indicate that their expectations have been met with respect to benefits are
less likely to plan to leave. Also, those captains who have had effective mentors or mentoring
experiences are less likely to plan to leave. Finally, those captains who indicate their raters
provide useful mentoring and counseling are less likely to consider leaving the Army.

A follow-up analysis of other survey items was conducted to gain more insight. Captains
who reported that they were planning to leave before retirement at 20 years answered a set of
questions about the importance of various issues in their decision to leave. The top issues were:
a belief that the Army no longer demonstrates that it is committed to captains as much as it
expects them to be committed, they are not having fun anymore, a perception of excessive
micro-management, they do not have a sense of control or self-determination in their career, and
they can no longer balance the needs of the Army with the needs of their families.

DISCUSSION

The following variables predicted captain intent to leave the Army: length of time in the
Army, source of commissioning, gender, benefits satisfaction, the perception of effectiveness of
counseling received, and having a mentor. Those captains who have been an officer longer are
less likely to consider leaving than those who have been an officer for a shorter period of time.
This finding is expected, for at least two reasons, according to Mackin, Hogan, and Mairs (1993).
First, the Army’s retirement system provides an increasingly greater incentive to stay for 20
years. Second, self-selection is occurring; Soldiers who are more suited for the Army life are
more likely to stay at each decision point.

Captains whose source of commission was through a military academy are more likely to
plan to leave than those who were commissioned some other way. This may be due in part to a
disconnect between what is taught in the military academies and what actually goes on in the
field. Also, captains who have had a highly selective military academy education may have
more opportunities in the private sector. Those whose source of commission was Officer
Candidate School are less likely to plan to leave than others. These captains moved up through
the NCO ranks and therefore have a considerable amount of experience in the Army and a
realistic job preview. Furthermore, they most likely would not have attended OCS if they were
not intending to make the Army a career.

Gender is also a significant predictor of career intentions, with men less likely to plan
to leave than women. Work-family balance was one of the top reasons captains said they were
planning to leave. In a related study, time separated from family was the top reason officers
reported thinking about leaving or planning to leave before retirement (ARI, 2002). Family-
related issues are important to both men and women; however, women may find it harder to
balance the needs of their families with those at work. This issue needs more research.

Those who indicate that their expectations have been met with respect to benefits are less
likely to plan on leaving. This finding corresponds to the top reason for planning to leave, which
suggests that there is a perceived imbalance between the commitment of the individual to the
Army and the Army’s commitment to the individual. Improving benefits or better
communication of the value of the existing benefits may help to improve perceptions of benefits.

Also, captains who have had effective mentors or mentoring experiences, and those who
indicate their raters provide useful mentoring and counseling, are less likely to consider
leaving. The significance of these variables may be that both mentoring and
counseling provide important feedback to the individual as well as guidance on professional
development and career advice. In addition, mentoring and counseling provide a one-on-one
connection to a very large organization and can reflect an investment by the organization in the
individual.

In conclusion, this paper provides results of an analysis of captain retention. The results
can lead to a better understanding of the factors that affect retention, as well as suggest some
improvements that the Army can make to attempt to improve retention.

REFERENCES

Army Research Institute for the Behavioral and Social Sciences (2002, August). Reasons
for Leaving the Army Before Retirement, Survey Report (Report No. 2002-13).

Hom, P. W., & Griffeth, R. W. (1995). Employee Turnover. Cincinnati, OH: South-Western
College Publishing.

Mackin, P. C., Hogan, P. F., & Mairs, L. S. (1993). A Multiperiod Model of US Army Officer
Retention Decisions (ARI Technical Report 93-03). Alexandria, VA: US Army Research
Institute for the Behavioral and Social Sciences.

THE IMPORTANCE OF A FAMILY-FRIENDLY WORK ENVIRONMENT
FOR INCREASING EMPLOYEE PERFORMANCE AND RETENTION
Ann H. Huffman and Satoris S. Youngcourt
Texas A&M University
College Station, TX 77843-4235
annhuffman@tamu.edu
Carl Andrew Castro
Division of Neuropsychiatry
Walter Reed Army Institute of Research

ABSTRACT
This study tested perceptions of family-friendly work environment as a moderator of the
relationship between work-life conflict and job performance. Survey data and actual performance
measures from 230 US Army soldiers were examined. Findings indicated a perceived family-
friendly work environment was negatively related to intentions to leave the organization, and
positively related to individual and future performance. Furthermore, although employees who
had more family responsibilities benefited from the family-friendly work environment, there was
no apparent adverse impact on single, childless individuals. The results underscore the
importance of family-friendly work environments to facilitate employee performance and career
decisions.

Authors’ Notes.
The views expressed in this paper are those of the authors and do not necessarily represent the
official policy or position of the Department of Defense (paragraph 4-3, AR 360-5) or the U.S.
Army Medical Command.
The findings described in this paper were collected under WRAIR Research Protocol #700
entitled “A Human Dimensions Assessment of the Impact of OPTEMPO on the Forward-
Deployed Soldier” under the direction of C.A. Castro (1998). The authors thank Amy B. Adler,
Robert Bienvenu, Jeffrey Thomas and Carol Dolan, co-investigators on this research project. We
would also like to thank Millie Calhoun, Coleen Crouch, Alexandra Hanson, Tommy Jackson,
Rachel Prayner, Shelly Robertson and Angela Salvi for their excellent technical support. This
research was funded by the Research Area Directorate for Military Operational Medicine U.S.
Army Medical Research and Materiel Command in Ft. Detrick, Maryland and the U.S. Army,
Europe, Heidelberg, Germany.

In recent years, researchers have been interested in the balance between employees’ work
and home lives (Kossek & Ozeki, 1998; Major, Klein, & Ehrhart, 2002). Ideally, individuals are
able to effectively manage the requirements of both roles without undue difficulty.
Unfortunately, work and life demands frequently clash, making it difficult for the individual to
be simultaneously successful in both domains, resulting in work-life conflict.
Work-life conflict has been linked to numerous negative consequences for the individual,
including lower general well-being (Aryee, 1992; Frone, 2000; Frone, Russell, & Cooper, 1992;
Thomas & Ganster, 1995), lower job satisfaction (Adams, King, & King, 1996), greater burnout
(Burke, 1988), and greater alcohol use and poor health (Allen, Herst, Bruck, & Sutton, 2000;
Frone, Russell, & Barnes, 1996). The organization also experiences negative consequences. For
example, researchers have suggested that conflict leads to negative organizational outcomes such
as increased turnover and decreased performance (e.g., Jex, 1998). Although researchers have
consistently demonstrated a link between work-life conflict and increased turnover intentions
(Burke, 1988; Greenhaus, Collins, Singh, & Parasuraman, 1997; Greenhaus, Parasuraman, &
Collins, 2001), few have empirically examined the relationship between work-life conflict and
job performance (Allen et al., 2000).
As the potential adverse effects of work-life conflict become more apparent,
organizations have become more proactive in their attempts to buffer the negative effects. One
way in which organizations have tried to assist employees is by fostering a family-friendly
culture to allow employees support and flexibility to successfully sustain both their work and
personal lives (Kossek & Lobel, 1996).
The current study examined whether a family-friendly work environment buffers the
negative relationship between work-life conflict and performance and organizational outcomes.
Specifically, we examined employee perceptions of a family-friendly work environment and how
these perceptions directly and indirectly related to subjective and objective measures of
performance. Additionally, we assessed whether perceptions of a family-friendly work
environment were beneficial for all employees regardless of family responsibilities, or if they
were detrimental to employees with few family responsibilities.

WORK-LIFE CONFLICT
There are numerous work/nonwork conflict constructs in the work and family literature
(e.g., work-life conflict, work-nonwork conflict, work-family conflict). Although many of the
constructs are similar, or are used interchangeably, there are some subtle differences. Work-life
conflict is based on a broader definition than the more specific construct work-family conflict.
Whereas work-family conflict focuses on time and strain due to family responsibilities, work-life
conflict encompasses family factors in addition to personal responsibilities not necessarily
related to families (e.g., shopping for personal items, exercising, spending time with friends). We
chose to operationalize role conflict specifically as work-life conflict for three reasons. First, the
more wide-ranging work-life conflict construct allows us to include both single and married
individuals. Second, researchers have advocated the use of more flexible and broader constructs,
such as work-life conflict, in work and nonwork role research (Behson, 2002; Grover & Crooker,
1995). Finally, although many have stated the two constructs are similar, scant research has
empirically tested the relationship between work and life roles (Frone, 2003).
According to role theory (Hart, 1999), all of the work-nonwork variables are similar.
Role theory asserts that strain will occur when individuals face competing demands from
multiple life roles (Kahn, Wolfe, Quinn, Snoek, & Rosenthal, 1964). Work-life conflict can be
conceptualized as a type of interrole conflict occurring when the pressures and demands of work
collide with the pressures and demands of one’s personal life (Kopelman, Greenhaus, &
Connolly, 1983). The strength of the conflict depends on the pressures or demands of both roles,
with more conflict occurring when the capacity of the individual to meet all the demands from
both roles is exceeded. With this in mind our literature base for the current study is based on both
specific and general work-life demands research and will use the terms interchangeably.

WORK-LIFE CONFLICT AND WORK-RELATED OUTCOMES


Measures of organizational outcomes can be “objective” or “subjective”. Objective
measures focus on discrete, countable behaviors or outcomes, such as scores on proficiency
exams or turnover rates, and are important because they give independently verifiable depictions
of employee performance. These measures, however, may be deficient indicators of performance
because they might fail to capture other important markers of performance. For example, many
job-relevant knowledge, skills, and abilities (e.g., leadership or communication skills) are not
easily assessed objectively. For these, other measures may be necessary.
Subjective measures focus on outcomes requiring individual perceptions, such as
supervisory ratings of performance, turnover intentions, and customer satisfaction reports.
Subjective ratings, although necessary to capture important aspects of most jobs, suffer from
various biases (Cascio, 1998). Self-ratings are often only moderately related to independent
measures of performance (e.g., Bommer, Johnson, Rich, Podsakoff, & Mackenzie, 1995; Katosh
& Traugott, 1981; Kleck, 1982), unless the self-ratings are based on data that can be readily
observed or verified (Spector, Dwyer, & Jex, 1988). Although researchers have consistently
shown that individuals are typically lenient with self-ratings (Harris & Schaubroeck, 1988;
Thornton, 1980), the bias is not always in favor of the individual. For example, Adler, Thomas,
and Castro (2002) found that, when asked to provide information on their performance,
individuals tended to portray themselves in a more negative light than the independent records
would indicate, calling into question the accuracy and validity of the so-called more reliable,
non-self-report measures.
Although intuitively related, objective and subjective measures of performance often
exhibit low correlations with one another (e.g., Bommer et al., 1995; Heneman, 1986), and
therefore when used alone (i.e., only subjective or only objective), results should be interpreted
with caution. That is, each data type provides unique information and together can provide a
more complete representation of performance. Therefore, in this study, both objective measures
(i.e., marksmanship scores and physical training scores) and subjective measures (i.e.,
perceptions of future combat performance and turnover intentions) of performance are used.
Individual Job Performance. Although job performance is one of the most relevant outcomes
to organizations, it is one of the least studied in relation to work-life conflict (Frone, Yardley, &
Markel, 1997; Kossek & Ozeki, 1998; Perrewe, Treadway, & Hall, 2003). Of the few studies that
examine the work-life conflict and job performance relationship, the results are
equivocal, with some reporting a negative relationship (Aryee, 1992; Frone et al., 1997; Kossek
& Nichol, 1992) and some reporting no relationship (Greenhaus, Bedeian, & Mossholder, 1987;
Netemeyer, Boles, & McMurrian, 1996). While we found no published studies that reported a
positive relationship between work-life conflict and job performance, Greenhaus et al. (1987)
provided two arguments for why researchers might expect to find such a relationship. First, they
noted that conflict might exist for high performing individuals because they spend more time at
work than others, and therefore have less time for their personal lives. They proposed that, “the
very activities that produce high job performance, in other words, may estrange an employee
from his or her family and produce feelings of personal dissatisfaction” (p. 202). Second, they
suggested that there are particular behaviors required to attain a high level of performance that
are only appropriate in the work domain, and may in fact be detrimental in the family domain.
For example, employees may be required to conform to the values and norms of the
organization, or they may need to alienate themselves from a satisfying personal relationship in
order to be successful within an organization.
In their meta-analytic review, Allen et al. (2000) found an overall weighted mean
correlation of -.12 across all studies examining the relationship between work-life conflict and
job performance. This finding was based on only four samples, however, and therefore should be
interpreted with caution. Furthermore, these studies all had one common limitation in that the
measure of performance was solely based on subjective performance measures. For example,
whereas Aryee (1992) used three measures of work-family conflict (job-parent conflict, job-
spouse conflict, and job-homemaker conflict), he assessed performance with a single self-report
measure of work quality. Similarly, Frone et al. (1997) assessed performance with a self-report
measure that tapped work role behaviors. Although Netemeyer et al. (1996) used real estate sales
as their performance measure, they nevertheless depended on normative self-report ratings.
Indeed, the only sample in the Allen et al. meta-analysis that did not rely on self-ratings was
Greenhaus et al.’s (1987) study, which used supervisor ratings.
Despite the rationale that has been provided for positive relationships between work-life
conflict and job performance (see Greenhaus et al., 1987), we suspect a negative relationship will
exist, based primarily on the generally negative effects of interrole conflict. That is, we propose
that individuals who report greater levels of work-life conflict will have decreased job
performance, as indicated by objective measures, because of the strain inherent in the conflict.
Specifically, we propose individuals reporting greater levels of work-life conflict have more
personal distractions interfering with their work roles, and therefore will not be able to devote as
much cognitive, emotional, or physical energy to preparation for or actual engagement in their
work tasks, which will contribute to decreased performance. Based on this logic, we propose the
following:
Hypothesis 1a: Work-life conflict is negatively related to job performance.
Collective Efficacy. One subjective organizational outcome related to work-life conflict is
employee perceptions of his or her group’s future performance. These perceptions are akin to
collective-efficacy, defined by Bandura (1982) as personal judgments about the group’s
capacities to execute necessary behaviors to perform specified tasks. Highly efficacious
individuals tend to be more productive in setting their goals and persist more in reaching those
goals than non-efficacious individuals (Gist, 1987). Bandura (1984) suggested that individuals
with greater levels of efficacy are also more effective under stressful conditions. Many
researchers, in fact, have demonstrated that higher levels of efficacy lead to higher levels of
performance (e.g., Sadri & Robertson, 1993; Stajkovic & Luthans, 1998). Therefore, collective
efficacy can be considered a positive organizational outcome.
Although we could find no studies directly examining work-life conflict and efficacy
beliefs, we anticipate that a negative relationship exists between the two. Our rationale involves
the fact that work-life conflict is a strain that potentially saps energy, which is needed to perform
well. Therefore, employees experiencing work-life conflict may have low energy levels, and
consequently feel they are unable to perform well on tasks. Based on this logic, we provide the
following hypothesis:
Hypothesis 1b: Work-life conflict is negatively related to collective efficacy perceptions.
Turnover Intention. Employee turnover, both pervasive and costly, is often one of the biggest
dilemmas for organizations and the final organizational outcome of interest in this study.
Excessive turnover is linked to numerous negative consequences including the loss of good
employees, loss of loyalty among current employees, lower perceived job quality, lower
satisfaction of customers, and loss of money for the organization (Abbasi & Hollman, 2000;
Gupta & Jenkins, 1992; Hacker, 1997; White, 1995). Although actual turnover has rarely been
studied with work-life conflict, several studies have examined the relationship between work-life
conflict and turnover intentions. The findings consistently show a positive relationship between
work-life conflict and intentions to leave (Allen et al., 2000; Kossek & Ozeki, 1999; Netemeyer
et al., 1996). Based on these findings, we propose the following hypothesis:
Hypothesis 1c: Work-life conflict is positively related to turnover intentions.

PERCEPTIONS OF FAMILY-FRIENDLY WORK ENVIRONMENT


Jex (1998) suggested work stressors have both a direct and an indirect effect on
performance. That is, it may be that the inconsistent findings for the relationship between work-
life conflict and performance are due to the indirect nature of the relationship. Although
numerous moderators of the relationship may exist, we examine employee perceptions of the
family-friendliness of the work environment.
Organizations do not have to feel powerless in dealing with factors that occur outside of
their domain. Recently, many organizations have been receptive to “family-friendly” programs
to decrease employee work-life stress. Work environments are considered family-friendly when
they “(a) help workers manage the time pressures of being working parents by having policies
such as vacation time, sick leave, unpaid or personal leave, or flexible work schedules, or (b)
help workers meet their continuing family responsibilities through such programs as maternity
and paternity leave, leave that can be used to care for sick children or elders, affordable health
insurance, and child-care or elder care programs” (Marshall & Barnett, 1996, p. 253).
Despite the purported benefits of family-friendly policies, such as increased productivity,
job satisfaction, and organizational commitment, relatively few researchers have empirically
examined such family-friendly work environments (Aldous, 1990; Bourg & Segal, 1999; Glass
& Finley, 2002). The concept is fairly new and the little research that has been conducted has
been methodologically weak, based primarily on anecdotes. Although the existence of these
programs seems important, researchers have suggested that an ideal family-friendly workplace
goes beyond the availability of programs (Fredriksen-Goldsen & Scharlach, 2001; Secret &
Sprang, 2001). A true family-friendly environment exists where both day-to-day taskings and
important policy decisions include addressing the needs of the family.
Few studies go beyond measuring the mere presence or number of family-friendly
programs by actually examining the family-friendly culture of an organization. Three measures
have recently been developed to capture the essence of a family-friendly work culture. The first
scale, developed by Thomas and Ganster (1995), measures family-friendly culture and taps into
the idea that for organizational policies to be successful the organization needs supervisors that
support the policies. More recently, Eaton (2003) constructed two scales that measured
perceptions of availability of informal work-family policies and the perceptions of usability of
work-family policies, and found that perceived usability of work-family programs related more
to productivity than did actual presence of formal or informal policies. Finally, Allen (2001)
developed a scale measuring family-supportive organization perceptions. The construct tapped
by her measure is very similar to perceived organizational support (see Rhoades & Eisenberger,
2002, for a recent review of this literature), because both are measures of global support, rather
than specific measures of support, such as supervisor or coworker support. Allen differentiated
her construct from that of perceived organizational support, however, maintaining that hers
specifically concerns reactions to the organization regarding family-supportiveness whereas
perceived organizational support concerns responses to the supportiveness of the organization as
a whole, not specific to family-related aspects.

FAMILY-FRIENDLY WORK ENVIRONMENT AND WORK-RELATED OUTCOMES


Given the similarity between perceived organizational support and family-friendly work
environments, the conceptual framework should be similar in nature. Similar to Eisenberger,
Huntington, Hutchison, and Sowa (1986), we stress the importance of the norm of reciprocity
between the employee and the organization. Specifically, we propose that the nature of the
relationship between the employee and the organization can be explained by the tenets of social
exchange theory (Blau, 1964) and equity theory (Adams, 1963, 1965).
Adams’ (1963, 1965) equity theory, based on cognitive dissonance theory (Festinger,
1957), posits that individuals compare ratios of inputs to outcomes of themselves and others to
establish if something is fair. For example, if an employee compares the ratio of his or her inputs
(the amount of work he or she is doing) to his or her outcomes (e.g., flexible work schedule) with
the organization’s ratio of inputs to outcomes, and perceives a discrepancy, then feelings of
dissonance will emerge that must be resolved.
Adams (1963) noted several ways in which individuals could reduce feelings of inequity.
These methods of regaining balance include increasing or decreasing inputs or outcomes to
match those of the organization, quitting, distorting perceptions, or changing the referent. In
terms of family-friendly work environments, if an employee perceives he or she is giving more
to the organization (e.g., working long hours, helping coworkers) than he or she is getting in
return (e.g., no flexibility in schedule or little vacation time), he or she will attempt to reduce the
inequity. That is, the employee may decrease his or her effort in order to balance the perceived
relationship. This same logic applies to social exchange theory (Blau, 1964), whereby the
employee will reciprocate that which he or she feels the organization is providing.
Job Performance. Individuals who perceive an organization as supportive of their needs
may feel indebted to the organization and therefore reciprocate the exchange by increasing their
performance, their efficacy beliefs, and their intentions to remain with an organization. Previous
findings concerning family-friendly work environments and job-related outcomes support these
assertions. For example, Kossek and Ozeki (1999) identified five studies (Dunham, Pierce, &
Castaneda, 1987; Kossek & Nichol, 1992; Orthner & Pittman, 1986; Pierce & Newstrom, 1982,
1983) that examined the effects of a family-friendly work environment on job performance.
Results were generally positive but small.
The small effects in the five studies reported by Kossek and Ozeki (1999) could be
because actual family-friendly policies were used when perceptions may be the more appropriate
measure (Allen, 2001; James & McIntyre, 1996). Studies have consistently shown, for example,
that work environment perceptions (specifically perceived organizational support) are related to
higher job performance (e.g., Bhanthumnavin, 2003; Rhoades & Eisenberger, 2002). Similarly,
Eaton (2003) contrasted perceptions of work-family culture with actual policies and found that
perceptions of the usability of work-family policies were positively related to organizational
productivity more so than were the actual policies. We therefore propose the following
hypothesis:
Hypothesis 2a: Family-friendly work environment perceptions are positively related to
job performance.
Collective Efficacy. As discussed earlier, one subjective measure of organizational outcomes is
the individual’s perceptions of future performance, which relates closely to self- and collective-
efficacy. Just as we argued that a negative relationship between work-life conflict and efficacy
beliefs could be due to unsuccessful past performances, we argue successful past performance
facilitated by family-friendly policies could result in a positive relationship between perceptions
of family-friendly work environments and efficacy beliefs. That is, if an employee was able to
successfully complete his or her duties with help from family-friendly policies, such as a flexible
work schedule or onsite childcare, then he or she should feel just as capable of performing well
in the future if he or she perceives the same benefits are available and usable. Empirical evidence
has supported the logic that a positive relationship should exist between perceptions of a family-
friendly work environment and efficacy beliefs (Bhanthumnavin, 2003). Based on the preceding
logic and extant literature, the following hypothesis is presented:
Hypothesis 2b: Family-friendly work environment perceptions are positively related to
collective efficacy perceptions.
Turnover Intentions. Researchers have also become increasingly interested in the effects of
family-friendly policies on turnover and turnover intentions (e.g., Huselid, 1995). While the
majority of studies have reported a negative relationship between family-friendly policies and
intentions to leave (e.g., Grover & Crooker, 1995; Kossek & Ozeki, 1999), a few have found no
relationship (Dalton & Mesch, 1990; Dunham et al., 1987). Despite the inconsistent findings, we
expect a negative relationship will exist between perceptions of a family-friendly work
environment and turnover intentions because individuals will be less likely to want to leave an
organization they feel is treating them fairly and allowing the necessary flexibility to manage
their work and personal lives with greater ease. This logic has been supported in the
organizational justice literature, with withdrawal behaviors (including absenteeism, turnover, and
neglect) typically relating negatively to perceptions of procedural, distributive, and informational
justice (Colquitt, Conlon, Wesson, Porter, & Ng, 2001). We therefore propose the following
hypothesis:
Hypothesis 2c: Family-friendly work environment perceptions are negatively related to
turnover intentions.

MODERATING EFFECTS OF FAMILY-FRIENDLY WORK ENVIRONMENT PERCEPTIONS
Based on the same logic of social exchange theory, the negative effects of work-life
conflict on performance may be lessened when an individual perceives the organization is
family-friendly. Equity theory and social exchange theory suggest that a healthy relationship
would be based on an equitable exchange between the two entities. Employees would expect
more from the organization to gain a sense of equality if they felt they were giving more than the
organization was returning. One considerable organizational contribution is that of a family-
friendly work environment. That is, a family-friendly work environment can act as a buffering
agent between employee conflict and organizational outcomes.

Bourg and Segal (1999) examined the impact of family-friendly policies and practices in
the military on the perceived conflict between enlisted soldiers’ family and unit. They concluded
“responsiveness to families on the part of the military will lessen the degree of conflict between
the two greedy institutions of the military and the family” (p. 647). They further noted that such
policies and practices can serve as a way for the organization (i.e., the military in their study) “to
send a message to soldiers and family members that the family is no longer viewed as a
competing outside influence” (p. 648).
The mere presence of family-friendly policies, however, is not enough (Behson, 2002;
Raabe & Gessner, 1988). That is, the employee’s perceptions of the workplace are more
important than the workplace itself in affecting attitudinal and behavioral organizational
responses (Allen, 2001; James & McIntyre, 1996). For example, two individuals working for the
same organization could perceive the family-friendliness of the work environment as being
completely different. The first employee may perceive there are available resources for him or
her to use to reduce the conflict felt between the demands of work and home. By having these
perceptions, he or she may be more likely to pursue these resources and receive their associated
benefits than if the perceptions were absent. Therefore, the employee would be less likely to
experience the detrimental effects associated with the conflict. The second employee, however,
may not feel there are adequate resources available, regardless of having the same actual
resource availability. He or she may not pursue, and therefore not get, resources that could be
beneficial in reducing work-life conflict. He or she therefore would more likely experience the
adverse effects of the work-life conflict.
This logic leads to the following hypotheses:
Hypothesis 3a: Family-friendly work environment perceptions moderate the work-life
conflict-job performance relationship.
Hypothesis 3b: Family-friendly work environment perceptions moderate the work-life
conflict-collective efficacy perceptions relationship.
Hypothesis 3c: Family-friendly work environment perceptions moderate the work-life
conflict-turnover intentions relationship.
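
Hypotheses 3a-3c describe moderation, which is conventionally tested by entering a conflict-by-environment product term into a regression model. The sketch below illustrates that convention only; the paper's own analytic procedure is not described in this excerpt, and the variable names are hypothetical.

    # Sketch: moderation via an interaction term (predictors mean-centered).
    import statsmodels.formula.api as smf

    df["wlc_c"] = df["work_life_conflict"] - df["work_life_conflict"].mean()
    df["ffwe_c"] = df["ffwe"] - df["ffwe"].mean()
    model = smf.ols("job_performance ~ wlc_c * ffwe_c", data=df).fit()
    print(model.summary())   # a significant wlc_c:ffwe_c term supports moderation
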
One of the concerns of family-friendly work environments is that they discriminate
against the single, childless employee (Rothausen, Gonzalez, Clarke, & O’Dell, 1998). When
employees leave work early to attend to a sick child or spouse, the remaining employees must
compensate for their absence. This could lead the remaining employees who do not have such
demands to feel resentment toward the absent employees, and possibly toward the organization.
Furthermore, by allowing employees with families to leave as needed (i.e., providing a family-
friendly work environment), the remaining employees may be adversely affected in terms of
their performance. Few empirical studies, however, have tested the notion that family-friendly
work environments may have negative effects on employees with fewer family responsibilities or
demands.
Because a family-friendly work environment is intended to assist employees with
families, employees who are more likely to need a family-friendly work environment (i.e.,
employees with more family responsibilities) would benefit from that environment more so than
those who are less likely to need that environment. Behson (2002) noted, “organizational
policies, programs, and attitudes that specifically address the topic of work-family balance may
be of limited salience to ‘non-familied’ employees” (p. 67). So, whereas the policies may not be
detrimental to employees without families, they certainly may not be beneficial either. With this
in mind we propose that number of family responsibilities moderates the relationship between
family-friendly work environment perceptions and performance. Specifically, individuals with
more family responsibilities will benefit more from positive family-friendly work environment
perceptions than those with fewer family responsibilities. We therefore propose the following
hypotheses:
Hypothesis 4a: Number of family responsibilities moderates the relationship between
family-friendly work environment perceptions and job performance.
Hypothesis 4b: Number of family responsibilities moderates the relationship between
family-friendly work environment perceptions and collective efficacy perceptions.
Hypothesis 4c: Number of family responsibilities moderates the negative relationship
between family-friendly work environment perceptions and turnover intentions.

CURRENT STUDY
Work stressors, work-life conflict, and family-friendly work environments are issues that
are salient to military and civilian workers alike. First, the military population is similar to many
civilian organizations in that it has excessive work stressors such as long hours, lack of
predictable schedule, and high levels of perceived overload (Castro & Adler, 1999). Second, as in
many civilian organizations (Allen, 2001), family-friendly policies are standard in the
military (Department of Defense, 1996). Finally, the military has been described as a reflection
of the civilian society (Martin, Rosen, & Sparacino, 2000), and thus shares similar
interrelationships between work and family.
The current study investigated how family-friendly work environments act as a buffering
mechanism in the stressor-strain relationship. Specifically, we examined employees’ perceptions
of family-friendly work environments and how these perceptions directly and indirectly affected
both subjective and objective measures of performance.

METHOD
Participants
The participants in the study were soldiers (N=230) stationed in Europe. All participants
were active duty US Army personnel with an average of 8 years in the military. There were
61.4% non-commissioned officers, 31.3% junior-enlisted soldiers, and 7.4% commissioned
officers. The sample was predominantly male (84.8%) and the largest ethnic group was White
(51.8%), followed by African-American (27.9%), Hispanic (10.6%) and other (9.8%). In terms
of marital status, 64.6% of the participants were married, 24.5% had never been married (single),
and 11.0% were separated or divorced. Approximately half of the sample (51.1%) had children
living at home.
Procedure
This paper is part of a larger study examining the effects of workload on individual and
organizational outcomes. Military personnel in 10 units stationed in Germany and Italy were
surveyed every three months for two years. Questionnaires were administered on-site at the
military unit by research principal investigators or trained research assistants with follow-up data
collections to obtain data from absent personnel. We only included data obtained from January
2001 to May 2001 because questionnaires during this time period included scales that assessed
perceptions of family-friendly work environment. In addition to the survey items, research staff
also collected actual performance measures by visiting the units approximately three months
after the data collection and collecting physical training scores and marksmanship scores that
coincided with the survey data. Data from participants were only included if complete surveys
were available from both the survey and unit record data. See Castro, Adler, and Bienvenu
(1998) for a full description of the methodology.
Measures
Family Responsibilities. Family responsibilities were determined by combining marital status
(married = 1 and single = 0) and number of children living at home. For example, if the
individual was married and had two children living at home, family responsibilities would be
three.
Work-life Conflict. Work-life conflict was measured using a four-item scale modified by Gutek,
Searle, and Klepa (1991; based on Kopelman et al., 1983). This scale was designed to measure
the extent to which work roles interfere with life roles. Sample items include, “After work, I
come home too tired to do some of the things I’d like to do” and “On the job, I have so much
work to do that it takes away from my personal interests.” Response choices ranged from 1
(strongly disagree) to 5 (strongly agree). Scores were calculated by summing all items.
Family-Friendly Work Environment. The extent to which the environment is perceived as
family-friendly was assessed using items adapted from Allen’s (2001) measure of family
supportive organizational perceptions. Sample items from the eight-item scale include, “In this
unit, it is assumed that the most productive employees put their work before their family” and
“In this unit, it is assumed that work should be the primary priority in the employees’ lives”
(both items reverse-scored). Response choices ranged from 1 (strongly disagree) to 5 (strongly
agree). Scores were calculated by summing all items.
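Because two of the eight items are reverse-scored, scoring amounts to flipping those items on the 1-5 scale before summing. A minimal Python sketch (which item positions are reverse-scored is illustrative here, not taken from Allen's measure):

    def sum_scale(responses, reverse=frozenset(), points=5):
        """Sum 1..points Likert responses, reverse-scoring the 0-based indices in `reverse`."""
        return sum((points + 1 - r) if i in reverse else r
                   for i, r in enumerate(responses))

    # Eight FFWE responses; suppose items 0 and 1 are the reverse-scored ones.
    print(sum_scale([5, 4, 3, 3, 2, 4, 3, 2], reverse={0, 1}))  # -> 20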
Job Performance. Objective performance ratings were obtained from unit records. Military
personnel are tested on their shooting capability twice each year. All soldiers must obtain
qualifying scores with their assigned weapon. In the current study, weapon scores (i.e.,
marksmanship scores) were based on M16 total scores from the participant’s most recent
qualification record. The M16 is the standard weapon that is issued to all enlisted military
personnel. The possible range of scores was 0 to 40, with a score of 24 being necessary to
successfully qualify.
Participants’ total physical training scores were also used. All soldiers are required to
take a physical fitness test twice each year, consisting of a two-mile run and the total number of
push-ups and sit-ups that can be performed in two minutes. The run time and the numbers of
push-ups and sit-ups are then standardized based on sex and age, with scores ranging from 0 to
100 for each event. Physical training scores were calculated by summing the standardized sit-up,
push-up, and run scores.
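The composite is thus a simple sum of three 0-100 event scores. A hedged sketch, in which `event_points` stands in for the Army's age- and sex-specific scoring tables (not reproduced here):

    def pt_total(sex, age, pushups, situps, run_seconds, event_points):
        """Sum the three standardized fitness event scores (0-100 each; 0-300 total)."""
        return (event_points("pushups", sex, age, pushups)
                + event_points("situps", sex, age, situps)
                + event_points("run", sex, age, run_seconds))

    # Toy stand-in table: one point per repetition (capped at 100), a flat 60 for the run.
    toy = lambda event, sex, age, raw: 60 if event == "run" else min(raw, 100)
    print(pt_total("M", 25, pushups=70, situps=80, run_seconds=900, event_points=toy))  # 210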
Collective Efficacy. A four-item combat readiness scale that assessed participants’ perceptions
of their future level of performance was used to measure perceptions of collective efficacy
(Vaitkus, 1994). This measure has been used in previous studies to assess collective efficacy
(e.g., Jex & Bliese, 1999). Sample items include “I think my unit would do a better job in combat
than most US Army units” and “I have real confidence in my unit’s ability to perform its
mission.” Response choices ranged from 1 (strongly disagree) to 5 (strongly agree). Scores
were calculated by summing all items.
Turnover Intentions. Turnover intentions were measured with a single item: “Which best
describes your current active-duty Army career intentions?” The response options were: 1)
definitely stay in until retirement; 2) probably stay in until retirement; 3) definitely stay in
beyond present obligation, but not until retirement; 4) undecided; 5) probably leave upon
completion; or 6) definitely leave upon completion of current obligation. This item has been used
in previous military research (Tremble, Payne, Finch, & Bullis, 2003) to measure career intent.


Previous studies have found that one-item measures can be psychometrically comparable to
multiple-item measures (Gardner, Cummings, Dunham, & Pierce, 1998; Wanous, Reichers, &
Hudy, 1997).

RESULTS
The means, standard deviations, and reliabilities (where appropriate) for all of the key
variables are included in Table 1.

Table 1

Correlations between Work-Life Variables, Organizational Outcome Construct, and Control Variables

Mean SD 1 2 3 4 5 6 7 8
Work and Life Variables
1. FFWE 18.44 4.74 (.82)
2. Work-Life Conflict 14.30 3.64 -.49** (.93)
Organizational Outcomes
3. Collective Efficacy 12.53 3.67 .25** -.17* (.71)
4. Physical Training Scores 249.65 31.98 .18** .02 .23** --
5. Marksmanship Scores 31.84 5.37 .13 -.13 .04 .01 --
6. Turnover Intentions 3.23 1.93 -.29** .22** -.25** -.18** -.32** --
Demographics
7. Sex 1.85 .36 -.01 .09 .05 -.07 .23** -.17** --
8. Rank 15.64 2.70 .22** .02 .15* .20** .09 -.39** -.05 --

Note. Coefficient alphas are presented in parentheses on the diagonal. For sex, females are coded as 1 and males are
coded as 2. FFWE = family-friendly work environment.
N = 230. ** p < .01; * p < .05.

Sex and rank were used as control variables in all analyses. Because tests of interactions in
moderated regression carry a high likelihood of Type II errors (Aiken & West, 1991), we selected
an alpha level of .10 when testing interactions. A .05 alpha level was used for all other analyses.
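For readers who wish to reproduce the analytic steps, the hierarchical moderated regressions reported below follow a standard pattern: controls entered first, main effects second, and the product term last, with the change in R2 evaluated at each step. A sketch using statsmodels (the column names `sex`, `rank`, `ffwe`, `wlc`, and `pt_score` are illustrative; predictors are mean-centered before forming the product term, per Aiken & West, 1991):

    import pandas as pd
    import statsmodels.api as sm

    def moderated_regression(df: pd.DataFrame, y: str = "pt_score") -> None:
        # Mean-center the predictors before forming the interaction term.
        df = df.assign(ffwe_c=df["ffwe"] - df["ffwe"].mean(),
                       wlc_c=df["wlc"] - df["wlc"].mean())
        df["ffwe_x_wlc"] = df["ffwe_c"] * df["wlc_c"]
        steps = [["sex", "rank"],                                   # Step 1: controls
                 ["sex", "rank", "ffwe_c", "wlc_c"],                # Step 2: main effects
                 ["sex", "rank", "ffwe_c", "wlc_c", "ffwe_x_wlc"]]  # Step 3: interaction
        prev_r2 = 0.0
        for i, cols in enumerate(steps, start=1):
            fit = sm.OLS(df[y], sm.add_constant(df[cols])).fit()
            print(f"Step {i}: R2 = {fit.rsquared:.2f}, delta R2 = {fit.rsquared - prev_r2:.2f}")
            prev_r2 = fit.rsquared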
Hypothesis 1a proposed that work-life conflict would be negatively related to job performance.
We found no support for this hypothesis. Hypothesis 1b proposed that work-life conflict would be
negatively related to perceptions of future performance. This hypothesis was supported, with
work-life conflict being negatively related to perceptions of future combat performance (β = -.177,
p = .007). The control variables (sex and rank) and work-life conflict explained 6% of the
variance in perceptions of future combat performance. Hypothesis 1c proposed that work-life
conflict would be positively related to turnover intentions. This hypothesis was supported, with
work-life conflict being positively related to turnover intentions (β = .25, p < .001). The control
variables and work-life conflict explained 25 percent of the variance in turnover intentions.
Hypotheses 2a, 2b, and 2c predicted that family-friendly work environment perceptions
would be positively related to job performance and to perceptions of future performance, and
negatively related to turnover intentions, respectively. Hypothesis 2a was partially supported,
with family-friendly work environment perceptions being related to physical training scores
(β = .141, p = .036) but not to marksmanship scores. Hypotheses 2b and 2c were supported, with
significant relationships existing between family-friendly work environment perceptions and
perceptions of future


combat performance (β = .223, p = .001) and turnover intentions (β = -.235, p < .001). After
controlling for sex and rank, family-friendly work environment perceptions accounted for 6%
(physical training), 8% (combat readiness), and 30% (turnover intentions) of the variance in
these outcomes.
Hypotheses 3a, 3b, and 3c proposed that family-friendly work environment perceptions
would moderate the relationships between work-life conflict and job performance, perceptions of
future performance, and turnover intentions, respectively. Only Hypothesis 3a was supported,
with the interaction shown in Figure 1. That is, family-friendly work environment perceptions
moderated the relationship between work-life conflict and physical training scores (Table 2).
[Figure 1. Interaction between Work-Life Conflict, FFWE, and Physical Training Scores. Line
plot of physical training (PT) score (y-axis, approximately 300-425) against work-life conflict
(x-axis, low to high), with separate lines for low and high FFWE.]

Table 2
Interaction between Work-Life Conflict and FFWE Perceptions on Physical Training Scores
Variable                          B       SE B      β      R2     ∆R2
Step 1:                                                   .04     .04*
  Sex                           -5.05     5.78     -.06
  Rank                           2.30**    .77     -.07
Step 2:                                                   .08     .04*
  Sex                           -6.23     5.74     -.07
  Rank                           1.68*     .79
  FFWE                           1.46**    .52      .22
  Work-Life Conflict             1.16      .66
Step 3:                                                   .11     .03**
  Sex                           -5.75     5.65     -.07
  Rank                           1.81*     .78      .15
  FFWE                           1.35**    .51      .20
  Work-Life Conflict              .89      .66      .10
  FFWE x Work-Life Conflict       .27**    .09      .20
Notes. N = 289. FFWE = Family-Friendly Work Environment. For sex, females are coded as 1 and males
are coded as 2. The B weights shown are from the step of entry into the model. ** p < .01. * p < .05.


For both employees with high family-friendly work environment perceptions and
employees with low perceptions, higher levels of work-life conflict were related to higher
physical training scores. Physical training scores, however, were significantly higher for
employees with higher levels of family-friendly work environment perceptions.
Hypotheses 4a, 4b, and 4c proposed that the number of family responsibilities would
moderate the relationship between family-friendly work environment perceptions and job
performance, perceptions of future combat performance, and turnover intentions, respectively.
As shown in Tables 3 and 4, family responsibilities moderated the relationship between family-
friendly work environment perceptions and marksmanship scores (Figure 2), providing partial
support for Hypothesis 4a, and the relationship between family-friendly work environment
perceptions and turnover intentions (Figure 3), providing full support for Hypothesis 4c.

Table 3
Interaction between Family Responsibilities and FFWE Perceptions on Marksmanship Scores

Variable                              B       SE B      β      R2     ∆R2
Step 1:                                                       .06     .06*
  Sex                                3.05**   1.12      .23
  Rank                                .24      .21      .10
Step 2:                                                       .08     .02
  Sex                                2.80**   1.67      .21
  Rank                                .18      .21      .07
  FFWE                                .18      .11      .14
  Family Responsibilities             .26      .33      .07
Step 3:                                                       .11     .03*
  Sex                                3.20**   1.17      .24
  Rank                                .11      .21      .05
  FFWE                                .15      .11      .11
  Family Responsibilities             .25      .32      .07
  FFWE x Family Responsibilities      .18*     .08      .18
Notes. N = 289. FFWE = Family-Friendly Work Environment. For sex, females are coded as 1 and males are coded
as 2. The B weights shown are from the step of entry into the model. ** p < .01. * p < .05.

Table 4
Interaction between Family Responsibilities and FFWE Perceptions on Turnover Intentions

Variable                              B       SE B      β      R2     ∆R2
Step 1:                                                       .18     .18**
  Sex                               -1.02**    .32     -.19
  Rank                               -.28      .04     -.39
Step 2:                                                       .30     .12**
  Sex                                -.68**    .31     -.13
  Rank                               -.19*     .04     -.26
  FFWE                               -.09**    .02     -.24
  Family Responsibilities            -.39**    .08     -.30
Step 3:                                                       .31     .01~
  Sex                                -.74*     .31     -.14
  Rank                               -.18**    .04     -.25
  FFWE                               -.10**    .02     -.24
  Family Responsibilities            -.41**    .08     -.31
  FFWE x Family Responsibilities     -.02~     .02     -.10
Notes. N = 289. FFWE = Family-Friendly Work Environment. For sex, females are coded as 1 and males are coded
as 2. The B weights shown are from the step of entry into the model. ** p < .01. * p < .05. ~ p < .10.

[Figure 2. Interaction between FFWE, Family Responsibilities and Marksmanship Scores. Line
plot of marksmanship score (y-axis, approximately 0-40) against family-friendly work
environment perceptions (x-axis, low to high), with separate lines for low and high family
responsibilities.]

[Figure 3. Interaction between FFWE, Family Responsibilities and Turnover Intentions. Line
plot of turnover intentions (y-axis, stay to leave) against family-friendly work environment
perceptions (x-axis, low to high), with separate lines for low and high family responsibilities.]

Specifically, employees with more family responsibilities were more likely than employees with
fewer family responsibilities to have higher marksmanship scores if they perceived their
environment to be family-friendly. Similarly, employees with more family responsibilities were
more likely than those with fewer to indicate that they intended to remain in the military if they
perceived their work environment to be friendly toward families. No support was found for
Hypothesis 4b.

DISCUSSION
One solution for dealing with high work-life conflict is establishing a family-friendly
work environment. The current study explored how social exchange theory and equity theory can
help explain the relationships between work-life conflict and work outcomes in relation to
family-friendly work environments. While previous studies have examined work outcomes and
family-friendly work environments, few studies have assessed both subjective and objective
performance measures.
We found mixed results in examining the relationship between work-life conflict and
performance and organizational outcomes. Whereas perceptions of work-life conflict were
related to perceptions of future combat performance and turnover intentions, they were not
related to individual performance measures (i.e., marksmanship or physical training scores). One
possible explanation for this pattern of findings is that soldiers, as well as unit leaders (see
below), will always ensure that individual performance remains relatively high, particularly for
skills such as physical training and marksmanship, even when other work-life demands are high.
Soldiers are particularly motivated to maintain high physical fitness and marksmanship scores, as
both directly and indirectly influence job advancement. Indeed, as Greenhaus et al. (1987) have
pointed out, many job skills are required to maintain a high level of performance. Physical
fitness and marksmanship certainly fall into this category, and the maintenance of such skills
appears to trump other responsibilities and demands, both in and outside the work domain.
Within the work domain, while employees may continue to complete their expected job
duties and maintain the skills directly tied to their own promotion and performance evaluations,
they may be less likely to perform extra, non-mandated duties, especially when other demands
become high. Whereas the individual’s required performance would remain unchanged, not
performing these additional tasks could adversely affect the organization. Such non-required
tasks have been referred to as organizational citizenship behaviors and have been shown to be
linked to organizational effectiveness (Chen, Hui, & Sego, 1998; Podsakoff & MacKenzie, 1997).
Perhaps the most interesting finding in our study in terms of unit readiness is that while
work-life conflict did not impact individual job performance measures such as physical training
and marksmanship, it was related to soldiers’ perceptions of team performance in future combat.
Military units must function effectively as teams during war in order to be successful. The
present data indicated, however, that although soldiers’ individual job performance was unrelated
to work-life conflict, soldiers became more pessimistic about the future combat performance of their group as
their work-life conflict increased. Given that efficacy beliefs have been consistently linked to
performance (e.g., Sadri & Robertson, 1993; Stajkovic & Luthans, 1998), these findings suggest


that work-life conflict can be detrimental to the group, even if not directly through decreased
performance of individual team members.
Our hypotheses that perceptions of a family-friendly work environment would moderate
the relationship between work-life conflict and organizational outcomes were based on social
exchange and equity theories. Disappointingly, only one of these hypotheses was partially
supported. Specifically, perceptions of a family-friendly work environment moderated the
relationship between work-life conflict and physical training scores. Surprisingly, however, the
interaction between work-life conflict and family-friendly work environment for physical
training scores ran in the direction opposite to that predicted. That is, higher work-life conflict
was related to higher physical training scores in general, and especially so when perceptions
of a family-friendly work environment were high. One possible explanation for why we observed
this anti-buffering effect has to do with the nature of the performance measure, physical training.
Namely, physical fitness training is mandatory and usually occurs early in the morning, three
times a week. Thus, while a routine physical fitness training program will no doubt lead to
higher physical fitness scores, the timing of the training is also likely to interfere with morning
family responsibilities, such as getting children fed, dressed, and transported to daycare or
school.
We hypothesized a direct relationship between family-friendly environment perceptions
and performance and organizational outcomes. These hypotheses were generally supported.
Family-friendly work environment perceptions were related to individual job performance (i.e.,
physical training scores), perceptions of future combat performance, and turnover intentions. We
also found that perceptions of a family-friendly work environment moderated the relationship
between work-life conflict and physical training. In other words, regardless of the level of
work-life conflict, employees who perceived their organization to have a family-friendly work
environment had higher physical training scores. These results suggest that perceptions
of a family-friendly work environment are important regardless of the level of work-life conflict.
Researchers have suggested that family-friendly work policies might benefit only those
employees with family demands or responsibilities, while penalizing those without such
demands (Jenner, 1994). We examined this by testing whether the number of family responsibilities an
employee has moderates the relationship between perceptions of family-friendly work
environment and organizational outcomes. This hypothesis was only partially supported;
employees with more family responsibilities had higher marksmanship scores if they perceived
their environment to be family-friendly compared to employees with fewer family
responsibilities. More importantly, perceived family-friendly environments did not hinder the
performance of individuals with fewer family responsibilities. Contrary to Jenner’s (1994)
speculation, these results suggest that individuals who do not directly benefit from perceived
family-friendly environments are not hindered by them either. It is possible that even employees
without family responsibilities or demands will support family-friendly work environments, with
the expectation that, when they have families, they too will benefit from them.
Limitations
There were several limitations to the current study. These results are based on a military
sample, and therefore they may not generalize to a civilian population. Furthermore, the military
units studied were stationed overseas, where life stressors may have been higher than those of a
stateside sample, such as higher likelihood of deployments, separation from personal support
networks (e.g., parents), and everyday cultural differences. Finally, turnover is viewed


differently in the military than in civilian organizations: whereas most military members are
obligated to fulfill a written contract specifying their length of service, civilians have relatively
more freedom in job movement. These differences may have affected the ratings of
turnover intentions. Nevertheless, the effect of these limitations would have likely resulted in
range restriction, making it harder to find relationships should they exist. Therefore, the use of a
military sample should not be considered a limitation in this study.
Another limitation of this study is the nature of the data collection procedure.
Specifically, cross-sectional data do not allow causal inferences. Without longitudinal data, we
are unable to determine the direction of the relationships we examined. For
example, it could be that individuals who score higher on their physical training tests are
subsequently treated better by the organization, and therefore view the organization as being
family-friendly. Future studies should examine the effects of work-life conflict, family-friendly
work environment perceptions, and organizational outcomes using a longitudinal design.
Implications and Future Studies
Our results support the underlying tenets of social exchange theory and equity theory.
That is, the relationship between the employee and the organization is a balancing act. As the
responsibilities in the organization come to dominate the relationship, the employee will attempt
to tip the scale back in his or her favor. This response may represent an attitude shift (i.e.,
perceptions of future performance and career intentions) or an actual behavioral change
(performance). Organizations have the ability to overcome some of the negative effects that can
occur in an unbalanced relationship. The current study showed how maintaining a family-
friendly work environment is one way to modify imbalances between the employee and
organization.
The U.S. military has numerous programs to assist families, yet having programs is not
always enough (Secret & Sprang, 2001). As we have shown in the present study, perceptions of a
family-friendly work environment are also an important factor to key organizational outcomes.
While the senior leadership is usually responsible for establishing policy that encourages a
family-friendly work environment, it is up to the local leadership to foster and support the policy
in order to create a family-friendly culture.
There appear to be direct beneficial outcomes associated with family-friendly work
environments. There may also be more indirect benefits. Individuals’ career choices may be
partially due to the family-friendly culture of the work environment. Future studies should look
at other variables of interest such as recruitment and applicant attraction (Rau & Hyland, 2002).
The current trend is for organizations to adopt family-friendly policies, with the intent to
improve and retain effective employees. We suggest that organizations may want to expand the
notion of a “family-friendly work environment” to “employee- or life-friendly work
environment”. Single and married employees alike have pressures and responsibilities extending
beyond the family that can interfere with their ability to perform their jobs successfully.
Lockwood (2003) suggested that the trend in work-family research is to broaden the term from
work-family to work-life. We suggest the same should be done for the policies that benefit these
interrelated domains.


REFERENCES
Abbasi, S. M., & Hollman, K. W. (2000). Turnover: The real bottom line. Public Personnel
Management, 29, 333-343.
Adams, J. S. (1963). Toward an understanding of inequity. Journal of Abnormal and Social
Psychology, 67, 422-436.
Adams, J. S. (1965). Inequity in social exchange. In L. Berkowitz (Ed.), Advances in
experimental social psychology (Vol. 2, pp. 267-299). New York: Academic Press.
Adams, G. A., King, L. A., & King, D. W. (1996). Relationships of job and family involvement,
family social support, and work-family conflict with job and life satisfaction. Journal of
Applied Psychology, 81, 411-420.
Adler, A. B., Thomas, J., & Castro, C. A. (2002, August). Measuring Up: A Comparison of Self-
Reports and Unit Records for Assessing Soldier Performance. Paper presented at the annual
meeting of the American Psychological Association, Chicago, IL.
Aiken, L. S., & West, S. G., (1991). Multiple regression: Testing and interpreting interactions.
Newbury Park: Sage.
Aldous, J. (1990). Specification and speculation concerning the politics of workplace family
policy. Journal of Family Issues, 11, 921-936.
Allen, T. D. (2001). Family-supportive work environments: The role of organizational
perceptions. Journal of Vocational Behavior, 58, 414-435.
Allen, T. D., Herst, D. E. L., Bruck, C. S., & Sutton, M. (2000). Consequences associated with
work-to-family conflict: A review and agenda for future research. Journal of Occupational
Health Psychology, 5, 278-308.
Aryee, S. (1992). Antecedents and outcomes of work-family conflict among married professional
women: Evidence from Singapore. Human Relations, 45, 813-837.
Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37, 122-
147.
Bandura, A. (1984). Recycling misconceptions of perceived self-efficacy. Cognitive Therapy and
Research, 8, 231-255.
Behson, S. J. (2002). Which dominates? The relative importance of work-family organizational
support and general organizational context on employee outcomes. Journal of Vocational
Behavior, 61, 53-72.
Bhanthumnavin, D. (2003). Perceived social support from supervisor and group members’
psychological and situational characteristics as predictors of subordinate performance in Thai
work units. Human Resource Development Quarterly, 14(1), 79-97.
Blau, P. M. (1964). Exchange and Power in Social Life. New York: Wiley.
Bommer, W. H., Johnson, J. L., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On
the interchangeability of objective and subjective measures of employee performance: A
meta-analysis. Personnel Psychology, 48, 587-605.
Bourg, C., & Segal, M. W. (1999). The impact of family supportive policies and practices on
organizational commitment to the Army. Armed Forces & Society, 25, 633-652.
Burke, R. J. (1988). Some antecedents and consequences of work-family conflict. Journal of
Social Behavior and Personality, 3, 287-302.
Cascio, W. F. (1998). Applied psychology in human resource management (5th ed.). Upper
Saddle River, NJ: Prentice Hall.
Castro, C. A., & Adler, A. B. (1999, Autumn). The impact of operations tempo on soldier and
unit readiness. Parameters, 86-95.


Castro, C. A., Adler, A. B., & Bienvenu, R. V. (1998). A human dimensions assessment of the
impact of OPTEMPO on the forward-deployed soldier [WRAIR Protocol #700]. Silver
Spring, MD: Walter Reed Army Institute of Research.
Chen, X., Hui, C., & Sego, D. J. (1998). The role of organizational citizenship behavior in
turnover: Conceptualization and preliminary tests of key hypotheses. Journal of Applied
Psychology, 83, 922-931.
Colquitt, J. A., Conlon, D. E., Wesson, M. J., Porter, C. O. L. H., & Ng, K. Y. (2001). Justice at
the millennium: A meta-analytic review of 25 years of organizational justice research.
Journal of Applied Psychology, 86, 425-445.
Dalton, D. R., & Mesch, D. J. (1990). The impact of flexible scheduling on employee attendance
and turnover. Administrative Science Quarterly, 35, 370-387.
Dunham, R. B., Pierce, J. L., & Castenada, M. B. (1987). Alternative work schedules: Two field
quasi-experiments. Personnel Psychology, 40, 215-242.
Department of Defense. (1996, December). Department of Defense and families: A total force
partnership. Department of Defense.
Eaton, S. C. (2003). If you can use them: Flexibility policies, organizational commitment, and
perceived performance. Industrial Relations, 42, 145-167.
Eisenberger, R., Huntington, R., Hutchison, S., & Sowa, D. (1986). Perceived organizational
support. Journal of Applied Psychology, 71, 500-507.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press.

Fredriksen-Goldsen, K. I., & Scharlach, A. E. (2001). Families and work. New directions in the
twenty-first century. New York: Oxford University Press.
Frone, M. R. (2000). Work-family conflict and employee psychiatric disorders: The national
comorbidity survey. Journal of Applied Psychology, 85, 888-895.
Frone, M. R. (2003). Work-family balance. In J. C. Quick & L. E. Tetrick (Eds.), Handbook of
occupational health psychology. Washington, DC: American Psychological Association.
Frone, M. R., Russell, M., & Barnes, G. M. (1996). Work-family conflict, gender, and health-
related outcomes: A study of employed parents in two community samples. Journal of
Occupational Health Psychology, 1, 57-69.
Frone, M. R., Russell, M., & Cooper, M. L. (1992). Antecedents and outcomes of work-family
conflict: Testing a model of the work-family interface. Journal of Applied Psychology, 77,
65-78.
Frone, M. R., Yardley, J. K., & Markel, K. S. (1997). Developing and testing an integrative
model of the work-family interface. Journal of Vocational Behavior, 50, 145-167.
Gardner, D. G., Cummings, L. L., Dunham, R. B., & Pierce, J. L. (1998). Single-item versus
multiple-item measurement scales: An empirical examination. Educational and
Psychological Measurement, 58, 898-915.
Gist, M. E. (1987). Self-efficacy: Implications for organizational behavior and human resource
management. Academy of Management Review, 12, 472-485.
Glass, J. S., & Finley, A. (2002). Coverage and effectiveness of family-responsive workplace
policies. Human Resource Management Review, 12, 313-337.
Greenhaus, J. H., Bedeian, A. G., & Mossholder, K. W. (1987). Work experiences, job
performance, and feelings of personal and family well-being. Journal of Vocational
Behavior, 31, 200-215.


Greenhaus, J. H., Collins, K. M., Singh, R., & Parasuraman, S. (1997). Work and family
influences on departure from public accounting. Journal of Vocational Behavior, 50, 249-
270.
Greenhaus, J. H., Parasuraman, S., & Collins, K. M. (2001). Career involvement and family
involvement as moderators of relationships between work-family conflict and withdrawal
from a profession. Journal of Occupational Health Psychology, 6, 91-100.
Grover, S. L., & Crooker, K. J. (1995). Who appreciates family-responsive human resource
policies: The impact of family-friendly policies on the organizational attachment of parents
and non-parents. Personnel Psychology, 48, 271-288.
Gupta, N., & Jenkins, G. D., Jr. (1992). The effects of turnover on perceived job quality. Group
and Organization Management, 17, 431-446.
Gutek, B. A., Searle, S., & Klepa, L. (1991). Rational versus gender role expectations for work-
family conflict. Journal of Applied Psychology, 76, 560-568.
Hacker, C. (1997, October). The cost of poor hiring decisions…And how to avoid them. HR
Focus, 5-13.
Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-
supervisor ratings. Personnel Psychology, 41, 43-62.
Hart, P.M. (1999). Predicting employee life satisfaction: A coherent model of personality, work
and nonwork experiences, and domain satisfactions. Journal of Applied Psychology, 84, 564-
584.
Hartline, M. D., & Ferrel, O. C. (1996). The management of customer-contact service
employees: An empirical investigation. Journal of Marketing, 60, 52-70.
Heneman, R. L. (1986). The relationship between supervisory ratings and results-oriented
measures of performance: A meta-analysis. Personnel Psychology, 39, 811-826.
Huselid, M. A. (1995). The impact of human resource management practices on turnover,
productivity, and corporate financial performance. Academy of Management Journal, 38,
635-672.
James, L. R., & McIntyre, M. D. (1996). Perceptions of organizational climate. In K. R. Murphy
(Ed.), Individual differences and behavior in organizations (pp. 416-450). San Francisco:
Jossey-Bass.
Jenner, L. (1994). Family-friendly backlash. Management Review, 7.
Jex, S. M. (1998). Stress and job performance: Theory, research, and implications for
managerial practice. Thousand Oaks, CA: Sage.
Jex, S. M., & Bliese, P. D. (1999). Efficacy beliefs as a moderator of the impact of work-related
stressors: A multilevel study. Journal of Applied Psychology, 84, 340-361.
Kahn, R. L., Wolfe, D. M., Quinn, R., Snoek, J. D., & Rosenthal, R. A. (1964). Organizational
stress. New York: Wiley.
Katosh, J. P., & Traugott, M. W. (1981). The consequences of validated and self-reported voting
measures. Public Opinion Quarterly, 45, 519-535.
Kleck, G. (1982). On the use of self-report data to determine the class distribution of criminal
and delinquent behavior. American Sociological Review, 47, 427-433.
Kopelman, R. E., Greenhaus, J. H. & Connolly, T. F. (1983). A model of work, family, and
interrole conflict: A construct validation study. Organization Behavior and Human
Performance, 32, 198-215.


Kossek, E. E., & Lobel, S. (1996). Beyond the family friendly organization. In E. E. Kossek & S.
A. Lobel (Eds.), Managing diversity: Human resource strategies for transforming the
workplace (pp. 221-244). Oxford: Blackwell.
Kossek, E. E., & Nichol, V. (1992). The effects of on-site child care on employee attitudes and
performance. Personnel Psychology, 45, 485-509.
Kossek, E. E., & Ozeki, C. (1998). Work-family conflict, policies, and the job-life satisfaction
relationship: A review and directions for future organizational behavior-human resources
research. Journal of Applied Psychology, 83, 139-149.
Lockwood, N. R. (2003). Work/life balance: Challenges and solutions. Society for Human
Resource Management Research Quarterly, 2, 1-11.
Major, V. S., Klein, K. J., & Ehrhart, M. G. (2002). Work time, work interference with family,
and psychological distress. Journal of Applied Psychology, 87, 427-436.
Marshall, N. L., & Barnett, R. C. (1996). Family-friendly workplaces, work-family interface, and
worker health. In G. P. Keita & J. J. Hurrell (Eds.), Job stress in a changing workforce:
Investigating gender, diversity, and family issues (pp. 253-264). Washington, DC: American
Psychological Association.
Martin, J. A., Rosen, L. N., & Sparacino, L. R. (2000). The military family: A practice guide for
human service providers. Westport, CT: Praeger.
Netemeyer, R. G., Boles, J. S., & McMurrian, R. (1996). Development and validation of work-
family conflict and family-work conflict scales. Journal of Applied Psychology, 81, 400-410.
Orthner, D. K., & Pittman, J. F. (1986). Family contributions to work commitment. Journal of
Marriage and the Family, 48, 573-581.
Pierce, J. L., & Newstrom, J. W. (1982). Employee responses to flexible work schedules: An
inter-organization, inter-system comparison. Journal of Management, 8, 9-25.
Pierce, J. L., & Newstrom, J. W. (1983). The design of flexible work schedules and employee
responses: Relationships and process. Journal of Occupational Behaviour, 4, 247-262.
Perrewe, P. L., Treadway, D. C., & Hall, A. T. (2003). The work and family interface: Conflict,
family-friendly policies, and employee well-being. In D. A. Hofmann & L. E. Tetrick (Eds.),
Health and safety in organizations: A multilevel perspective (pp. 285-315). San Francisco:
Jossey-Bass/Pfeiffer.
Podsakoff, P. M., & MacKenzie, S. B. (1997). Impact of organizational citizenship behavior on
organizational performance: A review and suggestions for future research. Human
Performance, 10, 133-151.
Raabe, P. H., & Gessner, J. (1988). Employer family-supportive policies: Diverse variations on
the theme. Family Relations, 37, 196-202.
Rau, B. L., & Hyland, M. M. (2002). Role conflict and flexible work arrangements: The effects on
applicant attraction. Personnel Psychology, 55, 111-136.
Rothausen, J. J., Gonzalez, J. A., Clark, N., & O’Dell, L. (1998). Family-friendly backlash-fact
or fiction? The case of organizations’ on-site child care centers. Personnel Psychology, 51,
685-703.
Rhoades, L., & Eisenberger, R., (2002). Perceived organizational support: A review of the
literature. Journal of Applied Psychology, 87, 698-714.
Sadri, G., & Robertson, I. T. (1993). Self-efficacy and work-related behaviour: A review and
meta-analysis. Applied Psychology: An International Review, 42, 139–152.
Secret, M., & Sprang, G. (2001). The effects of family-friendly workplace environments on the
work-family stress of employed parents. Journal of Social Service Research, 28, 21-41.


Spector, P. E., Dwyer, D. J., & Jex, S. M. (1988). Relation of job stressors to affective, health,
and performance outcomes: A comparison of multiple data sources. Journal of Applied
Psychology, 73, 11-19.
Stajkovic, A., & Luthans, F. (1998). Self-efficacy and work-related performance: A meta-
analysis. Psychological Bulletin, 124, 240–261.
Thomas, C. A., & Ganster, D. C. (1995). Impact of family-supportive work variables on work-
family conflict and strain: A control perspective. Journal of Applied Psychology, 80, 6-15.
Thornton, G. C. (1980). Psychometric properties of self-appraisals of job performance.
Personnel Psychology, 33, 263-271.
Tremble, T. R., Jr., Payne, S. C., Finch, J. F., & Bullis, R. C. (2003). Opening organizational
archives to research: Analog measures of organizational commitment. Military Psychology,
15, 167-190.
Vaitkus, M. (1994). Unit Manning System: Human dimensions field evaluation of the COHORT
company replacement model. Technical report ADA285942, Washington, DC.
Wanous, J. P., Reichers, A. E., & Hudy, M. J. (1997). Overall job satisfaction: How good are
single-item measures? Journal of Applied Psychology, 82, 247-252.
White, G. L. (1995). Employee turnover: The hidden drain on profits. HR Focus, 72(1), 15-18.


Tracking U.S. Navy Reserve Career Decisions3


Rorie N. Harris, Ph.D.
Jacqueline A. Mottern, Ph.D.
Michael A. White, Ph.D.
David L. Alderton, Ph.D.
U. S. Navy Personnel Research, Studies and Technology, PERS-13
Millington, TN 38055-1300
jacqueline.mottern@navy.mil

Since December 2000, the U.S. Navy Reserve has tracked the career decisions of Selected
Reservists with a web-based survey, the Reserve Career Decision Support System. Designed by
Navy Personnel Research, Studies and Technology (NPRST), the survey relies on adaptive
question branching to keep questions relevant to the respondent, thus reducing respondent
burden. The web-based survey was designed for administration at transition points (e.g.,
retirement, promotion, mobilization, and demobilization). In addition to the questionnaire, the
system includes a near real-time query system available to commanders and career counselors
to answer questions about their commands. Through the Reserve Career Decision Support
System, the Navy Reserve is able to track the career decisions of Selected Reservists and
assess the impact of mobilization on those decisions.

INTRODUCTION
In order to maintain a qualified, productive workforce, organizations must identify
talented employees, train them effectively, and engage in actions and behaviors that encourage
employees to remain with the organization. Both private and public sector organizations, such as
the military, face challenges in attracting, motivating, and retaining competent employees. When
highly skilled, qualified members leave, the organization suffers losses in talent, readiness, and
the monetary costs associated with the training those members received. As such, military
researchers continually attempt to identify and examine the factors that influence whether a
member chooses to stay or leave the organization (Harris, 2003).

A key segment of the Navy personnel population that contributes greatly to the mission
of the Navy is the Naval Reserve. Members of the Reserve are volunteers who are trained to
serve the expanded needs of the Navy, and they make up almost 30% of the military personnel
serving with the Navy (U.S. Navy, 2003). The Naval Reserve relies on timely and accurate
retention and attrition statistics to guide its officer and enlisted personnel policies and programs.
To plan and manage accession, retention, separation, and advancement targets, Naval Reserve
planners, managers, and career counselors need accurate, understandable, timely, and easily
accessible information on career decisions. The Naval Reserve

3 The opinions expressed are those of the authors. They are not official and do not represent the views of the U.S.
Department of the Navy.


has lacked a standardized means of collecting accurate, comprehensive information on why
personnel stay in or leave the Reserves and on other critical separation and retention issues.

In an effort to address these issues, the U.S. Navy Personnel Research, Studies and
Technology Department (PERS-1), with the Naval Reserve as project sponsor, developed a
career decision support survey and query system. The career decision survey is a web-based
questionnaire administered at transition points (promotion, retirement, voluntary or involuntary
separation, mobilization, and demobilization) in Selected Reservists’ (SELRES) careers. The
web-based query system is available to commands to access their data in near real-time.

THE RESERVE CAREER DECISION QUESTIONNAIRE AND QUERY SYSTEM


The questionnaire relies on extensive item branching to allow a maximum number of
items with a minimum response burden. The questionnaire adapts itself to each individual, based
on demographics and responses to initial question topics. Answers to questions about marital
status, dependents, rank, and reason for completing the questionnaire serve as branching trunks.
For example, a SELRES who is single with no dependents will not see questions concerning a
spouse or dependents. In addition, a series of 14 broad items (see Table 1) also serve as major
branches. For example, a SELRES who is mobilizing will see only questions related to mobilization.

Table 1. List of Item Branching Topics

Training/Promotion/Advancement Opportunities     Education and Other Benefits
Career Assignments                               Pay and Retirement
Command Climate                                  Civilian Job Opportunities
Time Away From Home                              Mobilization
Recognition (FITREPs/Evaluations/Awards)         Navy Culture (Regulation/Discipline/Standards)
Maintenance and Logistic Support                 Navy Leadership
Current Job Satisfaction                         Personal and Family Life
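The branching logic described above can be sketched as follows. The trunk fields, and the rule that single members with no dependents skip spouse and dependent questions, come from the text; the data structures and the always-shown blocks are our own illustration:

    def select_question_blocks(marital_status, dependents, rank, reason):
        """Pick the question blocks a respondent sees, based on the branching trunks."""
        blocks = ["Current Job Satisfaction", "Pay and Retirement"]  # illustrative core blocks
        # Rank is a fourth trunk; branching on it is omitted in this sketch.
        if marital_status == "married":
            blocks.append("spouse items")
        if dependents > 0:
            blocks.append("dependent items")
        if reason == "mobilizing":
            blocks.append("Mobilization")
        return blocks

    # A single SELRES with no dependents who is mobilizing sees no spouse or dependent items.
    print(select_question_blocks("single", 0, "E-5", "mobilizing"))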

The questionnaire also departs from the traditional use of satisfaction scales by using a 7-
point Likert-type scale that asks if an item has “influenced you (contributed to your decision) to
stay, influenced you to leave, or had no effect on your Naval Reserve career decision”. As
SELRES near completion of the survey, the computer generates a list of items they have
identified as strong influences to stay. Each SELRES then selects the five most important items
that are influencing their career decision. A similar list of influences to leave is generated as
well.
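Generating those candidate lists can be sketched as below (the 7-point coding with 7 as the "stay" pole follows the scale just described; the cutoff defining a "strong" influence is our assumption):

    def strong_influences(ratings, to_stay=True, cutoff=2):
        """Return items rated near the stay (7) or leave (1) pole, strongest first."""
        pole = 7 if to_stay else 1
        hits = [(abs(r - pole), item) for item, r in ratings.items()
                if abs(r - pole) < cutoff]
        return [item for _, item in sorted(hits)]

    ratings = {"Pay and Retirement": 7, "Time Away From Home": 1,
               "Command Climate": 6, "Navy Leadership": 4}
    print(strong_influences(ratings, to_stay=True))   # ['Pay and Retirement', 'Command Climate']
    print(strong_influences(ratings, to_stay=False))  # ['Time Away From Home']

The respondent then selects the five most important items from each generated list.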

In FY03 we added a web-based query system that gives commanders and career counselors
direct access to their command data, based on RUIC and echelon. Using drop-down


menus, a commander or career counselor can generate reports for all questions in the dataset,
except for SSN, and can run Stay-Leave reports (lists of the most important influences to stay
and to leave for officers and enlisted, in order of frequency) for their commands compared to
Reserve-wide data.
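The command-level Stay-Leave report amounts to filtered aggregation. A sketch with pandas (the column names `ruic`, `direction`, and `influence_item` are hypothetical, not the system's actual schema):

    import pandas as pd

    def stay_leave_report(df: pd.DataFrame, ruic: str) -> pd.DataFrame:
        """Frequencies of stay/leave influences for one command versus Reserve-wide."""
        keys = ["direction", "influence_item"]  # direction: 'stay' or 'leave'
        command = df[df["ruic"] == ruic].groupby(keys).size().rename("command_n")
        overall = df.groupby(keys).size().rename("reserve_n")
        return (pd.concat([command, overall], axis=1)
                  .fillna(0)
                  .sort_values("command_n", ascending=False))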

METHOD
Between December 1, 2000 and February 28, 2001, the U.S. Naval Reserve asked all of
its SELRES to complete a version of the web-based questionnaire. The survey had a 71%
completion rate from a population of 71,300, serving as a baseline data collection. Within 30
days of the end of data collection, a detailed briefing of these data was delivered. A revised
questionnaire was implemented in May 2001 (N = 13,163) and included a section on the impact
of mobilization and demobilization experiences on SELRES careers.

The mobilization process offers many potential elements that could influence a member’s
retention decisions, including effects on family, pay and benefits, and mobilization jobs and
tasks. Influences on staying in or leaving the Reserves are of interest to those in a position to
change the mobilization process, particularly with respect to the aspects of mobilization that
influence the retention decisions of reserve members. On the Career Decisions Survey,
questions regarding mobilization are designed to examine those aspects that influence a reserve
member to either remain in or leave the Navy Reserves. Questions cover a variety of general
topics, including the mobilization process in general, the gaining command, family issues, pay
and benefits, effects on civilian job, and willingness to extend mobilization term. Within each
topic, questions are rated on a scale from 1 (influence to leave) to 7 (influence to stay), with a
response of 4 representing no effect on leaving or staying in the Reserves.
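The three categories reported in Figures 1 through 4 follow from this scale. A small sketch of the bucketing (collapsing 1-3 into "influence to leave" and 5-7 into "influence to stay" is our reading, since the text states only the endpoints and the midpoint):

    def bucket(rating: int) -> str:
        """Collapse the 1-7 influence scale into the three reporting categories."""
        if rating < 4:
            return "influence to leave"
        if rating > 4:
            return "influence to stay"
        return "no effect"

    print([bucket(r) for r in (1, 4, 6)])
    # ['influence to leave', 'no effect', 'influence to stay']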

RESULTS
Approximately 5,700 respondents were branched to the questions on mobilization.
Currently, almost 90% of the Reservists who responded to the mobilization questions are
mobilized. A majority of these respondents (83%) reported having been mobilized during the last
year, and 85% had not been mobilized previously. Approximately three-fourths of the
respondents did not volunteer for mobilization.

Gaining Command Issues


In examining issues with the gaining command that influence reserve members, most
aspects show stronger influences on staying in the Reserves or show no effect in either direction.
The gaining command assigned was an influence to stay for 43% of the respondents (see Fig. 1).
Location of the gaining command (43%) also showed an influence on staying in the Reserves.
Forty-two percent of the respondents reported that the treatment received by Active Duty sailors
influenced them to stay, and the job assigned on arrival was also an influence to stay for almost
45% of the respondents.

The decision to leave was influenced by several aspects of the gaining command (Fig. 1).
The morale of the gaining command was reported to be an influence to leave by almost 50% of
the respondents. Another influence to leave was the leadership at the gaining command,


with 48% of respondents indicating a negative influence. Approximately 40% of respondents
indicated that treatment at the gaining command was an influence to leave, although roughly the
same percentage reported their treatment as an influence to stay in the Reserves.

[Figure 1. Gaining Command Questions: Influences to Stay and Leave the Reserves. Stacked
percentage bars (0-100%) showing influence to leave, no effect, and influence to stay for:
gaining command assigned, location of gaining command, treatment by AD sailors, job assigned
on arrival, morale of gaining command, leadership at gaining command, and treatment at
gaining command.]

Mobilization Process Issues


Questions regarding mobilization probed issues such as satisfaction with the mobilization
experience, satisfaction with mobilization assignments, and evaluation of job tasks. When asked
how smooth the process was, almost 69% of respondents indicated that it was at least moderately
smooth. In terms of task assignments, approximately 70% of respondents characterized the tasks
assigned to them as interesting. Almost 65% of those who responded reported that their
mobilization job was relevant to their rank (see Fig. 2).

[Figure 2. Task assignment ratings. Percentage of yes/no responses for “assigned interesting
tasks” and “mobilization job relevant to rank.”]


Respondents were divided in their reports of the effect of the overall mobilization
experience on their career decisions, with 42% reporting their experience as an influence to stay
and 38% reporting it as an influence to leave (Fig. 3). The mobilization assignment received was
reported as an influence to stay by 43% of the respondents. Reports on aspects such as the time
given to report to the new command, the manner in which Reservists were notified about
mobilization, and the time needed to get correct orders indicated that, for a majority of
respondents, these issues had no strong influence on the decision to stay in or leave the Naval
Reserves.

[Figure 3. Mobilization Process Questions: Influences to Stay and Leave the Reserves. Stacked
percentage bars (0-100%) showing influence to leave, no effect, and influence to stay for:
overall mobilization experience, mobilization assignment received, time given to report, manner
notified, and time to get correct orders.]

Family Related Issues


A final area in which to examine the effects of mobilization on Reservists’ decisions to
stay in or leave the organization is family-related issues. Sixty-eight percent of respondents
indicated that they are married, and almost 60% reported children living in their household. A
majority of respondents said that they saw their families once a month or less. For the majority
of respondents, benefits such as family use of the commissary, the exchange, and TRICARE had
no effect or served as influences to stay in the Naval Reserves. More than half of the Reservists
who responded indicated that having to leave their families for mobilization and the inability to
move their families were influences to leave the organization. Family concern for the Reservist’s
safety was also an influence to leave for approximately 43% of the respondents. Other influences
to leave were the effects of mobilization on children, as well as the additional stress that
mobilization causes for spouses (Fig. 4).


[Figure 4. Family Related Questions: Influences to Leave. Stacked percentage bars (0-100%)
showing influence to leave, no effect, and influence to stay for: family use of commissary,
family use of exchange, family use of TRICARE, leaving family for mobilization, inability to
move family, family concern for your safety, effect of mobilization on children, and additional
stress of mobilization on spouse.]

Summary

Results from the information provided by mobilized Reservists indicate areas of interest
that should be further examined by Reserve leaders. Areas showing higher percentages of
influence on leaving include family issues, such as separation from family during mobilization
and effects on spouses and children. The command to which Reservists are assigned also seems
to influence the decision to remain, with leadership and morale being rated as reasons to
consider leaving the Reserves. The web-based survey and query system reported here provide
Reserve leadership with an important tool for monitoring the Reserve force and the effects of
events such as mobilization and other personnel policies on Reserve career decisions.

REFERENCES

Harris, R. N. (2003). Navy spouse quality of life: Development of a model predicting spousal
support of the reenlistment decision. Unpublished doctoral dissertation, The University of
Memphis.

U.S. Navy. (2003). Status of the Navy report. Retrieved October 6, 2003, from
http://www.chinfo.navy.mil/navpalib/news/.www/status.html


Duties and Functions of a Recruiting Command Psychologist


LTC Stephen V. Bowles
U.S. Army Recruiting Center One

This paper discusses the duties and functions of the US Army Recruiting Command
(USAREC) Psychologist position, which has applications to other national and international
recruiting psychologist positions, as well as to other organizational psychologist positions.
This position serves as advisor to the USAREC Commanding General and senior
leadership on all areas of psychological issues, programs, projects, and initiatives in the
Command. The person in this position serves as liaison to the office of the Army Surgeon
General Psychology Consultant, Department of Defense, and civilian agencies regarding
psychological aspects of Soldiers in recruiting. This position oversees relevant screening
and selection projects for recruiting command and trains recruiters and leaders on
leadership, team building, and psychological principles at recruiting school and in the
field. This position also provides consultation on Soldiers’ and family members’ well-
being and command climate, and conducts research in recruiting and relevant areas to
enhance the screening, training, well-being, and performance of recruiters and leadership.
This position serves as a force multiplier to identify the best personnel to recruit and lead
while providing sustainment training.

The Command Psychologist serves as a member of the USAREC Commanding
General’s staff. In addition, the Command Psychologist serves as an advisor to all senior
leadership on psychological issues, programs, projects, and initiatives in this Command, as
well as in other major Army Commands and Department of Defense agencies.
The person in this position serves as liaison to the Office of The Army Surgeon General
Psychology Consultant and interfaces with other military psychologists on recruiter
operational training and psychological research for Soldiers and leadership in recruiting.
The role of the Command Psychologist is to serve as the command advisor for screening,
assessment, well-being, and enhanced performance training. These programs will be
described over the course of this paper.

Recruiter and Leader Screening


The objective of the screening program is to develop research-based programs to identify
the best personnel in the Army for recruiting and leadership. Past research overseen by the
Command Psychologist in this area includes a concurrent validation project supporting the
development of a recruiter screening test; a predictive validation study is currently being
conducted. Most recently, under the guidance of the Command Psychologist, an Army Model
web-based test program was developed to screen all soldiers on recruiting orders. In this
capacity, the Command Psychologist
served as the program manager directing Army staff from agencies in charge of
personnel, human resources, research, software development, and testing facilities. This
program has recommended a screening process that is currently under examination, while the
testing process has been operationalized in Army facilities worldwide. The test will undergo
further research and development in the web-based phase for a couple more years as
accumulating test data refine the scoring algorithm. The test will also be placed into a new
host testing system this year as the Army continues to adopt advanced


technology. The focus of this process will be to continue screening larger numbers of
soldiers from the applicant pool to enhance recruiter selection for the Army.

Recruiter Assessment
The purpose of the assessment board evaluation (ABE) is to evaluate marginal students
by reviewing their performance records. Recruiting students complete the recruiter
screening test prior to reporting to class or upon arriving at the United States Recruiting
and Retention School (RRS). These factors help to determine whether the student should
continue at RRS. The focus of the assessment is on technical and interpersonal skills. A
future project for this program is to examine the use of developmental tests to provide
feedback to recruiters on interpersonal skills.

Recruiter and Station Commander Well-being


The word stress has become synonymous with recruiting. Eighty-four percent of
recruiters report that their stress levels are higher than in their previous occupations. Since
the summer of 2002, stress and well-being material has been provided to recruiters to help
alleviate this increased stress. Feedback to recruiters includes stress level, health habits,
alcohol, caffeine, and cigarette use, social support network, Type A behavior, cognitive
hardiness, coping style, and psychological well-being. To date, over 1,500 recruiters
have been provided with well-being feedback material. Consultation is also provided to
senior leadership on Soldiers’ and family members’ well-being and command climate.

Leader Coach Program


The objective of the Leader Coach Program is to enhance leader performance,
increase production, reduce stress, improve well-being, and develop more capable
leaders, with a focus on station commanders. Training occurs in the following phases:
assessment, development of a leader plan of action, two stress resilience training
sessions, recognition of personal and professional developmental goals, and individual
coaching at RRS and in the field for one year. Participants volunteer for the program while
attending courses at the RRS or while serving as RRS cadre. Currently, the program and its
coaching have a satisfaction rate above 90%.

Enhanced Performance Program


This program provides classroom training for all station commanders (junior-level
managers) on leadership development. The Enhanced Performance Program identifies
the leadership characteristics of station commanders, such as decisiveness,
expressiveness, self-confidence, extroversion, assertiveness, positive motivation, and the
ability to mentor subordinates. Station commanders are provided with individualized
feedback forms during their classroom training. Upon graduating from training, station
commanders are coached in the field by their first sergeants, who are provided with
coaching feedback forms describing the station commanders’ strengths and weaknesses.


The in-house coaching provided to station commanders consists of three sessions, delivered by
Center One and RRS staff members who mentor or coach the station commanders. The focus of
this coaching is on developing business and personal goals to enhance these leaders' production
and quality of life.


THE 2002 WORKPLACE AND GENDER RELATIONS SURVEY


Anita R. Lancaster, Rachel N. Lipari, Lee M. Howell, and Regan M. Klein
Defense Manpower Data Center
1600 Wilson Blvd., Suite 400
Arlington, VA 22209-2953
liparirn@osd.pentagon.mil
Introduction
This paper provides results for sections of the Status of Armed Forces: Workplace and
Gender Relations Survey (2002 WGR). The United States Department of Defense (DoD) has
conducted three sexual harassment surveys of active-duty members in the Army, Navy, Marine
Corps, Air Force, and Coast Guard – in 1988, 1995, and 2002. The surveys not only document
the extent to which Service members report experiencing unwanted, uninvited sexual attention;
they also provide information on the details surrounding those events (e.g., where they occur),
and Service members’ perceptions of the effectiveness of sexual harassment policies, training,
and programs.

This paper examines the circumstances in which unprofessional, gender-related behaviors


occur as reported in the 2002 survey.4 Service members who experienced at least one
unprofessional, gender-related behavior were asked to consider the “one situation” occurring in
the year prior to taking the survey that had the greatest effect on them. Members then reported
on the circumstances surrounding that experience. Specifics related to the situation provided
answers to questions such as: (1) What was the unprofessional, gender-related experience, (2)
Who were the offenders, (3) Where did the experience occur, (4) How often did the situation
occur, (5) How long did the situation last, (6) Was the situation reported, and if so, to whom, and
(7) Were there any repercussions as a result of reporting the incident?

This paper will analyze gender differences in the circumstances surrounding the one
situation, reporting behaviors, and problems at work resulting from unprofessional, gender-
related behavior. In addition, this paper will include an analysis of paygrade-by-gender
differences. Members in the W1-W5 paygrades are not presented or analyzed in this paper
because estimates would be unstable due to small cell sizes. Only statistically significant
differences are presented in this paper.
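As a note on the significance criterion, a minimal sketch of the kind of two-proportion
comparison that underlies such statements is given below, assuming Python/numpy; this is a
generic textbook z test with illustrative inputs, not necessarily the procedure used by DMDC.

    import numpy as np

    def two_prop_z(p1, n1, p2, n2):
        # z statistic for the difference between two independent proportions;
        # for weighted survey data, n1 and n2 should be effective sample sizes.
        se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        return (p1 - p2) / se

    # Example: 30% of 1,000 women vs. 17% of 900 men; |z| > 1.96 -> p < .05.
    z = two_prop_z(0.30, 1000, 0.17, 900)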

Survey Methodology
The population of interest for the survey consisted of all active-duty members of the
Army, Navy, Marine Corps, Air Force, and Coast Guard, below the rank of admiral or general,
with at least 6 months of active duty service. The sampling frame included Service members
who were on active-duty in May 2001, with eligibility conditional on their also being on active
duty in September 2001 and December 2001.

The sample consisted of 60,415 Service members. A total of 19,960 eligible members
returned usable surveys yielding an adjusted weighted response rate of 36%. Data were
collected by mail and Web between December 26, 2001 and April 23, 2002. Data were weighted
to reflect the active duty population as of December 2001. The nonresponse-adjusted weights
were raked to force estimates to known population totals of the midpoint of data collection.
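Raking is an iterative proportional fitting of the weights to known population margins. A
minimal sketch, assuming Python/numpy; the margins and category codes shown are toy values,
not the actual DMDC weighting specification.

    import numpy as np

    def rake(weights, factors, margins, iters=50):
        # Repeatedly scale the weights so that the weighted total in each
        # category matches the known population margin, one margin at a time.
        w = weights.astype(float).copy()
        for _ in range(iters):
            for codes, target in zip(factors, margins):
                current = np.bincount(codes, weights=w, minlength=len(target))
                w *= (target / current)[codes]
        return w

    # Toy example: six respondents; margins for gender (2 cells) and Service (3 cells).
    w = rake(np.ones(6),
             [np.array([0, 0, 1, 1, 1, 0]), np.array([0, 1, 2, 0, 1, 2])],
             [np.array([300.0, 200.0]), np.array([150.0, 200.0, 150.0])])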

4 Comparisons are made to the 1995 survey but not to the 1988 survey, as it was substantially different.

Metrics of Unprofessional, Gender-Related Behavior


The 2002 WGR contains 19 behaviorally based items intended to represent a continuum
of unprofessional, gender-related behaviors—not just sexual harassment—the 19th being an open
item for write-in responses of "other gender-related behaviors." The 18 closed-ended items can
be grouped into three primary types of behaviors: (1) Sexist Behavior, (2) Sexual Harassment,
and (3) Sexual Assault. The sexual harassment behaviors can be further categorized as: (1)
Crude/Offensive Behavior, (2) Unwanted Sexual Attention, and (3) Sexual Coercion. The 12
sexual harassment behaviors are consistent with the U.S. legal system’s definition of sexual
harassment (i.e., behaviors that could lead to a hostile work environment and others that
represent quid pro quo harassment). Service members were asked to indicate if any of these
behaviors happened to them in the past 12 months. The rates of unprofessional, gender-related
behaviors are based on these items. However, details are not collected for every behavior;
rather, details are obtained about a specific situation, and only from those who experienced some
behavior in the past year. Service members were asked to pick the one situation that had the
members were asked to indicate, in the situation that affected them most, whether the offender
“did this” or “did not do this” for each item. Those analyzed in this paper represent those
members who experienced at least one behavior and chose to answer the questions pertaining to
the one situation with the greatest effect.
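The logic by which a respondent's one situation is classified as a single type of behavior or a
combination (the grouping shown in Figure 1 below) can be sketched as follows. The item-to-type
mapping here is illustrative only (four items per type except Sexual Assault), not the survey's
actual item order.

    # Hypothetical mapping of the 18 closed-ended items to behavior types.
    SEXIST, CRUDE, UNWANTED, COERCION, ASSAULT = range(5)
    ITEM_TYPE = ([SEXIST] * 4 + [CRUDE] * 4 + [UNWANTED] * 4 +
                 [COERCION] * 4 + [ASSAULT] * 2)

    def situation_category(did_this):
        # did_this: 18 booleans, one per item ("did this" in the one situation).
        types = {ITEM_TYPE[i] for i, d in enumerate(did_this) if d}
        names = {SEXIST: "Sexist Behavior", CRUDE: "Crude/Offensive Behavior",
                 UNWANTED: "Unwanted Sexual Attention", COERCION: "Sexual Coercion",
                 ASSAULT: "Sexual Assault"}
        if not types:
            return "None reported"
        return "Combination of Behaviors" if len(types) > 1 else names[types.pop()]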

Results
Types of Behaviors
Figure 1 shows that in 2002, over half of the women and one-third of the men indicated
that multiple types of behaviors occurred in the one situation, with the remainder of them
reporting only that Sexist Behavior, Crude/Offensive Behavior, or Unwanted Sexual Attention
occurred. Both women and men reported experiencing Sexual Coercion and Sexual Assault only
in combination with other behaviors. In 2002, Sexist Behavior was the most commonly
experienced type of behavior occurring alone for women (26%), whereas Crude/Offensive
Behavior was most commonly experienced by men (48%). While levels were different in 1995
with fewer combinations, the overall pattern was very similar.

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
210

Figure 1
Percentage Distribution of Behaviors in One Situation, by Gender and Year

              Sexist     Crude/Offensive   Unwanted Sexual   Combination
              Behavior   Behavior          Attention         of Behaviors
2002 Female   26         10                7                 57
1995 Female   38         15                16                31
2002 Male     15         48                4                 33
1995 Male     20         53                12                15

Margin of error does not exceed ±4
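The margins of error attached to each figure and table are design-based. One simple
approximation, sketched below, uses a Kish effective sample size derived from the analysis
weights; this is an illustration, not necessarily DMDC's variance estimator.

    import numpy as np

    def weighted_moe(indicator, weights, z=1.96):
        # 95% margin of error, in percentage points, for a weighted proportion.
        p = np.average(indicator, weights=weights)
        n_eff = weights.sum() ** 2 / (weights ** 2).sum()  # Kish effective n
        return z * np.sqrt(p * (1 - p) / n_eff) * 100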

Gender of Offenders
To obtain information on the perpetrators of unprofessional, gender-related behavior,
Service members were asked about the identity of the offender(s) in the situation that had the
greatest effect on them. It should be noted that it is possible for there to be multiple offenders
during the one situation.

Given the gender make-up of the active-duty military, 85% male and 15% female, it is
not unexpected that the majority of women (85%) and men (51%) reported their offender as
male. Comparing 2002 to 1995, more women (14% vs. 6%) and men (27% vs. 16%) reported
that the offenders included both genders (see Figure 2). The complementary change for women
and men was in the percentages who said the offenders were solely of the opposite gender.

Figure 2
Gender Percentages of Reported Offenders in One Situation, by Year

              Male   Female   Both Males and Females
2002 Female   85     1        14
1995 Female   92     2        6
2002 Male     51     22       27
1995 Male     52     32       16

Margin of error does not exceed ±4


Regardless of paygrade, both women and men most often reported the gender of the
offenders as male (see Table 1). With the exception of senior officers, across paygrades roughly
twice as many women and men reported in 2002 as in 1995 that the offenders included both men
and women.

Table 1
Percentage of Reported Offenders in One Situation, by Gender, Paygrade, and Year

                  Junior Enlisted   Senior Enlisted   Junior Officer   Senior Officer
                  (E1-E4)           (E5-E9)           (O1-O3)          (O4-O6)
                  1995   2002       1995   2002       1995   2002      1995   2002
Females
  Men             92     85         92     83         92     89        93     89
  Women           2      1          1      1          3      2*        1      2*
  Both            6      14         7      16         5      9         5      9
Males
  Men             53     53         51     47         57     62        51     51
  Women           32     20         32     22         33     17        33     29
  Both            15     26         17     30         10     21        17     20
Margin of Error   ±5     ±6         ±6     ±4         ±9     ±8        ±11    ±8
* Low precision and/or unweighted denominator size between 30 and 59.

Organizational Affiliation of Offenders


Another characteristic of interest regarding perpetrators of unprofessional, gender-related
behavior is their organizational affiliation. Service members interact with both other military
personnel and civilians of various paygrades; therefore, the perpetrators of unprofessional,
gender-related behaviors can be found in both groups. Service members were asked to identify
whether the offenders in the situation that had the greatest effect on them were military
members and/or civilians. Offenders were categorized as military members, civilians, or both
military and civilian personnel.

Given that during duty hours Service members are more likely to interact with other military
personnel than with non-military personnel (excluding family members), it was expected that the
majority of both women (84%) and men (82%) would report that the offenders in the situation
that had the greatest effect on them were Service members (see Figure 3). Both women (84% vs.
82%) and men (82% vs. 78%) were more likely in 2002 than in 1995 to report that the offenders
were military members only; correspondingly, fewer reported that the offenders were civilians
only (Females 4% vs. 6%; Males 6% vs. 13%) (see Figure 3).


Figure 3
Percentage of Offenders' Organizational Affiliation

              Military Only   Both Military and Civilian   Civilian Only
2002 Female   84              12                           4
1995 Female   82              12                           6
2002 Male     82              12                           6
1995 Male     78              9                            13

Margin of error does not exceed ±4

Female (68% vs. 82-88%) and male (57% vs. 80-87%) senior officers were the least
likely to report the offenders were military members (see Table 2). The complementary findings
for both female (14% vs. 3-6%) and male (23% vs. 2-7%) senior officers were in the percentages
who said the offenders were solely civilians.

Table 2
Percentage of Offenders' Organizational Affiliation, by Paygrade

                             Junior Enlisted   Senior Enlisted   Junior Officer   Senior Officer
                             (E1-E4)           (E5-E9)           (O1-O3)          (O4-O6)
                             F     M           F     M           F     M          F     M
Military only                88    87          82    80          83    82         68    57
Both military and civilians  10    11          13    14          11    12         17    20
Civilians only               3     2           5     7           6     7          14    23
Margin of Error              ±2    ±4          ±2    ±4          ±4    ±6         ±5    ±8

Place and Time One Situation Occurred


Members were asked questions to describe the characteristics of the one situation with
the greatest effect. To understand this section, it is necessary to remember that these behaviors
can happen in various locations at multiple times in a single day and can also span a long period
of time. By examining these characteristics, it is possible to identify commonalities among
incidents of unprofessional, gender-related behavior.

The majority of women and men reported that some or all of the behaviors occurred at an
installation (Females 86%; Males 75%), at work (Females 81%; Males 74%), and during duty
hours (Females 84%; Males 76%) (see Tables 59a.1-59d.4 in Greenlees et al. (2003)).
Approximately twice as many men as women (24% vs. 13%) reported that none of the behaviors
occurred on a military installation. However, women and men were less likely to report in 2002
than in 1995 that all of the behaviors in the situation occurred during duty hours (Females 46%
vs. 54%; Males 40% vs. 48%) and either on a military installation (Females 51% vs. 73%; Males
42% vs. 62%) or at work (Females 44% vs. 51%; Males 39% vs. 51%) (see Table 3).

Among women, junior enlisted members (37% vs. 49-61%) were the least likely to
report that all of the behaviors occurred at their work (see Table 3). In contrast, female senior
officers were the most likely to report that all of the behaviors occurred at work (61% vs. 37-
50%). Similarly, among women, junior enlisted members (39%) were the least likely, and senior
officers (63%) were the most likely, to report that all of the behaviors occurred during duty
hours. This may be partially explained by the finding that, among women, junior enlisted
members (62%) were the least likely, and senior officers (83%) were the most likely, to report
that none of the behaviors occurred in the local community near an installation (see Tables
59a.4-59d.4 in Greenlees et al. (2003)). For men, there were no significant paygrade differences.

Regardless of paygrade, women were at least 15 percentage points less likely to report in
2002 than in 1995 that all of the behaviors occurred on a military installation (see Table 3).
Regardless of gender, senior enlisted members showed larger declines from 1995 to 2002 than
members in other paygrades in reporting that all of the behaviors occurred at work (Females
50% vs. 57%; Males 39% vs. 56%) or during duty hours (Females 53% vs. 62%; Males 40% vs.
52%). Moreover, junior (43% vs. 57%) and senior (40% vs. 66%) enlisted men were less likely
to report in 2002 than in 1995 that all of the behaviors occurred on a military installation (see
Table 3).

Table 3
Percentage of Members Reporting All of the Behaviors Occurred at a Particular Time or
Location, by Gender

                              Total DoD    Junior Enlisted  Senior Enlisted  Junior Officer  Senior Officer
                                           (E1-E4)          (E5-E9)          (O1-O3)         (O4-O6)
                              1995  2002   1995  2002       1995  2002       1995  2002      1995  2002
Females
  In the local community      ---   5      ---   6          ---   5          ---   5         ---   4
  At a military installation  73    51     70    47         76    45         71    53        76    61
  At work                     51    44     45    37         57    50         57    49        69    61
  During duty hours           54    46     45    39         62    53         59    51        73    63
  Margin of Error             ±2    ±2     ±3    ±3         ±3    ±3         ±4    ±5        ±6    ±5
Males
  In the local community      ---   5      ---   4          ---   5          ---   7         ---   8
  At a military installation  62    42     57    43         66    40         62    47        61    50
  At work                     51    39     44    38         56    39         55    44        58    47
  During duty hours           48    40     40    38         52    40         56    46        58    50
  Margin of Error             ±4    ±3     ±5    ±5         ±6    ±4         ±9    ±8        ±11   ±8


Frequency and Duration of Incidents Concerning Unprofessional, Gender-related Behavior


Regarding the frequency and duration of incidents of unprofessional, gender-related
behavior, women were less likely than men to report that the incident happened only once (22%
vs. 32%) and that the situation lasted less than a month (45% vs. 60%) (see Tables 4 and 5).

Among women, junior enlisted members were the most likely to report that the incidents
of unprofessional, gender-related behavior occurred almost every day or more than once a day
(9% vs. 1-5%) (see Table 4). Among men, there were no paygrade differences in the frequency
of behaviors. Regardless of gender, there were no paygrade differences in the duration of the
situation (see Table 5).

Table 4
Percentage of Members Reporting Frequency of Behaviors, by Paygrade

                          Total DoD   Junior Enlisted  Senior Enlisted  Junior Officer  Senior Officer
                                      (E1-E4)          (E5-E9)          (O1-O3)         (O4-O6)
                          F     M     F     M          F     M          F     M         F     M
Once                      22    32    21    29         23    35         25    33        27    38
Occasionally              52    50    50    46         53    53         56    57        55    54
Frequently                17    11    19    16         17    8          15    9         14    3
Almost every day/more
  than once a day         9     6     11    9          8     5          4     1         4     5
Margin of Error           ±2    ±3    ±3    ±5         ±3    ±5         ±5    ±8        ±5    ±8

Table 5
Percentage of Members Reporting Duration of the Situation, by Paygrade

                              Total DoD   Junior Enlisted  Senior Enlisted  Junior Officer  Senior Officer
                                          (E1-E4)          (E5-E9)          (O1-O3)         (O4-O6)
                              F     M     F     M          F     M          F     M         F     M
Less than 1 month             45    60    43    55         46    62         52    64        45    65
1 month to less than 6 months 27    17    30    19         24    16         25    15        20    15
More than 6 months            28    23    27    25         30    22         23    21        35    21
Margin of Error               ±2    ±3    ±3    ±5         ±3    ±4         ±5    ±8        ±8    ±5

Reporting
A series of survey questions asked Service members whether they reported the situation
and to provide details about various aspects of the reporting process. Overall, 30% of
women and 17% of men reported the situation to a supervisor or person responsible for follow-
up (see Table 6). However, fewer women reported behaviors in 2002 than in 1995 (30% vs. 38%).
For more details, see Tables 66a.3-66e.3 in Greenlees et al. (2003).


Table 6
Frequency of Reporting Behavior in One Situation to Any Supervisor or Person Responsible
for Follow-up
Females Males
1995 38 15
2002 30 17
Margin of Error ±3 ±3

To Whom Behaviors Are Reported


Less than 10% of women and men chose to report unprofessional, gender-related
behavior either to a special military office responsible for these types of behaviors or to another
installation/Service/DoD official. Rather, female and male Service members tended to report to
members of their chain-of-command, such as their immediate supervisor (Females 21%; Males
12%) or the supervisor of the offender (Females 16%; Males 10%) (see Tables 66a.1-66e.4
in Greenlees et al. (2003)). Among women, enlisted members were more likely than officers to
report unprofessional, gender-related behavior to someone in their chain-of-command (15-17%
vs. both 10%) or to a special military office responsible for these types of behaviors (7-8% vs.
both 3%) (see Tables 66a.4-66e.4 in Greenlees et al. (2003)).

Reasons for Not Reporting Behaviors


Service members were asked to indicate which of the 19 items explained why they chose
not to report any or all of the behaviors they experienced. The five reasons Service members
most frequently indicated for not reporting behaviors are shown in Table 7. Women (67%) and
men (78%) most often indicated that they did not report behaviors because they felt the situation
was not important enough to report. For detailed information on all 19 items, see Tables 74a.1-
74s.4 in Greenlees et al. (2003).

Table 7
Top Five Reasons for Not Reporting Any or All Behaviors in One Situation
Females Males
Was not important enough to report 67 78
You took care of the problem yourself 65 63
You felt uncomfortable making a report 40 26
You did not think anything would be done if you reported 33 28
You thought you would be labeled a troublemaker if you reported 32 22
Margin of Error ±2 ±3

Junior enlisted women were more likely than other women to indicate they did not report
behaviors because they felt uncomfortable (48% vs. 30-36%), thought they would not be
believed (22% vs. 11-16%), thought coworkers would be angry (31% vs. 16-20%), did not want
to hurt the person (34% vs. 16-26%), or were afraid of retaliation from the offender (28% vs. 18-
19%) (see Tables 74a.1-74s.4 in Greenlees et al. (2003)). In contrast, more junior enlisted men
indicated they did not report because it would take too much time (29% vs. 11-17%).

Reasons for Not Reporting Behaviors by Reporting Category


For those Service members who reported either none of the behaviors or only some of the
behaviors, this section includes an analysis of Service members’ reasons for not reporting


behaviors. Women were more likely than men to identify retaliatory behaviors as reasons not to
report any of the behaviors (see Table 8). These reasons included being labeled a troublemaker
(29% vs. 19%), fear of retaliation from the offender (18% vs. 10%), fear of retaliation from
friends of the offender (13% vs. 8%), and fear of retaliation from their supervisor (12% vs. 8%).
Men were more likely than women to cite the behaviors not being important enough to report as
a reason for reporting either none (81% vs. 71%) or only some (59% vs. 50%) of the behaviors.

Table 8
Percentage of Reasons for Not Reporting the Behaviors, by Gender and Reporting Category
                                             Reported No Behaviors    Reported Some Behaviors
Reasons for Not Reporting                    F         M              F         M
Was not important enough to report 71 81 50 59
You did not know how to report 13 9 26 21
You felt uncomfortable making a report 37 24 53 48
You took care of the problem yourself 67 63 57 58
You talked to someone informally in your chain-of-command 10 8 70 62
You did not think anything would be done if you reported 30 24 46 47
You thought you would not be believed if you reported 15 10 28 25
You thought your coworkers would be angry if you reported 23 17 29 33
You wanted to fit in 15 14 19 21
You thought reporting would take too much time and effort 23 21 28 29
You thought you would be labeled a troublemaker if you reported 29 19 45 48
A peer talked you out of making a formal complaint 2 1 10 10
A supervisor talked you out of making a formal complaint 1 1 16 14
You did not want to hurt the person’s feelings, family, or career 28 20 32 34
You thought your performance evaluation or chance of promotion would suffer 14 10 28 31
You were afraid of retaliation from the person(s) who did it 18 10 39 30
You were afraid of retaliation/reprisals from friends of the person(s) who did it 13 8 26 29
You were afraid of retaliation/reprisals from your supervisors 12 8 26 26
Some other reason 22 18 25 27
Margin of Error ±3 ±4 ±5 ±11

Satisfaction With Complaint Outcome


Satisfaction with the outcome of the complaint can be indicative of a Service member’s
perception of the reporting and complaint process. Approximately a third of women and men
were satisfied with the outcome of their complaint. This trend remained consistent across years:
women (34% vs. 36%) and men (37% vs. 36%) were about equally satisfied with the outcome of
the complaint process in 2002 and in 1995 (see Tables 72.1-72.3 in Greenlees et al. (2003)).

Complaint Outcome
This section includes an analysis of the outcome of the complaint by Service members’
satisfaction with the outcome. As expected, Service members were most likely to be satisfied
with the outcome of their complaint when the situation was corrected (Females 92%; Males
91%), the outcome of the complaint was explained to them (Females 69%; Males 70%), and some


action was taken against the offender (Females 55%; Males 66%). Women and men (both 48%)
were most likely to be dissatisfied with the outcome of their complaint when nothing was done
about it. For both women and men, satisfaction with the complaint outcomes was not predicated
totally on whether the complaint was found to be true (see Table 9). For more detailed paygrade
findings regarding complaint outcomes, see Tables 71a.1-71h.4 in Greenlees et al. (2003).

Table 9
Percentage of Complaint Outcome, by Satisfaction with Outcome and Gender
Outcome of Complaint Satisfied with Dissatisfied
Outcome with Outcome
F M F M
They found your complaint to be true 78 85 33 48
They found your complaint to be untrue 0* 0* 14 5
They were unable to determine whether your complaint was true or not 8 6* 12 14
The outcome of your complaint was explained to you 69 70 20 22
The situation was corrected 92 91 12 12
Some action was taken against the person(s) who bothered you 55 66 14 4
Nothing was done about the complaint 9 10* 48 48
Action was taken against you 0* 6* 19 17
Margin of Error ±6 ±11 ±6 ±16
* Low precision and/or unweighted denominator size between 30 and 59.

Problems at Work
Service members were asked to describe problems they have had at work as a result of
their experience or of how they responded to it. These problems can include both social (e.g.,
hostile interpersonal behaviors) and professional (e.g., behaviors that interfere with their career)
reprisals. Overall, 29% of women and 23% of men reported experiencing some type of problem
at work as a result of unprofessional, gender-related behavior (see Figure 4). Women and men
most often reported being gossiped about by people in an unkind way (15% and 20%,
respectively). Women were more likely than men to report being ignored or shunned by others
at work (10% vs. 6%), blamed for the situation (9% vs. 6%), or mistreated in some other way
(10% vs. 6%) (see Tables 75a.4-75l.4 in Greenlees et al. (2003)).

Both junior enlisted women (33%) and men (31%) were more likely to report
experiencing at least some kind of problem at work than members in other paygrades (see Figure
4). Junior enlisted women (15% vs. 9-18%) and men (21% vs. 5-11%) were the most likely to
report experiencing unkind gossip (see Tables 75a.4-75l.4 Greenlees et al. (2003)).


Figure 4
Percentage of Members Who Experienced Any Problems at Work, by Gender and Paygrade
[Bar chart of percentages for females and males across DoD Total, E1-E4, E5-E9, O1-O3,
and O4-O6; values confirmed in the text include DoD Total (Females 29, Males 23) and
E1-E4 (Females 33, Males 31).]

Margin of error ±5

Conclusions
This paper provides an overview of the characteristics of the situations of unprofessional,
gender-related behavior. By analyzing the characteristics, the DoD can better target areas and
individuals who are affected by these behaviors and implement strategies to reduce occurrences.

The experiences of those who encounter unprofessional, gender-related behavior vary
and often include multiple types of behaviors. However, some characteristics are common
across most experiences. The majority of the offenders are male, although the perpetrators
increasingly include both genders. For example, in 2002 more women and men reported that the
offenders included both genders than in 1995. Another common characteristic is that the
majority of women (84%) and men (82%) reported the offenders were military personnel.

When Service members report their experiences, there are more opportunities to address
these problems. Therefore, an analysis of various reporting factors is important to the process of
attending to these issues and more effectively resolving their troublesome effects. Overall, 30%
of women and 17% of men reported the situation. These rates are a concern because in this
analysis, reporting was not limited to formal reports; rather, it included informal discussion with
any installation, Service, or DoD individual or organization. Hence, for formal reporting, it
would be expected that the numbers would be even lower. It is important to note that whether or
not Service members report may also be a function of the types of behaviors they experience.
The unprofessional, gender-related behaviors measured in the 2002 WGR represent a continuum
of behaviors ranging from Sexist Behavior, which, although considered a precursor to sexual
harassment, is not illegal, to Sexual Assault, a criminal offense. One explanation for why
reporting rates may seem low is that the most commonly noted reason for not reporting, for
women (67%) and men (78%) alike, was that the situation was not important enough to report.

References
Greenlees, J.B., Deak, M.A., Rockwell, D., Lee, K.S., Perry, S., Willis, E.J., & Mohomed, S.G.
(2003). Tabulations of Responses from the 2002 Status of the Armed Forces Survey—
Workplace and Gender Relations: Volume 2 Gender Related Experiences in the Military and
Gender Relations. DMDC Report No. 2003-013. Arlington, VA: DMDC.


Workplace Reprisals: A Model of Retaliation Following Unprofessional Gender-Related Behavior1

Alayne J. Ormerod, Ph.D., and Caroline Vaile Wright
University of Illinois
603 East Daniel Street
Champaign, IL 61820
aormerod@s.psych.uiuc.edu
Retaliation is considered to be both a deterrent to and a consequence of reporting sexual
harassment. Existing research suggests that reporters of harassment routinely experience retaliation
and that reporting worsens outcomes beyond those of harassment alone (Coles, 1986; Hesson-
McInnis & Fitzgerald, 1997; Loy & Stewart, 1984; Stockdale, 1998). Interestingly, the relationship
between reporting and outcomes may be indirect, that is, reporting appears to trigger "post-
reporting" variables that exert a negative influence on outcomes (Bergman, Langhout, Cortina,
Palmieri, & Fitzgerald, 2002). The goal of this paper is to examine one such post-reporting
variable, retaliation that follows from experiences of unprofessional, gender-related behavior, in a
sample of military personnel.
UNPROFESSIONAL, GENDER-RELATED BEHAVIOR IN THE MILITARY
Retaliation occurs in the context of sexual harassment and other unprofessional, gender-related
behavior (UGRB),2 and thus it is important to consider that context. In an issue of Military Psychology
entirely devoted to sexual harassment in the Armed Forces (Drasgow, 1999), Fitzgerald, Drasgow,
& Magley (1999) tested a model that provided an integrated framework for understanding the
predictors and outcomes of sexual harassment. Their findings, applicable to both male and female
personnel, suggest that harassment occurs more often when personnel perceive that leadership
efforts, practices, and training do not address issues of harassment and when work groups are not
gender-integrated. Experiences of harassment were associated with decrements in job satisfaction
and psychological and physical well being. In turn, lowered job satisfaction was associated with
lowered organizational commitment and work productivity. These relationships also apply to the
civilian workforce (Fitzgerald, Drasgow, Hulin, Gelfand, & Magley, 1997). Given that UGRB is
the major stimulus for retaliation, the general model is a starting point for examining those
variables considered antecedents and outcomes of retaliation.
1 This paper is part of a symposium, entitled Sexual Harassment in the Military: Recent Research Findings,
presented at the 2003 International Military Testing Association Conference in Pensacola, Florida (T. W.
Elig, Chair). This research was supported in part by the Defense Manpower Data Center (DMDC) through
the Consortium of Universities of the Washington Metropolitan Area, Contract M67004-03-C-0006, and also
by NIMH grant # MH50791-08. The opinions in this paper are those of the authors and are not to be
construed as an official DMDC or Department of Defense position unless so designated by other authorized
documents. The authors wish to thank Louise F. Fitzgerald for her comments.
2 Survey measurement of sexual harassment is defined by the U.S. Department of Defense as the presence of
behaviors indicative of sexual harassment (Crude/Offensive Behavior, Sexual Coercion, and Unwanted
Sexual Attention; Sexist Behavior and Sexual Assault are not counted in the DoD survey measure of sexual
harassment) and the labeling of those behaviors as sexual harassment (Survey Method for Counting Incidents
of Sexual Harassment, 2002). In this paper we examine behaviors indicative of sexual harassment and sexist
behavior and refer to them together as unprofessional, gender-related behavior (UGRB). We use the term
sexual harassment to refer to the existing literature.


PROCESS OF RETALIATION
There are currently only a handful of empirical studies that examine the correlates of
retaliation following sexual harassment; however, the literature on whistle-blowing offers insight
into the retaliatory process. Miceli and Near (1992) conceptualize retaliation as a collection of
punitive work-related behaviors (e.g., poor performance appraisal, denial of promotion or other
advancement opportunities, transfer to a different geographic location, suspension, demotion),
both experienced and threatened. They consider the number and status (e.g., coworkers,
supervisors) of those who are engaged in the retaliation (Near & Miceli, 1986) and suggest that
retaliation from coworkers and management may differ, as each may be a function of dissimilar
variables (Miceli & Near, 1988; 1992).
Near, Miceli, and colleagues (Miceli & Near, 1992; Miceli, Rehg, Near, & Ryan, 1999;
Near & Miceli, 1986) describe retaliation as a phenomenon arising from an employee's
disclosure of some form of organizational wrong-doing to someone who can take action. They
propose that retaliation can be best understood from the perspective of resource dependency
theory, that is, when an organization is dependent on the wrong-doing, it is more likely to resist
change and engage in retaliation toward the (relatively less powerful) whistle-blower. Further,
they suggest that the determinants of retaliation may be sensitive to context, such as the type of
wrong-doing, the organizational setting, and victim variables such as whether the victim of the
retaliation was also the victim of the wrong-doing (Miceli & Near, 1992).
Sexual harassment is a type of wrong-doing in which the "whistle-blower," or reporter, is
almost always the target of the wrong-doing. Recent research about the nature of retaliation
following sexual harassment suggests that retaliation can include both personal (e.g., isolating
and targeting victims of harassment with hostile interpersonal behaviors) and professional (e.g.,
behaviors that interfere with career advancement and retention) reprisals that may contribute
differentially to outcomes (Cortina & Magley, in press; Fitzgerald, Smolen, Harned,
Collinsworth, & Colbert, in preparation). Although these studies do not directly assess the source
of the retaliation, it is likely that professional retaliation results largely from actions by a
supervisor or other person in a more powerful position than the target. Professional retaliation is
prohibited by Title VII of the Civil Rights Act of 1964 (Crockett & Gilmere, 1999) and is
thought to be less common than social retaliation. Social forms of retaliation, on the other hand,
can arise from any individual with whom the target interacts, including coworkers. Social
retaliation has been linked to negative outcomes (Cortina & Magley, in press) but is not
explicitly prohibited by law.
Retaliation following sexual harassment is thought to arise from a general process
through which an employee is (a) victimized by another organizational member, (b) makes an
external response (e.g., support seeking, confrontation, or reporting the mistreatment), (c) is
retaliated against by an organizational member, either the original perpetrator or others, and (d)
subsequently suffers negative consequences (Cortina & Magley, in press; Fitzgerald et al., in
preparation; Magley & Cortina, 2002).


INCIDENCE OF RETALIATION
Studies define whistle-blowing (that is, response to wrong-doing) in slightly different
ways: some include all forms of active response, and others limit the notion to official reporting.
The whistle-blowing literature suggests that less than one quarter of those who report wrong-
doing experience retaliation (Miceli & Near, 1988; Near & Miceli, 1986). Other research
suggests that formally filing charges of sexual harassment or discrimination is associated with a
retaliation rate of between 40% and 60% (Coles, 1986; Loy & Stewart, 1984; Parmerlee, Near, &
Jensen, 1982). In a study of military personnel who experienced sexual harassment and reported
it to someone in an official position, 15.7% of the men and 19.4% of the women reported some
form of retaliation (Magley & Cortina, 2002). These disparate findings suggest that incidence
rates may vary depending on the type of organization, the nature of the wrong-doing, the target's
response, and the gender of the target. They also lead us to conclude that there is, at present, no
reliable general estimate of the extent of retaliation.
ANTECEDENTS OF RETALIATION
Organizational Climate
In the sexual harassment research literature, organizational climate regarding sexual
harassment has been linked to both harassment and negative outcomes (Fitzgerald, Hulin, &
Drasgow, 1995; Hunter Williams, Fitzgerald, & Drasgow, 1999; Pryor, Giedd, & Williams,
1995; Pryor & Whalen, 1997). Miceli and Near (1992) argue that organizations that engage in
one type of wrong-doing will engage in others and that retaliation against whistle-blowers is just
one type of unfair practice found within an organization. Supporting this line of reasoning, they
reported that employee perceptions that an organization's reward distribution system is fair were
negatively associated with retaliation (Miceli & Near, 1989). Thus it is reasonable to suppose
that a climate that tolerates sexual harassment will also be tolerant of retaliation. Two recent
studies support this contention and reported an association between climate and experiences of
retaliation for male and female military personnel. In one study, implementation of policies
prohibiting harassment (an indicator that the organization does not tolerate harassment) was
associated with less frequent retaliation (Magley & Cortina, 2002). In the second study, an
organizational climate that tolerates harassment was one of several predictors of retaliation for
female personnel and the sole predictor for male personnel (Bergman et al., 2002). In addition,
this same study found that retaliation of female personnel was associated with working in a
highly masculinized job context and having perpetrators of a higher status.
Unprofessional, Gender-Related Behavior
In the whistle-blowing literature, the relationship between the frequency of wrong-doing
and retaliation is not entirely clear-cut; it is associated in some research (Miceli, Rehg, Near, &
Ryan, 1999) but not in others (Near & Miceli, 1986). In the sexual harassment literature, on the
other hand, the frequency of experiencing harassment is directly related to increased experiences
of retaliation. In a study of sexually harassed male and female federal court employees who had
confronted the harasser, reported, or sought social support, more frequent harassment and
interpersonal mistreatment predicted both personal and professional retaliation (Cortina &
Magley, in press). Male and female military personnel who endorsed more frequent experiences
of harassment also endorsed more experiences of retaliation (Magley & Cortina, 2002). In
another study, female military reporters who experienced more frequent sexual harassment
experienced more retaliation (Bergman et al., 2002).


Findings are mixed concerning whether the severity of the wrong-doing predicts increases
in retaliation. Near and Miceli (1986) reported that retaliation was related to the seriousness of the
wrong-doing but later argued that severity cannot be reliably measured across contexts because
contextual and individual variables make a standard hierarchy of severity difficult to determine
(Miceli & Near, 1992). Magley and Cortina (2002) attempted to model severity by
assigning harassing behaviors to hostile environment, or hostile environment plus quid pro quo
categories. They found a small but significant relationship between harassment that included quid
pro quo behaviors and increased experiences of retaliation; however, they noted that differences
between retaliation associated with hostile environment experiences and those associated also with
quid pro quo were minimal, suggesting that retaliation is associated with both types of
harassment. In a second study, Cortina and Magley (in press) dichotomized work mistreatment into
either incivility alone or incivility with sexual harassment (arguably a more severe type of wrong-
doing) but found no relationship between type and retaliation. Thus, research attempting to link
severity of wrong-doing to retaliation has, to date, been largely unsuccessful.
Primary and Secondary Appraisal
The target's subjective assessment of whether the harassing event was stressful or
threatening and their subsequent responses may prove useful when attempting to better
understand the relationship of wrong-doing to retaliation. We draw on Lazarus and Folkman's
(1984) cognitive stress framework and its application to sexually harassing events (Fitzgerald,
Swan, & Fischer, 1995; Fitzgerald, Swan, & Magley, 1997) to explore the role that appraisal
plays in the process of retaliation following from an unprofessional, gender-related event.
Primary appraisal is the cognitive evaluation of an event to determine whether it is "stressful,"
whereas secondary appraisal is the process of determining a response to the event or stressor
(Lazarus & Folkman, 1984). Appraisal is thought of as a complex process, influenced by
multiple determinants that can change as the stressful event changes. In the case of sexual
harassment and other UGRB, influencers are thought to include individual, contextual, and
objective factors. Fitzgerald, Swan et al. (1997) consider that the stressfulness of a harassing
event inheres in the appraisal of the event (as stressful) rather than in the event itself and suggest
that the frequency, intensity, and duration of the harassment, the victim's resources and personal
attributes, and the context (e.g., organizational climate, gender context) all make up evaluations
of whether the event is stressful or threatening. Appraisal, in turn, is linked to decisions about
response and to outcomes.
Secondary appraisal, or coping, is thought of as attempts to manage both the stressful
event and one's cognitive and emotional reactions to the event (Fitzgerald, Swan et al., 1995).
Responses to UGRB can include reporting the behavior, confronting the person, seeking social
support, behavioral avoidance (e.g., avoid the person), and cognitive avoidance (e.g., pretend not
to notice, try to forget).3 Reporting, the most studied form of coping, has been linked to
retaliation against the reporter (Bergman et al., 2002; Hesson-McInnis, 1997), particularly when
reporting wrong-doing outside of one's organization (Near & Miceli, 1986) or when reporting
harassment to multiple people in official positions (Magley & Cortina, 2002). Confronting a
harasser is associated with increased retaliation (Cortina & Magley, in press; Stockdale, 1998).
Indeed, complaining about the retaliation itself can result in further retaliation (Near & Miceli,
1986). The type of response can also interact with the status of the target and the harasser. In a
sample of federal court employees, work-related retaliation increased when the target confronted
offenders who held more organizational power, and personal retaliation increased when the
harasser was more powerful and the target sought social support (Cortina & Magley, in press).
3 See Fitzgerald, Swan et al. (1995) for a detailed description of coping responses.


Far less is known about the consequences of cognitive or behavioral avoidance. These
less direct behaviors are far more common than confrontation or reporting, the two most often
recommended behaviors in response to harassment. It is not unreasonable to consider whether
common behaviors, such as ignoring the UGRB, can also trigger a retaliatory response.
Organizational Power
Findings concerning the status of the reporter or target and retaliation are mixed. Two
studies, one with military personnel, the other with federal employees, showed no direct
relationship between the reporter's status within the organization and retaliation (Bergman et
al., 2002; Near & Miceli, 1986). However, a study of federal court employees demonstrated
that victims with lower occupational status were more likely to receive both personal and work-
related retaliation than higher status victims; when a lower status victim reported, confronted,
or sought social support regarding a higher status harasser, the victim experienced more
retaliation of both types (Cortina & Magley, in press). Holding power within the organization,
as measured by support by management and immediate supervisor, has been consistently
associated with receiving less retaliation in the whistle-blowing literature (Miceli, Rehg, Near,
& Ryan, 1999; Near & Miceli, 1986). Thus it seems important to consider the relationship of
organizational power or status to retaliation.
OUTCOMES OF RETALIATION
Work-Related Outcomes
Given that harassment is associated with negative outcomes to the individual, it is likely
that retaliation will also be negatively related to an individual's work attitudes, performance,
and commitment to the organization. In a study of civilian women involved in a class action
lawsuit against a private sector firm, outcomes associated with retaliation (after controlling for
the harassment and effects due to reporting) included decreased satisfaction with coworkers and
supervisors and increased work withdrawal (Fitzgerald et al., in preparation). For public sector
employees, as retaliation increased so did the targets' job dissatisfaction, job stress, and
organizational withdrawal (Cortina & Magley, in press). In a study of federal employees,
retaliation resulted in increased involuntary exit from the organization, including forced transfer
or leaving (Near & Miceli, 1986). For male and female military personnel, those who
experienced more retaliation generally had poorer work outcomes (Magley & Cortina, 2002),
and for those who reported harassment, retaliation was associated with lower procedural
satisfaction with reporting (Bergman et al., 2002).
Well-Being
In a study of civilian women involved in a class action lawsuit against a large private
firm, outcomes associated with retaliation included decreased health satisfaction and increased
psychological distress (Fitzgerald et al., in preparation). For public sector employees, as
retaliation increased the targets' psychological and physical health decreased (Cortina &
Magley, in press) and military personnel who experienced more retaliation demonstrated
generally poorer psychological and health-related outcomes (Magley & Cortina, 2002).


A MODEL OF RETALIATION

Figure 1. Conceptual model of retaliation. Dashed lines represent paths from climate, leadership,
power, unprofessional, gender-related behavior, and retaliation to supervisor, coworker, work,
and military satisfaction. RET = retaliation. UGRB = unprofessional, gender-related behavior.
CLIM = climate. LEAD = leadership. POW = organizational power. CON = masculinized work
context. APP = appraisal. REP = reporting. COPE = coping. PSY = psychological well-being.
WB = physical well-being. JOB SAT = coworker, supervisor, and work satisfaction. MIL SAT =
military satisfaction. OC = organizational commitment.

We place retaliation within the context of UGRB and utilize the general model for sexual
harassment as a framework within which to consider the antecedents and consequences of
retaliation. Retaliation is defined as the frequency with which a target perceives either personal
or professional reprisals following an active or indirect response to an incident of unprofessional,
gender-related behavior. Consistent with the general model, the whistle-blowing literature, and
research on retaliation, we conceptualize retaliation (and the original UGRB) as arising from an
organizational climate that tolerates wrong-doing and negative interpersonal behaviors and from
leadership that does not make reasonable efforts to stop harassment. A masculinized work context
is thought to lead to more frequent UGRB (Fitzgerald, Drasgow et al., 1997) and more frequent
UGRB to more frequent retaliation. Targets with low organizational status are expected to
receive more UGRB and retaliation. We draw on Lazarus and Folkman's (1984) cognitive stress
framework as applied to sexual harassment (Fitzgerald, Swan et al., 1995; Fitzgerald, Swan et al.,
1997) and suggest that the individual will appraise the "stressfulness" of the UGRB incident prior
to responding and that this personal appraisal will influence the nature of the subsequent responses
which will in turn be associated with retaliation. In theory, primary and secondary appraisal
(coping response and reporting) are thought to have additional antecedents. In our model we limited
the antecedents of the appraisal process so that we could test the major propositions of retaliation
and keep the path model from becoming overly complex. Retaliation is thought to lead to negative
outcomes beyond those associated with UGRB, climate indicators, and organizational power.
Drawing from the organizational literature and the general model, the multiple aspects of job
satisfaction and satisfaction with the military will positively influence organizational commitment
and well-being. Figure 1 expresses these conceptual relationships.
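In equation form, the core paths of Figure 1 can be summarized roughly as follows, using the
figure's abbreviations with generic path coefficients and disturbance terms; this is a sketch of the
hypothesized relations, not the fitted model:

\begin{align*}
\mathrm{UGRB} &= \gamma_1\,\mathrm{CLIM} + \gamma_2\,\mathrm{LEAD} + \gamma_3\,\mathrm{CON} + \gamma_4\,\mathrm{POW} + \zeta_1\\
\mathrm{APP} &= \beta_1\,\mathrm{UGRB} + \zeta_2, \quad \mathrm{REP} = \beta_2\,\mathrm{APP} + \zeta_3, \quad \mathrm{COPE} = \beta_3\,\mathrm{APP} + \zeta_4\\
\mathrm{RET} &= \beta_4\,\mathrm{UGRB} + \beta_5\,\mathrm{REP} + \beta_6\,\mathrm{COPE} + \gamma_5\,\mathrm{CLIM} + \gamma_6\,\mathrm{LEAD} + \gamma_7\,\mathrm{POW} + \zeta_5
\end{align*}

with the satisfaction variables (JOB SAT, MIL SAT) regressed on CLIM, LEAD, POW, UGRB,
and RET (the dashed paths), and OC, PSY, and WB regressed in turn on the satisfaction variables.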


METHOD
Participants and Procedure
The data for this study were taken from The 2002 Status of the Armed Forces
Surveys - Workplace and Gender Relations (Form 2002GB). This survey used a non-
proportional stratified, single-stage random sample of 60,415 active-duty military personnel
who were below the rank of admiral or general and had at least 6 months of active-duty service.
Respondents were given the choice of either returning a paper-and-pencil questionnaire by mail
or completing the same questionnaire on the Web. Both women and ethnic minorities were
oversampled relative to the overall military population. The target population consisted of
56,521 eligible members; of those, 19,960 eligible respondents (10,235 men and 9,725
women) returned usable surveys, for an adjusted weighted response rate of 36%. The current
study utilizes a subsample of 5,795 service members (1,408 men and 4,387 women).
On average, the women were 28.73 years old (range = 17-60, SD = 7.98). Sixty-six
percent of the women self-identified as White, 22% as Black/African American, 11% as
Hispanic/Latino, 5% as Asian, 3% as American Indian/Alaska Native, and less than 1% as
Native Hawaiian/Pacific Islander.4 Nearly half of the female respondents were married (45%),
38% had never married, and 17% were separated, divorced, or widowed. Eighteen percent of
the women held a GED or high school diploma, 43% had attended some college, 6% had a
degree from a 2-year college, 15% had received a degree from a 4-year college, 5% had attended
some graduate school, and 13% had obtained a graduate or other professional degree. Twenty-
nine percent of the female respondents were in the Army, 24% were in the Air Force, 22% in
the Navy, 15% in the Marine Corps, and 10% in the Coast Guard.5
On average, the men were 30.43 years old (range = 18-55, SD = 8.19). Seventy percent
of the men self-identified as White, 15% as Black/African American, 14% as Hispanic/Latino,
5% as Asian, 4% as American Indian/Alaska Native, and 1% as Native Hawaiian/Pacific
Islander. Nearly two-thirds (65%) of the male respondents were married, 29% had never
married, and 6% were separated, divorced, or widowed. Twenty-two percent of the men held a
GED or high school diploma, 40% had attended some college, 5% had a degree from a 2-year
college, 14% had received a degree from a 4-year college, 5% had attended some graduate
school, and 14% had obtained a graduate or other professional degree. Thirty-one percent of the
male respondents were in the Air Force, 23% were in the Navy, 22% in the Army, 14% in the
Marine Corps, and 10% in the Coast Guard.
4 Respondents were able to endorse more than one race/ethnicity category.
5 Sample demographics are not significantly different from the full sample of eligible respondents.


Instrumentation
Unprofessional, Gender-Related Behaviors
The Sexual Experiences Questionnaire - DoD - Shortened Version (SEQ-DoD-S; Stark,
Chernyshenko, Lancaster, Drasgow, & Fitzgerald, 2002) consists of 16 behavioral items that
assess respondents' unwanted sex-related experiences that occurred during the last 12 months
involving military personnel, or civilian employees/contractors in the military workplace. The
SEQ-DoD-S assesses four general categories of unprofessional, gender-related behaviors. Sexist
Behavior (4 items) includes gender-based discriminatory behaviors such as offensive sexist
remarks and differential, negative treatment based on gender. Crude/Offensive Behavior (4 items)
is more explicitly sexual in nature and includes behaviors such as repeatedly telling sexual
stories or jokes and making crude sexual remarks. Unwanted Sexual Attention (4 items) includes
unwanted sexual behaviors such as repeated requests for dates and touching or stroking. Sexual
Coercion (4 items) is defined as implicit or explicit demands for sexual favors through the threat
of negative job-related consequences or the promise of job-related benefits or bribes.6 Responses
were provided on a 5-point Likert-type scale ranging from 0 (never) to 4 (very often). Higher
scores indicated more frequent UGRB or more types of UGRB. The 16-item SEQ-DoD-S is
highly reliable (coefficient alphas for all scales can be seen in Table 1), and considerable
validity information is available.
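Scale scores here and below are item sums with higher values indicating more of the construct,
and the reliability index is coefficient alpha. A minimal sketch of both computations, assuming a
respondents-by-items numpy array (illustrative only, not the authors' analysis code):

    import numpy as np

    def scale_score(items):
        # items: (n_respondents, k_items) array of 0-4 responses; higher = more UGRB.
        return items.sum(axis=1)

    def cronbach_alpha(items):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars / total_var)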
Retaliation
Respondents were asked to indicate whether or not, as a result of any unprofessional,
gender-related behavior, or response to that behavior (e.g., reporting, confronting, avoiding),
they had experienced any of 11 types of retaliatory behaviors. Three of these behaviors are
classified as personal retaliation (e.g., gossiped about you in an unkind or negative way) and
eight as professional (e.g., given an unfair performance evaluation). Responses were arranged
along a 3-point response scale and were recoded such that 1 = "no", 2 = "don't know", and 3 =
"yes," based on research indicating that a "don't know" option tends to act as a midpoint
(Drasgow, Fitzgerald, Magley, Waldo, & Zickar, 1999). Higher scores reflected greater amounts
of retaliation. Although the scale contains two types of retaliation, confirmatory factor analysis
indicated that the personal and professional factors were highly correlated; thus the scale is
considered to be unidimensional (Ormerod et al., in preparation).
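The recoding just described, as a minimal sketch (the raw response labels are illustrative):

    # "Don't know" is treated as the scale midpoint (Drasgow et al., 1999).
    RECODE = {"no": 1, "don't know": 2, "yes": 3}

    def retaliation_score(responses):
        # responses: 11 raw answers, one per retaliatory behavior;
        # higher totals reflect greater amounts of retaliation.
        return sum(RECODE[r] for r in responses)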
Reporting
Five items assessed whether and to whom (e.g., supervisor; office designed to handle
such complaints) the respondent reported the unprofessional, gender-related behavior. The items
were scored dichotomously, with higher scores indicating that the person reported such behavior
through one or more channels.
Coping Responses to Harassment
This scale asks respondents to indicate the extent to which they engaged in specific non-
reporting coping strategies in response to unprofessional, gender-related behavior. The 17 items
comprise four individual scales (cognitive avoidance, confrontation, social support, behavioral
avoidance). However, for current purposes, all items were combined onto one response scale to
represent the frequency of the targets' non-reporting responses to UGRB. Responses were provided
on a 5-point Likert-type scale, ranging from 0 (not at all) to 4 (very large extent). Higher scores
indicated that the respondent engaged in more frequent coping responses to the UGRB.
6 Two additional items asking about sexual assault and an item asking about "other unwanted gender-related behavior" were not utilized in these analyses.


Subjective Appraisal
The subjective appraisal scale contains six items that ask respondents to rate the degree to
which a critical incident involving unprofessional, gender-related behavior was distressing (e.g.,
"offensive," "threatening," or "embarrassing"). Responses were provided on a 5-point Likert-type
scale, ranging from 0 (not at all) to 4 (extremely), with higher scores reflecting personal appraisals
of greater distress.
Organizational Climate
Climate was assessed by adapting the Organizational Tolerance of Sexual Harassment
Scale (OTSH; Hulin, Fitzgerald, & Drasgow, 1996) to a military context. Respondents were
presented with three hypothetical scenarios of different types of UGRB, and asked to indicate the
degree to which they agreed with statements about the climate for UGRB within workgroups or
broader organizational units. The climate scale assesses individual perceptions of organizational
tolerance for UGRB along scenarios about Crude and Offensive Behavior, Unwanted Sexual
Attention, and Sexual Coercion. Response options ask whether, if a complaint were made, the
respondent would incur risk, whether the complaint would be taken seriously, and whether
corrective action would be taken.
Responses to these nine items were provided on a 5-point Likert-type scale, ranging from 1
(strongly disagree) to 5 (strongly agree). Higher scores reflected a work climate that is more
tolerant of UGRB.
Masculinized Work Context
A masculinized work context (i.e., the degree to which the gender of the workgroup and
the respondents' jobs are traditionally masculine) was assessed with four items. Included were the
gender of immediate supervisor (male or female), the gender ratio of coworkers (response scale
was recoded to range from 1 = all women to 7 = all men), and two dichotomously scored questions
asking whether their jobs were typically held by a person of their gender and whether members of
their gender were common in their work environment. These four items were standardized and
summed to create a single variable with high scores representing the degree to which the
respondent's work context was masculine.
Target's Organizational Power
Two items assessed the organizational power of the respondent. Respondents' pay grade
(i.e., military pay classifications recoded to range from 1 to 20) and the number of years of
completed active-duty service were standardized and summed to create a scale where higher scores
reflect holding a greater amount of organizational power.
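
Both this composite and the masculinized work context composite above are built the same way:
standardize each component across respondents, then sum. A minimal sketch under that
description, with hypothetical column names:

    import pandas as pd

    def standardized_sum(df: pd.DataFrame, cols: list[str]) -> pd.Series:
        """Z-score each component column, then sum the z-scores per respondent."""
        z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=1)
        return z.sum(axis=1)

    # e.g. (hypothetical columns):
    # df["org_power"] = standardized_sum(df, ["pay_grade", "years_active_duty"])
    # df["masc_context"] = standardized_sum(
    #     df, ["sup_gender", "coworker_gender_ratio",
    #          "job_gender_typical", "gender_common_at_work"])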
Leadership efforts to stop sexual harassment
This 3-item scale assessed respondents' beliefs regarding whether senior leadership "made
honest and reasonable efforts to stop sexual harassment." Responses were provided on a 3-point
response scale and were recoded such that 1 = "no", 2 = "don't know", and 3 = "yes." A higher
score indicated a stronger perception that senior leadership made "honest and reasonable efforts to
stop sexual harassment."

Job satisfaction
Three indices of job satisfaction were assessed: supervisor satisfaction (9 items; e.g.,
"Leaders ... treat service members with respect"), coworker satisfaction (6 items; e.g., "You like
your coworkers"), and work satisfaction (6 items; e.g., "You like the kind of work you do").
Responses were provided on a 5-point Likert-type scale, ranging from 1 (strongly disagree) to 5
(strongly agree). Higher scores reflected more satisfying experiences with leaders, coworkers,
and work, respectively.
Military life satisfaction
The military life satisfaction scale consisted of seven items asking respondents to rate
their degree of satisfaction with their life and work in the military (e.g., "Quality of your
current residence," "Quality of your work environment," "Opportunities for professional
development"). Responses were provided on a 5-point Likert-type scale, ranging from 1 (very
dissatisfied) to 5 (very satisfied). Items were summed so that higher scores reflected greater
satisfaction with various aspects of military life.
Commitment
The organizational commitment scale consists of three items that assessed the
respondents' commitment to their service. Responses were provided on a 5-point Likert-type
scale, ranging from 1 (strongly disagree) to 5 (strongly agree). Higher scores reflected a higher
degree of commitment.
Psychological outcomes
Two indices of psychological well-being were assessed: emotional effects (3 items; e.g.,
"Didn't do work or other activities as carefully as usual") and psychological distress (5 items;
e.g., "Felt downhearted and blue"). Responses for both scales were provided on a 4-point
Likert-type scale, ranging from 1 (little or none of the time) to 4 (all or most of the time).
Items from both scales were recoded and summed into a composite variable, with higher scores
reflecting greater psychological well-being.
Health outcomes
Two indices of health satisfaction were assessed: general health (4 items; e.g., "My
health is excellent") and health effects (4 items; e.g., "Accomplished less than you would like").
Responses for the two scales were provided on 4-point Likert-type scales ranging, respectively,
from 1 (definitely false; little or none of the time) to 4 (definitely true; all or most of the time).
Items from both scales were recoded and summed into a composite variable, with higher scores
reflecting greater physical well-being.
Analysis Plan
We conducted path analysis separately for men and women to test the proposed model
of retaliation following unprofessional, gender-related behavior (UGRB) shown in Figure 1.
This approach was utilized instead of structural equation modeling because certain constructs
(e.g., organizational power, leadership, masculinized work context, reporting) only had single
indicators available. Each sample (male and female) was randomly split into half so that the
model could be tested, with potential modifications, in the first half-sample and confirmed in
the second half-sample. Analyses were conducted using LISREL 8.30 and PRELIS 2.30
(Jöreskog & Sörbom, 1999) software. The path analysis utilized product moment correlation
matrices and maximum likelihood estimation. The following fit statistics from LISREL were
used to evaluate whether the specified model adequately fit the data: root mean square error of
approximation (RMSEA), non-normed fit index (NNFI), standardized root mean square
residual (SRMR), goodness-of-fit index (GFI), and adjusted goodness-of-fit index (AGFI).
The residual components of the job satisfaction variables were allowed to covary because
previous modeling with military samples suggested that the basic integrated model of
sexual harassment does not include all relevant antecedents of job satisfaction (Fitzgerald,
Drasgow, & Magley, 1999).
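
As a side note, the RMSEA values reported by LISREL can be checked by hand from a model's
χ², its degrees of freedom, and the sample size, using the standard point-estimate formula
RMSEA = sqrt(max(χ² - df, 0) / (df(N - 1))). A minimal sketch (not part of the original
analysis), using figures reported for the women's exploratory model in the Results:

    import math

    def rmsea(chi2: float, df: int, n: int) -> float:
        """RMSEA point estimate from a chi-square model fit statistic."""
        return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

    print(round(rmsea(445.88, 62, 1659), 2))  # 0.06, matching the reported value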

RESULTS

Exploratory Model

The range, mean, standard deviation, and coefficient alpha for variables by gender are
presented in Table 1.7 All descriptive analyses were performed using SPSS 11.5.0.

Table 1. Descriptive and Psychometric Information for Variables Included in the Path Analysis Model

                               Women (n = 4,387)                  Men (n = 1,408)
Scale                          Range        M(a,b)  SD(a,b)  α    Range        M(a,b)  SD(a,b)  α
Exogenous Variables
Organizational Climate         9-45                          .90  9-45                          .90
Masculinized Work Context(c)   -6.76-6.20   -.004   2.78     .64  -15.7-2.15   .005    2.34     .37
Organizational Power(c)        -2.33-9.53   .001    1.67     .56  -2.50-8.75   .001    1.68     .58
Leadership Practices           3-9          7.56    1.73     .77  3-9          7.94    1.60     .81
Endogenous Variables
UGRB                           1-64                          .90  1-64                          .86
Appraise                       0-24                          .86  0-24                          .84
Coping Response                0-68                          .82  0-68                          .82
Reporting(d)                   0-5                           .73  0-5                           .75
Retaliation                    11-33                         .88  11-31                         .89
Supervisor Satisfaction        9-45         28.35   8.01     .89  9-45         28.94   7.60     .88
Coworker Satisfaction          6-30         20.49   5.15     .91  6-30         20.91   4.83     .90
Work Satisfaction              6-30         20.41   6.09     .91  6-30         20.59   5.99     .90
Military Satisfaction          7-35         22.83   5.55     .79  7-35         22.62   5.92     .82
Psychological Well-Being       8-32         26.20   4.78     .89  8-32         26.39   4.71     .88
Physical Well-Being            9-32         27.81   3.99     .84  8-32         28.01   3.83     .83
Organizational Commitment      3-15         11.81   2.45     .84  3-15         11.97   2.43     .82

Notes. (a) The data are not yet released to the public; therefore we were unable to report certain
statistics, such as the mean, standard deviation, and the number of individuals who experienced
unprofessional, gender-related behavior. (b) The means and standard deviations are based on
unweighted data and caution is urged in their interpretation. (c) The low reliability of these scales is
likely due to the small number of items. (d) Respondents may have reported one time only. UGRB =
unprofessional, gender-related behavior.

7 The data are not yet released to the public; therefore, we were unable to report certain statistics, such
as the means and the number of individuals who experienced unprofessional, gender-related behavior
or retaliation. Intercorrelations among the scales are available from the first author.

Investigating the women's data first, the proposed model (see Figure 1) was examined
via path analysis in an exploratory random 50% sample (n = 1659, after listwise deletion). This
model of retaliation following UGRB utilizes the integrated model of sexual harassment with
military personnel (Fitzgerald et al., 1999; Fitzgerald, Hulin et al., 1995) as a theoretical starting
point and incorporates variables consistent with the whistle-blowing literature, research about
harassment and retaliation, and appraisal of stressful events and harassment. The fit statistics for
this initial model were acceptable (χ²/df ratio = 445.88/62 = 7.19, RMSEA = .06, NNFI = .90,
SRMR = .05, GFI = .97, AGFI = .93), suggesting that the model fit well. Inspection of the β and
Γ matrices, standardized residuals, and modification indices suggested three minor and
theoretically justified revisions. The path from organizational power to retaliation was
non-significant and was dropped, and the residual components of coping response and appraisal
were allowed to covary, as were the residual components of coping response and reporting. This is
logical because appraisal is considered to be a process rather than a static event, and appraisal
of the stressfulness of UGRB would likely undergo reevaluation following any response to
the behavior. For example, the target may originally appraise the UGRB as annoying and ask the
perpetrator to stop; if this is unsuccessful and the behavior continues or escalates, the target
may then appraise the event as more stressful or threatening and elect to report it. Coping
response is also considered to be an ongoing process that can involve more than one type of
response over time. Reporting, a type of response, is measured separately from other responses
but is thought to be related. This modification improved fit (χ²/df ratio = 245.70/61 = 4.03,
RMSEA = .04, NNFI = .95, SRMR = .03, GFI = .98, AGFI = .96) and did not affect the
estimated elements of the β and Γ matrices.
Next, the proposed model was examined for the men via path analysis in the men's
exploratory random 50% sample (n = 545, after listwise deletion). The fit statistics for this
initial model were acceptable (χ²/df ratio = 227.53/62 = 3.67, RMSEA = .07, NNFI = .87,
SRMR = .07, GFI = .95, AGFI = .89), suggesting that the model fit reasonably well. We again
allowed the residual components of response and appraisal and the residual components of
response and reporting to covary and dropped the nonsignificant path from organizational
power to retaliation. Following these modifications, the proposed model was examined and fit
statistics were found to be acceptable (χ²/df ratio = 154.22/61 = 2.53, RMSEA = .05, NNFI =
.93, SRMR = .05, GFI = .97, AGFI = .92).
Cross-Validation Model
For the women, the model was cross-validated on the remaining 50% sample (n = 1702,
after listwise deletion), and the fit was excellent (χ²/df ratio = 203.41/61 = 3.33, RMSEA = .04,
NNFI = .96, SRMR = .03, GFI = .99, AGFI = .97). The same process took place for the men,
and the model was cross-validated on the remaining 50% sample (n = 568, after listwise
deletion). The fit was again excellent (χ²/df ratio = 140.73/61 = 2.31, RMSEA = .05, NNFI = .93,
SRMR = .04, GFI = .97, AGFI = .93).
Model Summary
Path coefficients can be seen in Tables 2, 3, 4, and 5 for both women and men. The paths
suggest that for both sexes more frequent retaliation is predicted by an organizational climate that
tolerates sexual harassment and, conversely, when leadership makes reasonable efforts to stop
harassment, retaliation is less frequent (see Table 2).

More frequent occurrences of UGRB and more frequent use of coping responses such as cognitive
and behavioral avoidance, confronting the perpetrator, and seeking social support are associated
with increases in retaliation (see Table 3). For women, reporting is linked to more retaliation (see
Table 3).

Table 2. Paths from Antecedent Variables to Retaliation and UGRB for Women and Men

                       Antecedent
Outcome           Climate   Leader   Power   UGRB   Context
Retaliation   W     .21      -.07     ns      .22      -
              M     .22      -.13     ns      .10      -
UGRB          W     .30      -.21    -.17      -      .15
              M     .20      -.23     ns       -      ns

Note. The first entry in each cell (W) is for the women's cross-validation sample; the second
entry (M) is from the men's sample. UGRB = unprofessional, gender-related behavior. ns = not
significant.

Table 3. Paths from Primary and Secondary Appraisal to Retaliation for Women and Men

                        Antecedent
Outcome          Appraisal   Coping   Reporting   UGRB
Retaliation   W      -         .13       .16       .22
              M      -         .21       ns        .10
Appraisal     W      -         .93       .41       .53
              M      -         .87       .24       .46

Note. The first entry in each cell (W) is for the women's cross-validation sample; the second
entry (M) is from the men's sample. UGRB = unprofessional, gender-related behavior. ns = not
significant.

Retaliation is associated with lowered levels of coworker satisfaction and psychological
and physical well-being for both men and women (see Table 4). For women,
retaliation is related to lower levels of supervisor and work satisfaction and satisfaction with
the military. However, several of these paths are small and should be viewed cautiously (see
Table 4).
As expected for both men and women, more frequent experiences of UGRB were
associated with an organizational climate that is tolerant of behaviors indicative of sexual
harassment (see Table 2). When personnel perceive that leadership makes efforts to stop
harassment, they also report less frequent UGRB. For women, working in a masculinized work
context and holding lower organizational power are related to higher scores on the SEQ-DoD (see
Table 2). More frequent experiences of UGRB were associated with appraising such experiences as
more distressing or threatening for male and female personnel, which, in turn, was related to
reporting and other types of coping response (see Table 3). Perceptions that one's organization is
tolerant of harassment were related to decrements in job satisfaction and satisfaction with the
military for men and women (see Table 4). Conversely, perceiving one's leadership as making
reasonable efforts to stop harassment and holding greater organizational power were related to
increased satisfaction for personnel. More frequent experiences of UGRB were related to
decrements in coworker satisfaction, satisfaction with the military, and psychological well-being
for personnel. Additionally, UGRB was related to decreased supervisor and work satisfaction for
the men, although paths were small and should be interpreted with caution. Unexpectedly, the path
from UGRB to work satisfaction was positive (see Table 4), albeit small (.06; t = 2.13), in the
women's cross-validation sample. No explanation is offered because this path was negative in the
derivation sample. Finally, strong paths were observed between psychological and physical well-
being (women = .34, men = .27), and organizational commitment was associated with satisfaction
with coworkers, work, and the military for both men and women. For women, satisfaction with
supervisor predicted organizational commitment (see Table 5).

Table 4. Paths from Antecedent Variables to Outcomes for Women and Men

                               Antecedent
Outcome                    Climate   Leader   Power   UGRB   Retaliation
Supervisor Satisfaction W   -.34       .25     .12     ns      -.05
                        M   -.31       .19     .20    -.10      ns
Coworker Satisfaction   W   -.22       .14     .15    -.11     -.06
                        M   -.22       .10     .15    -.08     -.14
Work Satisfaction       W   -.20       .16     .15     .06     -.05
                        M   -.19       .17     .11    -.09      ns
Military Satisfaction   W   -.27       .13     .16    -.07     -.10
                        M   -.26       .11     .15    -.14      ns
Psychological           W     -         -       -     -.11     -.09
Well-Being              M     -         -       -     -.10     -.14
Physical Health         W     -         -       -      ns      -.05
                        M     -         -       -      ns      -.18

Note. The first entry in each cell (W) is for the women's cross-validation sample; the second entry
(M) is from the men's sample. UGRB = unprofessional, gender-related behavior. ns = not
significant.

Table 5. Paths from Satisfaction to Organizational Commitment and Psychological Well-Being
for Women and Men

                                           Antecedent
                              Supervisor      Coworker      Work          Military
Outcome                       Satisfaction    Satisfaction  Satisfaction  Satisfaction
Organizational Commitment W      .08             .07           .23           .23
                          M      ns              .13           .33           .22
Psychological Well-Being  W      ns              .09           .15           .19
                          M      ns              .11           .14           .16

Note. The first entry in each cell (W) is for the women's cross-validation sample; the second
entry (M) is from the men's sample. ns = not significant.

In sum, military personnel reported more retaliation when they (1) worked in a climate
where UGRB was believed likely to occur, (2) endorsed more unprofessional, gender-related
behaviors, and (3) experienced these behaviors as more threatening or severe and responded by
seeking social support, confronting or avoiding the perpetrator, or attempting to cope by managing
their cognitive and emotional reactions to the behavior. Female personnel endorsed more
retaliation when they reported UGRB to their supervisors, leadership, or organization.
Conversely, retaliation was less frequent when personnel perceived that leaders made efforts to stop
harassment. Retaliation was directly and inversely related to (1) coworker satisfaction, (2)
psychological well-being, and (3) physical well-being for male and female personnel. Decrements
in elements of job satisfaction and satisfaction with the military were in turn related to lowered
organizational commitment, and psychological well-being impacted physical well-being. For
women, retaliation was associated with lowered satisfaction with supervisors, work, and the
military.
DISCUSSION
Our findings suggest that retaliation is associated with harm to male and female
personnel, including decrements in their psychological and physical well-being and their
satisfaction with coworkers. It is likely to occur when UGRB is severe and when the organization is tolerant of
such behavior. Leadership efforts to stop harassment exerted a negative effect on retaliation,
suggesting that such efforts also contribute to curbing retaliation. The appraisal process appears
to play a critical role in determining retaliation. The relationships among UGRB, appraisal,
response, reporting, and retaliation were as expected with the exception of reporting for men.
When UGRB is appraised as more distressing or threatening, personnel are likely to engage in
increased responding (and for women, reporting), which is related to increased retaliation.
Although the determinants of UGRB (masculinized work context, holding organizational
power) may differ for men and women, those of retaliation do not. It was surprising that
organizational power was not related to retaliation for men or women. However, this is
consistent with Near and Miceli (1986) who found that the power of the whistle-blower was
unrelated to retaliation. It is possible that the status of the target is irrelevant in the face of the
potential threat or damage to the organization (Near & Miceli, 1986) from a charge of
harassment, particularly given the high degree of media attention that follows charges of sexual
harassment in the military. Alternatively, those at the very highest levels of power (e.g.,
Admiral, General) were not surveyed; therefore it is impossible to say whether those who hold
the highest positions experience less retaliation. It is possible that the status of the target may
function significantly only relative to other variables, such as the status of the perpetrator.
It was unexpected that reporting had no relationship to retaliation for men. It is possible
that men report less often, preferring to institute other types of responses. That the composite
measure of coping response had such a robust relationship with appraisal and retaliation bears
further investigation. The direct and mediating effects of the appraisal process on retaliation
have until now been unstudied. In future studies it will be important to understand the
contributions of each type of response to retaliation (e.g., confronting, seeking support,
cognitive and behavioral avoidance).
Retaliation exerted less of an effect on job satisfaction and satisfaction with the military
than was expected. This may have to do with the numerous other predictor variables included
in the model and bears closer investigation. At the same time, it is important to look at the
whole picture and observe that, taken together, job satisfaction (and indirectly organizational
commitment) was strongly influenced by multiple forms of negative workplace behavior
(retaliation, UGRB) and climate related to harassment (organizational tolerance). Although we
did not examine costs to the military, such behaviors likely have significant organizational
costs, given that they affect the commitment and well-being of personnel. Related to costs, it
will be important to investigate turnover intentions and actual turnover in future studies. That
increases in retaliation were associated with decreases in satisfaction with the military for
female personnel is notable. A next step for future research is to examine whether links exist
between retaliation, military satisfaction, and exit strategies.
That the two measures of organizational climate (leadership and climate) were such
important correlates of retaliation (and UGRB and outcomes) supports both theory and
research in the whistle-blowing and sexual harassment literatures. Climate is consistently
associated with negative workplace experiences and negative outcomes directly and indirectly
through the negative experiences (Fitzgerald, Drasgow et al., 1997; Glomb et al., 1997). In our
study, a tolerant climate was strongly related to increased experiences of retaliation, and when
leadership implemented efforts to reduce sexual harassment, retaliation decreased. Of course,
these findings are correlational and should not be interpreted causally. They are important
because active efforts by leadership to implement policy to stop harassment (e.g., punishing
those who harass) have been identified as the single most effective leadership strategy
(among those studied thus far) for reducing rates of harassment in the military
(Hunter Williams, Fitzgerald, & Drasgow, 1999). Our findings support the contention that it is
important to promote an organizational climate that does not tolerate wrong-doing.
Our general framework for understanding retaliation following from UGRB appears to
be supported. An additional variable to consider in future research is the fear of retaliation.
Magley and Cortina (2002) found that fear of retaliation was related to harassment, the power of
the perpetrator, and leadership behaviors, and that even in the absence of reporting or other active
coping responses, fear of retaliation was associated with negative outcomes.
Although this is a strong first attempt to understand the process of retaliation following
UGRB, it raises several issues. One issue is whether personal and professional retaliation have
the same antecedents and outcomes. Given that our measure included a majority of items about
professional retaliation, the model may be more reflective of professional retaliation. There is
far less research about personal retaliation, and in civilian contexts it is not a legally actionable
offense. However, what little research exists suggests that interpersonal retaliation is strongly
associated with damages that can lead to costs to the organization (Cortina & Magley, in press;
Fitzgerald et al., in preparation); thus it seems important to include both types of retaliation in
future research.
This study is not without limitations. Our data are cross-sectional and the analytic
methods correlational; therefore, no assumptions about causality can be made. Common
method variance may also explain some of the significant relationships among variables, given
that data were single-source and self-report. However, questions about unprofessional, gender-related
experiences and retaliation were asked after the outcome variables in an attempt to minimize
method variance due to self-report.
In conclusion, findings from this study suggest that military personnel who experience
retaliation are also likely to experience decrements in job satisfaction and decreased well-being.
Supporting our framework, retaliation was associated with a climate that tolerates harassment,
unprofessional, gender-related experiences, and the appraisal process. This study supports the
contention that organizational climate is of paramount importance for reducing negative
workplace experiences and that unchecked, negative reprisals will be associated with outcomes
that can be costly for individuals and organizations. That leadership efforts to reduce harassment
are associated with reduced retaliation suggests that it would be effective to continue to
implement policy and procedure that inhibits unprofessional, gender-related behavior.
REFERENCES
Bergman, M. E., Langhout, R. D., Palmieri, P. A., Cortina, L. M., & Fitzgerald, L. F. (2002).
The (Un)reasonableness of reporting: Antecedents and consequences of reporting
sexual harassment. Journal of Applied Psychology, 87, 230-242.
Coles, F. S. (1986). Forced to quit: Sexual harassment complaints and agency response. Sex
Roles, 14, 81-95.
Cortina, L. M., & Magley, V. J. (in press). Raising voice, risking retaliation: Events following
interpersonal mistreatment in the workplace. Journal of Occupational Health
Psychology.
Crockett, R. W., & Gilmere, J. A. (1999). Retaliation: Agency theory and gaps in the law. Public
Personnel Management, 28, 39-49.
Drasgow, F. (1999). Preface to the special issue. Military Psychology, 11, 217-218.
Drasgow, F., Fitzgerald, L. F., Magley, V. J., Waldo, C. R., & Zickar, M. J. (1999). The 1995
Armed Forces sexual harassment survey: Report on scales and measures (DMDC
Report No. 98-004). Arlington, VA: Defense Manpower Data Center.
Fitzgerald, L.F., Drasgow, F., Hulin, C.L., Gelfand, M.J. & Magley, V.J. (1997). The
antecedents and consequences of sexual harassment in organizations: A test of an
integrated model. Journal of Applied Psychology, 82, 578-589.
Fitzgerald, L.F., Drasgow, F., & Magley, V.J. (1999). Sexual harassment in the Armed Forces:
A test of an integrated model. Military Psychology, 11, 329-343.
Fitzgerald, L. F., Hulin, C. L., & Drasgow, F. (1995). The antecedents and consequences of
sexual harassment in organizations: An integrated model. In G.P. Keita & J.J. Hurrell,
Jr. (Eds.), Job stress in a changing workforce: Investigating gender, diversity, and
family issues (pp. 55-74). Washington, DC: American Psychological Association.
Fitzgerald, L. F., Smolen, A. C., Harned, M. S., Collinsworth, L. L., & Colbert, C. L. (in
preparation). Sexual harassment: Impact of reporting and retaliation. University of
Illinois at Urbana-Champaign.
Fitzgerald, L. F., Swan, S., & Fischer, K. (1995). Why didn't she just report him? The
psychological and legal implications of women's responses to sexual harassment.
Journal of Social Issues, 51, 117-138.
Fitzgerald, L. F., Swan, S., & Magley, V. J. (1997). But was it really sexual harassment? Legal,
behavioral, and psychological definitions of the workplace victimization of women. In W.
O'Donohue (Ed.), Sexual harassment: Theory, research, and treatment (pp. 5-28). Boston:
Allyn and Bacon.
Glomb, T. M., Richman, W. L., Hulin, C. L., Drasgow, F., Schneider, K. T., & Fitzgerald,
L. F. (1997). Ambient sexual harassment: An integrated model of antecedents and
consequences. Organizational Behavior and Human Decision Processes, 71, 309-
328.
Hesson-McInnis, M.S. & Fitzgerald, L.F. (1997). Sexual harassment: A preliminary test of
an integrative model. Journal of Applied Social Psychology, 27, 877-901.
Hulin, C. L., Fitzgerald, L. F., & Drasgow, F. (1996). Organizational influences on sexual
harassment. In M. Stockdale (Ed.), Sexual harassment in the workplace, Vol. 5, (pp.
127-150). Thousand Oaks, CA: Sage.
Hunter Williams, J., Fitzgerald, L.F., & Drasgow, F. (1999). The effects of organizational
practices on sexual harassment and individual outcomes in the military. Military
Psychology, 11, 303-328.
Jöreskog, K., & Sörbom, D. (1999). LISREL 8.30 and PRELIS 2.30. Scientific Software
International, Inc.
Lazarus, R.S. & Folkman, S. (1984). Stress, appraisal and coping. New York: Springer.
Loy, P. H., & Stewart, L. P. (1984). The extent and effects of sexual harassment of working
women. Sociological Focus, 17, 31-43.
Magley, V. J., & Cortina, L. M. (2002, April). Retaliation against military personnel who
blow the whistle on sexual harassment. In V. J. Magley & L. M. Cortina (Co-
chairs), Intersections of workplace mistreatment, gender, and occupational health.
Symposium presented at the annual meeting of the Society for Industrial and
Organizational Psychology, Toronto, Ontario, Canada.
Miceli, M.P. & Near, J.P. (1988). Individual and situational correlates of whistle-
blowing. Personnel Psychology, 41, 267-281.
Miceli, M. P., & Near, J. P. (1992). Blowing the whistle: The organizational and legal
implications for companies and employees. NY, NY: Lexington Books.
Miceli, M. P., Rehg, M., Near, J. P., & Ryan, K. C. (1999). Can laws protect whistle-
blowers? Results of a naturally occurring field experiment. Work and
Occupations, 26, 129-151.
Near, J. P., & Miceli, M. P. (1986). Retaliation against whistle-blowers: Predictors and
effects. Journal of Applied Psychology, 71, 137-145.
Ormerod, A. J., Lawson, A. K., Sims, C. S., Lytell, M. C., Wadlington, P. L., Yaeger, D. W.,
Wright, C. V., Reed, M. E., Lee, W. C., Drasgow, F., Fitzgerald, L. F., & Cohorn, C. A.
(in preparation). The 2002 Status of the Armed Forces Surveys - Workplace and Gender
Relations: Report on scales and measures (DMDC Report No.). Arlington, VA: Defense
Manpower Data Center.
Parmelee, M. A., Near, J. P., & Jensen, T. C. (1982). Correlates of whistle-blowers'
perceptions of organizational retaliation. Administrative Science Quarterly, 27, 17-34.
Pryor, J. B., Giedd, J. L., & Williams, K. B. (1995). A social psychological model for
predicting sexual harassment. Journal of Social Issues, 51, 69-84.
Pryor, J. B., & Whalen, N. J. (1997). A typology of sexual harassment: Characteristics of
harassers and the social circumstances under which sexual harassment occurs. In W.
O'Donohue (Ed.), Sexual harassment: Theory, research, and treatment (pp. 129-
151). Boston: Allyn & Bacon.
SPSS for Windows 11.5.0 [Computer Software]. (2002). Chicago, IL: SPSS Inc.
Stark, S., Chernyshenko, O. S., Lancaster, A. R., Drasgow, F., & Fitzgerald, L. F. (2002).
Toward standardized measurement of sexual harassment: Shortening the SEQ-DoD
using item response theory. Military Psychology, 14, 49-72.
Stockdale, M.S. (1998). The direct and moderating influences of sexual-harassment
pervasiveness, coping strategies, and gender on work-related outcomes.
Psychology of Women Quarterly, 22, 521-535.
Survey Method for Counting Incidents of Sexual Harassment (April 28, 2002). Washington,
DC: Office of the Under Secretary of Defense for Personnel and Readiness.

UNDERSTANDING RESPONSES TO SEXUAL HARASSMENT IN THE U.S. MILITARY*
Angela K. Lawson
Louise F. Fitzgerald
University of Illinois at Urbana-Champaign
603 East Daniel
Champaign, Illinois 61820
alawson@s.psych.uiuc.edu

INTRODUCTION
Research on sexual harassment prevalence confirms its presence in a wide variety of
organizational environments and finds it to be strongly associated with a number of negative
outcomes for both individuals and organizations (Fitzgerald, Drasgow, Hulin, Gelfand, &
Magley, 1997; Malamut & Offermann, 2001; McKinney et al., 1998; U.S. Merit Systems
Protection Board, 1994). Workplace gender ratio, gender stereotyping of jobs, and
organizational climate have all been linked to the prevalence of sexually harassing behaviors;
further, female employees in male-dominated work groups and/or organizations that appear to
tolerate sexually inappropriate behavior are more likely to be targets of harassment than are those
employed in more gender-balanced environments intolerant of sexual harassment (Fitzgerald et
al., 1997).
Whatever the context, employees rarely report such experiences to management
(Bergman et al., 2002; Marin & Guadagno, 1999). Marin and Guadagno (1999) suggest that such
non-reporting may be linked to non-labeling of the incident (as harassment), fear of retaliation,
or negative appraisals from supervisors and coworkers. Fitzgerald and Swan (1995) postulate
that reluctance to report may also arise from a belief that complaints would not be taken
seriously, whereas Baker et al. (1990) implicate the organization's perceived tolerance of
inappropriate behavior as an important influence. Additionally, research suggests that the reasons
for not reporting can be grouped into two categories: the first is fear arising from the perceived
risks to the target's occupational and personal well-being, and the second centers on the
organizational policies and procedures associated with reporting sexual harassment (Peirce,
Rosen, & Hiller, 1997).

* Paper presented at the 2003 IMTA Conference, Pensacola, Florida.

This research is funded by the Defense Manpower Data Center (DMDC), through the
Consortium of Universities of the Washington Metropolitan Area, Contract M67004-03-C-0006,
as well as the National Institute of Mental Health grant #MH50791-08. The opinions in this
paper are those of the authors and are not to be construed as official DMDC or Department of
Defense position unless so designated by other authorized documents.

Please do not cite or quote without permission. Correspondence should be addressed to Angela
K. Lawson, 603 E. Daniel, Champaign, IL 61820 or alawson@s.psych.uiuc.edu.

Determinants of Reporting
Research on the determinants of reporting sexual harassment has typically focused on a
combination of individual, stimulus (i.e., severity-related), and/or organizational variables.
Exploration of the organizational determinants of reporting behavior has examined the climate,
or tolerance, for sexual harassment in the organization. Organizations that do not pro-actively
discourage this problem, do not take reports of harassment seriously, do not discourage
retaliation, or have inadequate or non-existent harassment policies and investigative
procedures are less likely to be told about the problems their employees may be having
(Offermann & Malamut, 2002; Fitzgerald et al., 1997; Brooks & Perot, 1991; Perry et al., 1997;
Malamut & Offermann, 2001; Bergman, Langhout, Palmieri, Cortina, & Fitzgerald, 2002).
Stimulus antecedents include variables specific to the incident of the harassment, such as
frequency, intensity, and duration (Rudman et al., 1995; Malamut & Offermann, 2001;
Bergman et al., 2002), whereas individual variables focus on demographic data such as age, race,
etc. (Perry et al., 1997; Brooks & Perot, 1991; Knapp, Faley, Ekeberg, & DuBois, 1997;
Rudman et al., 1995; Malamut & Offermann, 2001; Bergman et al., 2002). Although each of
these determinants has been shown to affect reporting behavior, individual variables are typically
less influential than organizational and situational variables in the decision to report (Fitzgerald
et al., 1995).
While work on the determinants of reporting sexual harassment provides readers with
invaluable information, this work has operated under a dichotomous model of reporting sexual
harassment. This research ignores the findings of Malamut et al. (2002), which suggest that
targets of sexual harassment do not respond to sexual harassment in such a black-and-white
manner. Instead, it appears that these targets employ reporting strategies far less rigidly and
choose to report only some of the harassment they have experienced rather than all or none of it.
Thus, combining individuals who report only some of the harassment experience(s) with either
non-reporters or those who report all the harassment may confound our understanding of
reporting behavior. Additionally, analysis of sexual harassment litigation suggests that details of
the harassing experiences gradually emerge during the litigation process. This gradual emergence
of details can be used to discredit complainants by suggesting that the complainant is fabricating
their description of the experience. A better understanding of reporting behavior could be utilized
to lend credibility to targets during litigation.
Further, while it is interesting to know which individual, stimulus, and organizational
variables impact reporting behavior, it is arguably equally if not more important to understand
targets' reasons for not reporting sexual harassment. An examination of this type of data can
provide researchers and organizations with valuable information that could assist them in
effectively encouraging reporting. Enactment of these types of changes, based on targets'
responses, protects individuals from physical and psychological harm and also protects an
organization's investment in its employees by possibly increasing job satisfaction, decreasing
attrition, etc.
METHOD
Participants
This study utilized data from the Status of the Armed Forces Survey: Workplace and
Gender Relations 2002 (WGR 2002) collected by the Defense Manpower Data Center. The
sample population was selected through a stratified random sampling procedure in an effort to
adequately represent all relevant subgroups including branch of service, gender, pay-grade, and
racial/ethnic group membership. The original sample consisted of 60,415 individuals, including
3,894 undeliverable records. Survey response rate was based on only those surveys with a
minimum of 50% item completion and completion of at least one item on the Sexual Experiences
Questionnaire (SEQ-DoD). Survey distribution resulted in a 36% adjusted weighted response
rate.
Following survey completion, data from 5,886 (1,671 men and 4,215 women) of the total
sample population were classified as eligible participants for data analysis. Participant data was
deemed eligible for analysis in this study if, in addition to the eligibility requirements for
response rate calculation, the participant had answered at least one item on the Sexual
Experiences Questionnaire with a score of 1 or greater, at least one item on the One Situation
with the Greatest Effect (a situation specific Sexual Experiences Questionnaire) question with a
score of 1, and at least one item on the Reporting scale with a score of 1 or at least one item on
the Non-Reporting scale with a score of 1.
Participants were somewhat evenly divided across branches of service, with 27% in the
Army, 23% in the Navy, 14% in the Marine Corps, 26% in the Air Force, and 10% in the Coast
Guard. Most participants were white women: 72% were women and 28% men, and 12% were
Hispanic, 60% non-Hispanic white, 18% non-Hispanic black or African American, and 10% of
other racial/ethnic backgrounds. Participants' marital status was fairly evenly split between those
who were married or separated (54%) and those who never married, divorced, or widowed (46%).
Roughly half of the participants (48%) completed at least two years of college but did not obtain
a degree beyond an Associate's degree, 33% received a 4-year college degree or higher, and 19%
received a GED or high school diploma or completed fewer than 12 years of school. A plurality
of participants (47%) had completed less than 6 years of active-duty service, 14% had completed
6-9 years, 30% had completed 10-19 years, and 9% had completed 20 years or more. Finally,
70% of participants were enlisted personnel, 17% were warrant officers, and 26% were
commissioned officers. Participant age was not assessed in the survey.
Procedure
A 16-page survey booklet was mailed to Active Duty and Coast Guard members via a
primarily residential mailing list. Additionally, a web site was created to provide service members
with the option of online survey completion. A notification of the upcoming survey was mailed
in December 2001, followed by the first wave of the survey mailing three weeks later. Two
weeks after the initial mailing, service members were sent a thank-you letter via direct mail,
followed by a second wave of surveys mailed to individuals who had not yet returned the initial
survey. The third and final wave of survey mailings was sent four weeks after the second survey
mailing. The survey was closed on April 23, 2002.
Measures
Climate. Respondents were presented with three scenarios of harassment and asked to
assess the degree to which they thought that a report of this behavior would be taken seriously,
whether or not it would be risky to complain, and whether or not they thought any action would
be taken as a result of the complaint. The three item variables (one item per scenario) of Serious,
Risk, and Action were then combined to form an overall nine-item Climate variable. These items
are modified from the Organizational Tolerance of Sexual Harassment scale (Hulin, 1993) and
are intended to assess participants' perceptions of climate in the military. A higher score on the
climate variable indicates a perception that the organization is tolerant of sexual harassment.
Frequency. Participants were asked to reflect on the One Situation with the Greatest
Effect to answer questions related to the frequency of, and other aspects specific to, the
individual's experience with the unwanted behavior(s). Frequency was determined by the
participant's response to a single item that asked how often the offensive behavior occurred.
Using a 5-point response scale, respondents were given the following options: "once",
"occasionally", "frequently", "almost every day", and "more than once a day". A higher score
indicates the presence of more frequent harassment.
One Situation with the Greatest Effect. All 19 items from the Department of Defense
Sexual Experiences Questionnaire (DoD SEQ) were used to measure the frequency of unwanted
sex/gender related talk and/or behavior as it pertained to the one situation of sexual harassment
that the target perceived to have had the greatest effect on them5. The response scale for the
situation specific version of the SEQ was modified to contain a dichotomous scoring option of
“did this” or “did not do this” instead of a 5-point response scale ranging from “never” to “very
often” utilized in the DoD SEQ. Respondents were asked to indicate how often the unwanted
verbal and/or physical behavior(s) had occurred by using the dichotomous response scale. The
response options were divided into subgroups based on the types of behaviors indicated in each
item.
Sexist Behavior was identified by the endorsement of one or more of four response
options which referenced unwanted behaviors that included “referring to people of your gender
in insulting or offensive terms”, “treated you “differently” because of your gender”, “made
offensive sexist remarks”, and “put you down or was condescending to you because of your
gender”.
Unwanted Sexual Attention goes beyond experiencing verbal discourse relative to sexual
topics and includes behaviors such as unwanted touching and unreciprocated “attempts to
establish a romantic sexual relationship”. Respondents were identified as having experienced
Unwanted Sexual Attention if they endorsed at least one of six response options relevant to these
types of behavior. These response options include “made unwanted attempts to establish a
romantic sexual relationship with you despite your efforts to discourage it”, continued to asked
you for dates, drinks, dinner, etc., even though you said No”, “touched you in a way that made
you feel uncomfortable”, “made unwanted attempts to stroke, fondle or kiss you”, “attempted to
have sex with you without your consent or against your will, but was not successful”, and “had
sex with you without your consent or against your will”.
5 Survey measurement of sexual harassment is defined by the U.S. Department of Defense as the presence of
behaviors indicative of sexual harassment (Crude/Offensive Behavior, Sexual Coercion, and Unwanted Sexual
Attention; Sexist Behavior and Sexual Assault are not counted in the DoD survey measure
of sexual harassment) and the labeling of those behaviors as sexual harassment (Survey Method for Counting
Incidents of Sexual Harassment, 2002). The WGR 2002 did not include a labeling item specific to the "One
Situation with the Greatest Effect"; rather, labeling was tied to all incidents of behaviors indicative of harassment.
As such, the use of the phrase "sexual harassment" or "harassment" in this document reflects only one of the two
qualifiers of the phenomenon. Further, use of these terms is in no way meant to reflect the legal definition of sexual
harassment. This wording is utilized for consistency and comparison with the literature on reporting sexual
harassment.

Finally, individuals were identified as having experienced Sexual Coercion if they
endorsed one of four response items including behaviors wherein the harasser bribed, threatened,
or in some way coerced the target into participating in sexual activities or treated them badly for
not participating. These response options include “made you feel like you were being bribed with
some sort of reward or special treatment to engage in sexual behavior”, “made you feel
threatened with some sort of retaliation for not being sexually cooperative”, “treated you badly
for refusing to have sex”, and “implied faster promotions or better treatment if you were sexually
cooperative”.
Reasons for Not Reporting. Survey participants were asked to indicate whether or not any
of the 19 items listed in this checklist matched their reasons for not reporting those behavior(s)
indicative of sexual harassment they had experienced as a part of the One Situation with the
Greatest Effect.
Reporting Status. Research participants were separated into one of three groups:
Complete Reporters (those individuals who reported all of the harassment that had occurred as
part of the One Situation With the Greatest Effect), Non-Reporters (those individuals who did
not report any of the harassing behaviors indicated in the One Situation With the Greatest
Effect), and Partial Reporters (those individuals who reported only some of the behaviors that
had occurred as part of the One Situation With the Greatest Effect). Non-Reporters were
identified through the use of a 5-item dichotomous (yes or no) reporting checklist which asked
participants to specify whether or not they had reported the unwanted sexual talk and/or
behaviors they had indicated had occurred on the Sexual Experiences Questionnaire as applied
specifically to the one situation (incident(s) of sexual harassment) which had the greatest effect
on the participant. Respondents who did not endorse reporting the One Situation with the
Greatest Effect to any of the five listed individuals/groups were placed in the Non-Reporters
group.
Participants were placed into the Complete Reporters group based on their response to
both the reporting checklist used to identify Non-Reporters and an additional survey item
relevant to the comprehensiveness of the participant’s formal or informal report. Respondents
were asked to indicate (yes or no) on the reporting checklist whether they had reported the
incident of sexual harassment which had the greatest effect on them to their immediate
supervisor, someone else in their chain-of-command (including their commanding officer),
Supervisor(s) of the person(s) who did it, Special military office responsible for handling these
kinds of complaints (for example, Military Equal Opportunity of Civil Rights Office), or Other
installation/Service/DoD person or office with responsibility for follow-up.
Participants who endorsed reporting the situation to any of the above individuals/groups
and indicated that they had reported all of the behaviors that had occurred as part of the One
Situation with the Greatest Effect were designated as Complete Reporters. Participants who
endorsed reporting the situation to any of the individuals/groups listed in the reporting checklist
but indicated that they had not reported all of the behaviors that had occurred as part of the One
Situation with the Greatest Effect were designated as Partial Reporters.
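
The three-group classification just described reduces to a simple decision rule. A sketch, where
the argument names (the five yes/no checklist answers and the "reported all behaviors"
follow-up item) are hypothetical:

    def reporting_status(channels: list[bool], reported_all: bool) -> str:
        """Classify a respondent by how much of the one situation was reported."""
        if not any(channels):          # none of the five channels endorsed
            return "Non-Reporter"
        if reported_all:               # endorsed a channel and reported everything
            return "Complete Reporter"
        return "Partial Reporter"      # endorsed a channel, reported only some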
Supervisor Harassment. Bergman et al. (2002) suggest that a harasser's status in an
organization may influence a target's willingness to report sexual harassment. Therefore,
perpetrators were categorized on the basis of their power within the military context.
Respondents who endorsed one of five items indicating that a supervisor or someone else of a
higher rank than themselves perpetrated the harassment were identified as having experienced
Supervisor Harassment.
Subordinate Harassment. Subordinate Harassment was assessed by participants'
responses to one of two questions regarding a subordinate's perpetration of harassment.
Missing Data
Missing data were imputed utilizing a technique suggested by Bernaards and Sijtsma
(2000). The authors recommend utilizing a two-way imputation of item mean and person
mean rather than relying on one or the other alone: "Two-way imputation (TW) calculates across
available scores the overall mean, the mean for item j and the mean for person i, and imputes IM
(item mean) + PM (person mean) - OM (overall mean) for missing observation (i,j)" (Bernaards
& Sijtsma, 2000, p. 331). Scale data were imputed based on the total number of items in the
scale: if the scale contained 4-10 items, only 1 item was imputed; if it contained 11-20 items,
2 items could be imputed; and if it contained 21-30 items, 3 items could be imputed.
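
A minimal sketch of this two-way imputation rule, assuming the scale's item scores sit in a
respondents-by-items NumPy array with NaN marking missing cells (the array name is
hypothetical); the per-scale caps on the number of imputable items would be enforced before
calling it:

    import numpy as np

    def two_way_impute(scores: np.ndarray) -> np.ndarray:
        """Fill each missing cell (i, j) with PM(i) + IM(j) - OM."""
        filled = scores.copy()
        om = np.nanmean(scores)          # overall mean of available scores
        im = np.nanmean(scores, axis=0)  # item means (columns)
        pm = np.nanmean(scores, axis=1)  # person means (rows)
        rows, cols = np.where(np.isnan(scores))
        filled[rows, cols] = pm[rows] + im[cols] - om
        return filled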
ANALYSIS
Following data collection and cleaning, the dataset was randomly divided into a
developmental and a confirmatory sample (N=2951). The developmental sample was used to
examine variables that could differentiate between the group of individuals who reported none of
the harassment (Non-reporters, N=4471), some of the harassment (Partial Reporters, N=644), or
all of the harassment they had experienced (Complete Reporters, N=771). A host of variables
were included in the Multinomial Logistic Regressions performed separately for women and men
on the developmental sample. These variables included sex, level of education, race/ethnicity,
marital status, branch of service, paygrade, years of active-duty service, gender of supervisor,
gender mix of work group, perception of leadership, sexual behaviors, sexist behaviors,
unwanted sexual attention, sexual coercion, appraisal of harassment, where and when harassment
occurred, gender of the harasser(s), rank of the harasser(s), frequency and duration of
harassment, organizational tolerance, and sexual harassment training. Only those variables that,
when alone in the model, resulted in significant (p ≤ .05) or marginally significant (p ≤ .15)
differentiation between the three groups were kept in the model for inclusion in testing using the
confirmatory sample6.
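
The screening step can be sketched as follows, assuming (the paper does not say) that a package
such as statsmodels is used: each candidate variable is entered alone in a multinomial logistic
regression of the three-group reporting status and retained if its likelihood-ratio test is at
least marginally significant. The DataFrame and column names are hypothetical:

    import statsmodels.api as sm

    def screen_predictors(df, candidates, outcome="report_group", alpha=0.15):
        """Keep variables that, alone in the model, differentiate the groups."""
        kept = []
        for var in candidates:
            data = df[[outcome, var]].dropna()
            X = sm.add_constant(data[[var]])
            # outcome coded 0/1/2 for Non-, Partial, and Complete Reporters
            result = sm.MNLogit(data[outcome], X).fit(disp=0)
            if result.llr_pvalue <= alpha:  # LR test vs. intercept-only model
                kept.append(var)
        return kept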
Following developmental sample analysis all significant or marginally significant
variables were input into separate Multinomial Logistic Regression models for the men and
women and run on the confirmatory sample7. Only two variables resulted in significant
discrimination between groups in the female confirmatory sample (see Figure 1). Interpretation
of these results reveals that in comparison to Non-reporters, individuals who only reported some
of the harassment they experienced were more likely to endorse experiencing sexist behaviors
and sexual coercion. Additionally, when compared to Complete Reporters, Partial Reporters were
less likely to indicate they had experienced sexist behaviors but more likely to endorse
experiencing sexual coercion.

6 Despite significant results, Gender Harassment, a combination of sexual and sexist behaviors, is omitted from the
training sample results and the confirmatory sample model due to its redundancy and the authors' interest in
understanding its components.
7 Marital Status was erroneously entered into the model used in the confirmatory sample and is therefore omitted.
Theoretically, the results differentiating the Non-reporters from the Partial Reporters
make sense, as one would expect a Non-reporter to experience less harassment than someone who
reports these behaviors. However, the comparison between the group of Complete Reporters and
the Partial Reporters is somewhat less clear. One might argue that sexual coercion is a more
severe form of harassment, owing to its inherently threatening nature, and would therefore be
more likely to lead to reporting, and that sexist behavior, a more minor offense, would be less
likely to lead to such an end result. Yet it appears that it is sexist behavior, rather than sexual
coercion, that may have driven the group of Complete Reporters to report their harassment (see
Figure 1).

Sexist Behavior
● Non-Reporters < Partial Reporters < Complete Reporters

Sexual Coercion
● Non-Reporters < Partial Reporters > Complete Reporters

Figure 1. Female Confirmatory Three-Group Comparison of Significant Variables.

Frequency analysis of the Sexual Experiences Questionnaire reveals that Partial and
Complete Reporters equally endorse experiencing sexual coercion, but the Partial Reporters
appear more likely to list that behavior as part of the one situation that had the greatest effect on
them. Therefore, it does not appear that individuals in the Partial Reporting category have the
most severe experience of harassment. Hypotheses as to the cause of this result are difficult to form
in that the survey asked participants to indicate the behaviors they had experienced and did not ask
which behaviors they had reported. Assuming the Partial Reporters actually reported the sexual
coercion could lead one to hypothesize that their experience of sexual coercion was so horrific as to
warrant reporting, whereas other behaviors were viewed as minor incidents and went unreported.
An examination of the Multinomial Logistic Regression results for the male 3-group
confirmatory sample reveals similarly interesting results. When compared to Non-reporters, it
appears that Partial Reporters are more likely to endorse experiencing more frequent harassment
and sexist behaviors (see Figure 2).

Frequency
● Non-Reporters < Partial Reporters > Complete Reporters

Sexist Behavior
● Non-Reporters < Partial Reporters

Supervisor Harassment
● Partial Reporters < Complete Reporters

Figure 2. Male Confirmatory Three-Group Comparison of Significant Variables.

Although less is known about male responses to sexual harassment, this result seems
logical in that one would expect individuals who experience more harassment and more frequent
harassment to be more likely to report that behavior. As with the female sample, however, the
comparison between the Complete Reporters and the Partial Reporters is somewhat more
ambiguous. In comparison to the Complete Reporters, the men in the Partial Reporters group are
more likely to report experiencing more frequent harassment and less likely to report being
harassed by a supervisor or multiple supervisors. Perhaps, while the harassment is less frequent
for men in the Complete Reporters group than in the Partial Reporters group, it is the recognition
of the harassment perpetrated by a supervisor that is identified by the target as worthy of
reporting, whereas for the Partial Reporters it is the frequency with which a behavior occurs that
results in the behavior being reported. The implication is that men in the Partial Reporting
category appear to have the most severe experience of harassment, in terms of at least the
frequency of harassment.
The original sample was later split into two groups in order to better understand
the impact of combining individuals in the Partial Reporters group with either the Non-reporters
or Complete Reporters on our interpretation and understanding of the determinants of reporting
sexual harassment. Individuals who reported none of the harassment they experienced comprised
the Non-reporters group, and individuals who reported some or all of the harassment that
occurred to them were placed in the Reporters group. As with the previous analyses, Multinomial
Logistic Regressions8 were run separately for women and men on both the developmental and
confirmatory samples using the same set of initial variables. Only those variables that, when
alone in the model, resulted in significant (p ≤ .05) or marginally significant (p ≤ .15)
differentiation between the two groups were kept in the model for inclusion in testing using the
confirmatory sample9.
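
As a hedged illustration of this screening step, the sketch below runs one-predictor-at-a-time multinomial models and keeps any variable whose group contrasts reach at least marginal significance. It assumes a pandas DataFrame df with a three-level status column (0 = Non-reporter, 1 = Partial, 2 = Complete); the column names, the statsmodels-based approach, and the helper name are illustrative assumptions, not the study's actual code.

import statsmodels.api as sm

def screen_predictors(df, candidates, outcome="status", alpha=0.15):
    """Keep a candidate if, alone in a model with an intercept, any of its
    group contrasts is at least marginally significant (p <= alpha)."""
    kept = []
    for var in candidates:
        X = sm.add_constant(df[[var]])            # intercept + one predictor
        fit = sm.MNLogit(df[outcome], X).fit(disp=0)
        # fit.pvalues: rows are predictors, columns the non-reference groups
        if (fit.pvalues.loc[var] <= alpha).any():
            kept.append(var)
    return kept

The retained set would then be fit jointly on the confirmatory sample, mirroring the procedure described above.
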
Comparison between the Non-reporters and Reporters in the confirmatory 2-group female
sample resulted in two significant differentiating variables (see Figure 3). When compared to the
Non-reporters, Reporters were more likely to endorse experiencing sexist behaviors and sexual
coercion. Intuitively this appears correct in that the more one experiences these types of negative
behaviors the more likely one would be to report them. However, in light of the findings in the 3-
group female sample it becomes clear that a dichotomous based analysis provides a molar view
of the phenomena on reporting rather than a deeper molecular understanding. Not only does the
2-group analysis fail to provide us with much more than is already accessible through common
sense it also confounds our understanding of the role these two variables, sexist behavior and
sexual coercion, play in reporting sexual harassment.

Sexist Behavior
● Non-Reporters < Reporters

Sexual Coercion
● Non-Reporters < Reporters

Figure 3. Female Confirmatory Two-Group Comparison of Significant Variables.

8 Multinomial Logistic Regression was utilized instead of (Binomial) Logistic Regression for comparative purposes.
Use of this method should not alter the results or interpretation of the analyses.
9 Despite significant results, Gender Harassment, a combination of sexual and sexist behaviors, is omitted from the
training sample results and the confirmatory sample model due to its redundancy and the author’s interest in
understanding its components.

As in the 3-group comparison of Partial and Complete Reporters in the male sample (and,
incidentally, in the female sample as well), a review of the 2-group analysis revealed a similar set
of variables significantly differentiating the comparison groups (see Figure 4). Examination of
the 2-group confirmatory male sample reveals that, when compared to Non-reporters, Reporters
are more apt to indicate experiencing sexist behaviors and to endorse being harassed by one or
more supervisors. It is apparent from the differences between the male and female models that
male reporting of sexual harassment does not follow the logic typically utilized in discussions of
female reporting of sexual harassment.

Sexist Behavior
● Non-Reporters < Reporters

Supervisor Harassment
● Non-Reporters < Reporters

Figure 4. Male Confirmatory Two-Group Comparison of Significant Variables.

Following the above two- and three-group comparisons, additional analyses were
conducted in an effort to better understand targets’ reasons for not reporting the harassment
experience(s). Frequency analyses (see Table 1) for both women and men suggest that the most
frequent reasons for not reporting the harassment are that the behavior was not important enough
to report and that the individual took care of the problem by herself or himself.

Table 1
Frequency of reasons for not reporting harassment (percentage of respondents endorsing each reason).
Reason for not reporting Women Men
Was not important enough to report 59.9 74.3
You did not know how to report 11.1 8.7
You felt uncomfortable making a report 32.1 20.2
You took care of the problem yourself 57.4 60.0
You talked to someone informally in your chain-of-command 19.5 12.4
You did not think anything would be done if you reported 27.4 20.6
You thought you would not be believed if you reported 14.5 8.7
You thought your coworkers would be angry if you reported 19.5 14.9
You wanted to fit in 15.3 12.0
You thought reporting would take too much time and effort 19.1 16.0
You thought you would be labeled a troublemaker if you reported 27.5 16.6
A peer talked you out of making a formal complaint 3.1 1.5
A supervisor talked you out of making a formal complaint 3.0 1.5
You did not want to hurt the person’s or persons’ feelings, family, or career 22.6 16.8
You were afraid of retaliation or reprisals from the person(s) who did it 18.4 10.0
You were afraid of retaliation or reprisals from friends/associates of the person(s) who did it 13.2 7.9
You were afraid of retaliation or reprisals from your supervisors or chain-of-command 12.6 8.1
Some other reason 18.9 16.0

Further analyses were conducted to explore the possible factor structure of the above-
mentioned reporting items. Exploratory factor analyses using principal axis factoring with
varimax rotation suggest that these items do not form a cohesive scale structure (Ormerod,
Lawson, Sims, Lytell, Wadlington, Yaeger, Wright, Reed, Lee, Drasgow, Fitzgerald, and
Cohorn, 2002). However, cluster analysis suggests the presence of two interpretable clusters and
a third clustering (Importance) of the two most frequently endorsed items (“was not important
enough to report” and “you took care of the problem yourself”). The first cluster (Other) is
difficult to interpret in that it is a compilation of items regarding lack of knowledge, being talked
out of reporting, and other seemingly dissimilar items. The second cluster (Negative Outcome) is
easier to interpret and contains items suggesting the target feared there would be negative
repercussions should they report the harassment. Scales based on these clusters were created, and
subsequent alpha coefficients for women, men, and the total sample ranged from .45 to .89.
Alpha coefficients for all items in the reasons for not reporting measure range from .80 to .81
(see Table 2).

Table 2
Alpha Coefficients for items in the Reasons for Not Reporting measure.
Women Men Total Sample
All Items (74A-S) .80 .81 .80
Cluster 1 (Other) .45 .51 .47
Cluster 2 (Negative Outcome) .89 .89 .89
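
A hedged sketch of the reliability computation behind Table 2 follows; it is the standard Cronbach's alpha formula applied to the item columns of one cluster. The DataFrame and the column list negative_outcome are illustrative names, not the survey's own.

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# e.g. cronbach_alpha(df[negative_outcome]) would be expected to land near
# the .89 reported for Cluster 2 above.
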

Frequency analyses on SE and SEQ scores for men and women were then conducted for
those individuals who endorsed only fear-based reasons (Negative Outcome) for not reporting.
Approximately 1% of men (N = 13) and 2% of women (N = 84) indicated only fear-based reasons
for not reporting. Items endorsed in both the One Situation with the Greatest Effect (SE) and
Sexual Experiences Questionnaire (SEQ) were summed for these individuals. The average SEQ
scores for men and women in this group were relatively high (men = 5.23, women = 10.93)
compared to SEQ scores from all respondents regardless of reason(s) for not reporting
(men = 4.79, women = 8.66). The average SE scores for men and women were relatively high
(men = 3.10, women = 3.79) compared to SE scores from all respondents regardless of
reason(s) for not reporting (men = 2.14, women = 3.59). It appears then that individuals who
indicate solely fear-based reasons for not reporting endorse experiencing more harassing
behaviors.
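
A minimal sketch of that subgroup comparison is given below, assuming the endorsed reasons are boolean columns split into fear-based columns (negative_cols) and all remaining reason columns (other_cols), and that seq_total and se_total hold the summed SEQ and SE scores; all of these names are illustrative.

import pandas as pd

def fear_only_comparison(df, negative_cols, other_cols,
                         scores=("seq_total", "se_total")):
    # Respondents who endorsed at least one fear-based reason and no other.
    fear_only = df[negative_cols].any(axis=1) & ~df[other_cols].any(axis=1)
    return pd.DataFrame({
        "fear_only": df.loc[fear_only, list(scores)].mean(),
        "all_respondents": df[list(scores)].mean(),
    })
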
Approximately 1% of men (N=8) and 1% of women (N=29), and incidentally
only individuals in the Non-Reporter group, indicated that the only reasons that they did not
report were that they either took care of the problem by themselves or it was not important enough to
report (Other). Items endorsed in both the One Situation with the Greatest Effect (SE) and Sexual
Experiences Questionnaire (SEQ) were summed for these individuals. The average SEQ scores
for men and women were relatively low (men = 3.88, women = 4.30) compared to SEQ scores
from all male and female respondents regardless of reason(s) for not reporting (men = 4.79,
women = 8.66). The average SE scores for men were somewhat high and for women were
relatively low (men = 2.50, women = 2.33) compared to SE scores from all respondents
regardless of reason(s) for not reporting (men = 2.14, women = 3.59). This suggests that, at
least for the women, the harassment was less severe and perhaps not severe enough to warrant
reporting.
Further analyses show that men and women typically indicate more than one reason for
not reporting the harassment experience (see Table 3). The most frequent response pattern for
women and men was a combination of responses to the Other, Negative Outcome, and
Importance groups. Again, hypotheses as to the cause of this result are difficult in that the survey
asked participants to indicate the reasons they did not report but did not ask them to
identify the behavior(s) associated with those reasons.

Table 3
Individuals who endorsed combinations of Other, Negative Outcome, and Importance.
Women Men Total
Other & Negative Outcome & Importance 1262 381 1643
Negative Outcome & Importance 225 109 334
Negative Outcome & Other 366 76 442
Importance & Other 569 250 819
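
Counts such as those in Table 3 can be tabulated directly from the cluster endorsements. The sketch below shows one assumed approach: flag, per respondent, whether any item in each cluster was endorsed, then count respondents endorsing every cluster in a combination. Whether the published counts are overlapping or mutually exclusive is not restated here, so the inclusion rule is an assumption.

from itertools import combinations

def combination_counts(df, clusters):
    """clusters maps a name ('Other', 'Negative Outcome', 'Importance')
    to the list of boolean item columns in df belonging to that cluster."""
    endorsed = {name: df[cols].any(axis=1) for name, cols in clusters.items()}
    counts = {}
    for size in (3, 2):
        for combo in combinations(endorsed, size):
            mask = endorsed[combo[0]]
            for name in combo[1:]:
                mask = mask & endorsed[name]
            counts[" & ".join(combo)] = int(mask.sum())
    return counts
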
DISCUSSION
Analyses of reporting status generally support the classification of individuals into groups
of Non-Reporters, Partial Reporters, and Complete Reporters, as significant group differences
were found in the self-reported experiences of these individuals. Although group differentiation
was not always found in stimulus, organizational, and individual variables, all three variable
types were found to be significant in either the developmental and/or confirmatory analyses. It
appears that stimulus factors played a consistent role in group differentiation for both men and
women across developmental and confirmatory analyses. In particular, the presence of sexist
behavior(s) contributed significantly to group differentiation for both men and women in all
analyses. This is likely due to the higher frequency, as compared to other forms of harassment,
with which these behaviors occur. Further, analyses suggest that individuals’ reasons for not
reporting the harassment typically include a combination of both fear-based and non-fear-based
reasons.
Implications
If we are ever to fully understand the determinants of reporting sexual harassment, we
must first acknowledge that research on reporting sexual harassment may require the inclusion of
exploratory analyses. Reliance on hypotheses regarding the determinants of reporting sexual
harassment that neglect influential variables dictated by previous research will likely result in
incomplete answers. Researchers cannot know, prior to conducting their work, which variables
will play a statistically significant role in reporting harassment, nor should they assume the
importance of the same variables from one year to the next, as the harassment experience is apt to
differ from one time period to the next. This is evident in the differing findings of this research
and those of Bergman et al. (2002) and Malamut et al. (2001). All three analyses are based on
surveys of the same organization, although the current project utilizes a more recent dataset, and
yet they reveal differing results. In essence, this validates the need to include a broader set of
variables in our hypotheses (or points to possible methodological problems), and it provides
insight into the impact of the passage of time, perhaps due to policy and procedural changes, on
results. An
extension of this argument would be that this type of research should be conducted not only on a
year-to-year basis but also on an organizational basis, as women in different organizations are
likely to have differing experiences of sexual harassment in very different contexts. The
workplace is likely a dynamic entity wherein changes occur within and across time.
Regardless of whether or not researchers accept the benefits of exploratory research on
this topic, it should be understood that creating dichotomous reporting categories that either fail
to take into account the existence of Partial Reporters or merge them into the group of
Non-Reporters or Complete Reporters blurs our understanding of the phenomenon. This research
shows that analyses of two- and three-group samples result in a similar set of discriminating
variables that, when further explored in three-group analyses, begin to broaden our understanding
of their importance.
Further, examination of target response behavior by a trichotomous grouping has
implications for sexual harassment litigation. This research suggests that targets are almost
equally likely to report all of the harassment they experienced as they are to report only some of
it. Litigation outcomes often rely heavily on the behavior of the target, and a failure to report
either all or some of the unwanted experience(s) may lead to summary judgment or an unfavorable
verdict. Expert testimony regarding the nature of reporting behavior that includes information
relevant to the normality of partial reporting will likely enhance the credibility of a complainant’s
actual reporting behavior, whether she reported none or only some of the experience.
Limitations
As previously discussed, a limitation of the current study lies in the lack of information
regarding the specific behaviors reported or not reported by participants; such information could
supplement our understanding of reporting behavior. Analyses of these data could be used to
ascertain the relationship between reporting/non-reporting and the harassing behavior(s). It is
also unfortunate that this research was unable to incorporate a variable specific to the labeling of
a behavior, as previous research suggests labeling plays an important role in determining whether
or not a behavior is reported. Although a labeling item was included in the survey, it was omitted
from analyses, as it was not specifically associated with the One Situation with the Greatest
Effect but rather with all of the harassment experienced. Due to its nature, it is likely that targets would be more apt to label
events associated with the One Situation with the Greatest Effect as sexual harassment.
However, this assumption could not be tested in the current survey format.
An examination of this type of data may provide researchers and organizations with
valuable information that could assist them in effectively encouraging reporting. Enactment of
changes based on targets’ responses may serve to protect individuals from physical and
psychological harm and also protect an organization’s investment in its employees by possibly
increasing job satisfaction, decreasing attrition, strictly enforcing anti-retaliation policy, and
mitigating other negative outcomes of reporting harassment. Further, these data can be
used to support litigants’ concerns regarding reporting sexual harassment and perhaps encourage
the courts to require organizations to prove that they not only provide but also strictly enforce
policy and procedure relevant to harassment while protecting complainants from negative
repercussions due to reporting.
Additionally, limitations may arise due to the self-report nature of the data. Analyses of
corroborative data, such as assessments of formal reports, and comparisons to self-report items
could lend support for the trichotomization of reporting status. Further, the unnatural structure of
item response scaling, or unintended content ambiguity, may distort participant responses in ways
that are likely to go undetected in the absence of interview assessments. Added limitations may
arise due to the dichotomous nature of the scaling utilized in several items, which diminishes the
amount of response variability and could therefore impact outcomes using multinomial logistic
regression. The utilization of such a large sample size mitigates these limitations but could also
serve to generate more significant outcomes than would otherwise be found. As such, MANOVAs
and other statistical analyses that are sensitive to sample size were avoided. The lack of a
multitude of significant outcomes supports the appropriateness of the statistical analyses used.
Future Directions
Future directions for this research include comparing the current study with data from the
1995 Department of Defense Work and Gender Relations survey. Of particular interest in this
comparison is the ability to test the hypothesized role of labeling in reporting sexual harassment
as the 1995 survey included a labeling item specific to the One Situation with the Greatest
Effect. Significant group differentiation, in the 1995 survey, by the labeling variable would have
implications for the results of the current study. Additionally, this comparison could shed light
on the dynamic nature of reporting sexual harassment within a military context, as differences
between time periods can be assessed.
Further analyses should focus on the role of organizational variables as determinants of
reporting sexual harassment. The findings from the current study regarding the importance of
organizational variables contradict results from work on the 1995 Department of Defense Work
and Gender Relations survey. It is possible to infer that these contradictions exist due to the
passage of time (e.g., sexual harassment policy/procedural changes over time), different statistical
analyses, or differences in construct operationalization. Comparison of the 1995 and 2002
data sets could help clarify these findings, as the research could employ uniformity in construct
operationalization and method of data analysis. Additionally, as the importance of organizational
climate on reporting in a military context is unclear, participants’ self-report data regarding
organizational reasons for non-reporting can be compared to responses on the Organizational
Tolerance for Sexual Harassment scale (OTSH; Hulin, 1993), thereby comparing the usefulness
of scenario versus self-report data collection on the perception of climate in the military.


References
Bergman, M.E., Langhout, R.D., Palmieri, P.A., Cortina, L.M., & Fitzgerald, L.F. (2002).
The (un)reasonableness of reporting: Antecedents and consequences of reporting
sexual harassment. Journal of Applied Psychology, 87(2), 230-242.
Bernaards, C.A., & Sijtsma, K. (2000). Influence of imputation and EM methods on
factor analysis when item nonresponse in questionnaire data is nonignorable.
Multivariate Behavioral Research, 35, 321-364.
Brooks, L., & Perot, A.R. (1991). Reporting sexual harassment: Exploring a predictive
model. Psychology of Women Quarterly, 15, 31-47.
Fitzgerald, L.F., Drasgow, F., Hulin, C.L., Gelfand, M.J., & Magley, V.J. (1997).
Antecedents and consequences of sexual harassment in organizations: A test of
an integrated model. Journal of Applied Psychology, 82(4), 578-589.
Fitzgerald, L.F., & Swan, S. (1995). Why didn’t she just report him? The psychological
and legal implications of women’s responses to sexual harassment. Journal of Social
Issues, 51(1), 117-138.
Hulin, C.L. (1993). A framework for the study of sexual harassment in organizations:
Climate, stressors, and patterned responses. Paper presented at a Symposium on Sexual
Harassment at the Society of Industrial and Organizational Psychology, San Francisco,
CA.
Hulin, C.L., Fitzgerald, L.F., & Drasgow, F. (1996). Organizational influences on sexual
harassment. In M. Stockdale (Ed.), Sexual harassment in the workplace, Vol. 5 (pp. 127-
150). Thousand Oaks, CA: Sage.
Knapp, D.E., Faley, R.H., Ekeberg, S.E., & DuBois, C.L.Z. (1997). Determinants of
target responses to sexual harassment: A conceptual framework. Academy of
Management Review, 22(3), 687-729.
Malamut, A.B., & Offermann, L.R. (2001). Coping with sexual harassment: Personal,
environmental, and cognitive determinants. Journal of Applied Psychology, 86(6),
1152-1166.
Marin, A.J., & Guadagno, R.E. (1999). Perceptions of sexual harassment victims as a
function of labeling and reporting. Sex Roles, 41(11/12), 921-940.
McKinney, K., Olson, C.V., & Satterfield, A. (1988). Graduate students’ experiences
with and responses to sexual harassment. Journal of Interpersonal Violence, 3(3),
319-325.
Offermann, L.R., & Malamut, A.B. (2002). When leaders harass: The impact of target
perceptions of organizational leadership and climate on harassment reporting
and outcomes. Journal of Applied Psychology, 87(5), 885-893.
Peirce, E.R., Rosen, B., & Hiller, T.B. (1997). Breaking the silence: Creating user-
friendly sexual harassment policies. Employee Responsibilities and Rights
Journal, 10(3), 225-242.
Perry, E.L., Kulik, C.T., & Schmidtke, J.M. (1997). Blowing the whistle: Determinants
of responses to sexual harassment. Basic and Applied Social Psychology, 19(4),
457-482.
Rudman, L.A., Borgida, E., & Robertson, B.A. (1995). Suffering in silence: Procedural
justice versus gender socialization issues in university sexual harassment
grievance procedures. Basic and Applied Social Psychology, 17(4), 519-541.
Survey Method for Counting Incidents of Sexual Harassment (April 28, 2002).
Washington, DC: Office of the Under Secretary of Defense for Personnel and Readiness.
U.S. Merit Systems Protection Board. (1994). Sexual harassment in the federal
workplace: Trends, progress and continuing challenges. Washington, DC: U.S.
Government Printing Office.


Using Stakeholder Analysis (SA) and the Stakeholder Information System (SIS)
in Human Resource Analysis

Kimberly-Anne Ford
Directorate of Strategic Human Resources (DstratHR)
Department of National Defence (DND) Canada
Ford.KA@Forces.gc.ca

45th Annual Conference of the International Military Testing Association (IMTA)
November 05, 2003
Pensacola Beach, Florida


Table of Contents

Table of Contents............................................................................................................ 253

List of Figures ................................................................................................................. 254

1.0 Introduction.............................................................................................................. 255

1.1 Purpose................................................................................................................. 255

1.2 Background .......................................................................................................... 255

1.2.1 The Need for Effective Consultation Frameworks ....................................... 256

1.2.2 The Value of Participatory Methods............................................................. 256

1.2.3 The Utility of SA .......................................................................................... 257

2.0 How could SA be Used in Human Resource Analysis? .......................................... 258

2.1 Stakeholder Brainstorming .................................................................................. 259

2.2 Stakeholder Influence and Importance ................................................................ 261

2.3 Stakeholder Salience............................................................................................ 262

3.0 Overview of The SIS ............................................................................................... 264

3.1 What is the ‘Stakeholder Information System’?.................................................. 264

3.2 Using the SIS ....................................................................................................... 265

3.2.1 The SIS Online.............................................................................................. 267

4.0 Discussion: Potential Applications of the SIS in Human Resource Analysis ......... 268

4.1 Strategic Human Resources ................................................................................. 268

4.2 Quality of Life Research...................................................................................... 268

5.0 Conclusion ............................................................................................................... 269

References....................................................................................................................... 270


List of Figures

Page No.

Figure 1: Basic Stakeholder Matrix 259

Figure 2: Stakeholder Importance and Influence Matrix 261

Figure 3: Representing Stakeholder Saliency 263

Figure 4: The Stakeholder Information System Website 266


1.0 Introduction

1.1 Purpose

This paper provides an overview of Stakeholder Analysis (SA) – a social science
methodology – and the ‘Stakeholder Information System’ (SIS) – a software package that is
presently under development at Carleton University, in Ottawa, Canada. SA is used to
understand relationships between various stakeholders or stakeholder groups; to determine and
obtain adequate levels of participation from each stakeholder group involved in a research
project or activity; and to ascertain their potential influence over the process. The SIS is
currently being developed and tested by researchers at Carleton University. In its present form,
the SIS is an online resource that contains information on a wide array of research methods and
techniques10, which are used to conduct SA. SA and the SIS have applications for social science
researchers or human resource analysts who want to share information or consult with a wide
range of stakeholders. The usefulness of SA and the SIS for creating effective consultation and
communication frameworks for research and knowledge-sharing activities within the Department
of National Defence (DND) will be addressed in depth in the following paragraphs, which also
address the utility of SA and the SIS for social science research and human resource activity
within DND and answer the following questions:

• What is Stakeholder Analysis (SA)?

• How does the Stakeholder Information System (SIS) facilitate SA?

• In what ways are the SIS and SA relevant to Human Resource Analysis?

1.2 Background

The impetus for using the SIS within DND is grounded in three inter-related factors: first,
the need for effective consultation and knowledge-sharing frameworks in DND and in
government research in general; second, the value of Participatory Action Research (PAR) to
accomplish this task; and third, the utility of SA for adding depth and rigour to PAR. Each of
these is addressed in the following paragraphs.

10 Access in this form is presently free on the World Wide Web; see
http://www.carleton.ca/~jchevali/STAKEH.html. The system will soon be made available by subscription.

1.2.1 The Need for Effective Consultation Frameworks

The need for effective consultation models and information sharing throughout the
Department of National Defence is widely acknowledged. For example, the Military HR
Strategy 2020 states that:

“We must ensure that a continuous, effective internal communications network is
established… We must continually improve the effectiveness of our internal
communications strategy to ensure that all members are aware of HR issues… We must
maintain an effective consultation process that shares expertise within the department,
nationally and internationally” (DND, 2002: 20 -21).

The importance of effective communication in DND was also underlined in the 2002 Chief of
the Defence Staff’s Annual Report, stating that:

“Effective communication is an essential part of the modern military. Communications is
a force multiplier on operations. It is also a vital tool in nurturing our relationship with
Canadians and strengthening public awareness and public understanding of the relevance
of the CF, as well as our issues and challenges… As the CF grapples with the demands of
adapting to our changing geo-strategic environment, it is more important than ever to
explain our issues, priorities, and decisions to CF members” (CDS, 2002).

SA can also serve to actualise the government-wide policy objective of increasing
transparency, as mentioned in the 2003 Speech From The Throne. SA and the SIS provide
researchers with the theoretical knowledge and concrete techniques required to meet the
challenge of ensuring transparency, accountability and engagement, to ultimately create a
learning process in which all parties can benefit from the exchange of knowledge.

1.2.2 The Value of Participatory Methods

Increasingly, our research unfolds through the formation of partnerships with various
groups and individuals, and the research process is largely one that necessitates a mutual
exchange of information, in which all parties learn from the process. Therefore, traditional
methods of objective research – in which a researcher goes into the field to extract data from
subjects – are not always sufficient to meet our objectives. SA is an in-depth, analytically
rigorous form of participatory research. The value of ‘Participatory Action Research’ (PAR) for
use within the Department of National Defence, especially as it pertains to quality of life
measurement among CF personnel and their family members, has been described elsewhere
(Ford, 2001).


The Department of National Defence already has a history of conducting research that is
‘participatory’ in nature (Ford, 2001: 20). For example, the Standing Committee On Defence
and Veterans Affairs (SCONDVA) Inquiries and the PERSTEMPO project (as defined by
Flemming, 2000) can both be seen as examples of participatory research: participants were
consulted early on in the research processes and asked to identify important issues. In fact, DND
researchers and human resource analysts often solicit input from various participants, or CF
stakeholders, especially in the planning stage of a research process or knowledge-sharing
activity. However, there is often little follow-through on that participation: participants are
consulted at the onset of a project or initiative, but they are rarely consulted again over the course
of the project, to discuss project outcomes or to disseminate results. In sum, the importance of
stakeholder involvement in planning a research
project is widely acknowledged and often practiced in DND, but the importance of following
through with participation during the whole course of a research process is not. This is at times
necessitated by the decision making process that follows a research endeavour, and/or it can be
due to time limitations, financial constraints, and/or a lack of knowledge of existing participatory
techniques that can facilitate the exchange of information throughout a research process.
Furthermore, it is difficult to find a comprehensive list of participatory techniques and guidelines
for their use in the existing literature. SA and the SIS fill the gaps in the participatory literature,
by providing methodological techniques to determine and elicit the appropriate level of
participation required by various stakeholders at different stages of a project.

1.2.3 The Utility of SA

In order to properly evaluate a proposed project, to gain the viewpoints of many parties,
and/or to anticipate and mitigate potential conflicts, it is vital to identify the groups and
individuals who will be affected by, and therefore have an interest in, project outcomes. Key
questions to ask are:

• Who are all the players involved in or impacted by the project?

• How important are those individuals or groups to project success?

• How influential can they be before, during, or after project completion?

• How can they become involved or informed of the project?

SA provides some answers to these questions.


Originally a prospecting term (a stakeholder being someone who holds a gold claim), SA
has its contemporary roots in the field of management. It is widely used in social science
research today, to identify all social groups impacted by a research process, and to assess their
relative influence and importance based upon the criteria identified for project success. A SA
allows researchers to identify the various individuals, groups and organisations that are
“significantly affected by someone else’s decision making activity” (Chevalier, 2001: 1). The
United Kingdom’s Department for International Development (DFID) defines SA as: “the
identification of key stakeholders, an assessment of their interests, and the ways in which these
interests affect project viability and riskiness.” (DFID, 1995: 1). A defining feature of SA is that
it forces researchers to think through the numerous levels of impact a project may have on the
diverse stakeholder groups: to differentiate between indirectly and directly, and positively and
negatively affected stakeholders; to examine levels of influence and importance; and to uncover
stakeholder saliency. The following pages describe the basics of SA and provide the reader with
some rudimentary analytical techniques.

2.0 How could SA be Used in Human Resource Analysis?

The SIS has numerous potential applications for social science research and human
resource analysis within DND. Three examples of such will be presented in the discussion
section of this research note. In order to allow the reader to capture how SA could be used in
DND research, the methodological techniques presented in the following pages are discussed in
reference to the following fictitious11 research scenario: You are at the planning stage of a
research project aimed at modernizing or transforming the services offered in Military Family
Resource Centres (MFRC), in light of changing definitions of ‘the military family’. You would
soon realise that a number of ‘key players’ – important civilian and military stakeholder groups –
across Canada are already involved in activities that relate to your project. You would also
foresee numerous ways in which your project could impact upon various individuals or groups.
SA would allow you to plan a research project that captures the needs of MFRC clients, service
providers, and other stakeholders, and to anticipate and mitigate potential conflicts that might
arise over the course of your project. Moreover, SA could be used to ensure the co-operation of,
and knowledge sharing between many of the ‘key players’ involved in or impacted by MFRC
service delivery. SA and the SIS provide researchers with the theoretical understanding and
practical techniques needed in order to get to know key stakeholders and to develop appropriate
communication and consultation frameworks required to effectively share knowledge among all
of them.

11 This fictitious example is used for the sole purpose of demonstrating SA techniques; there are presently no such
plans for MFRC modernisation.


2.1 Stakeholder Brainstorming

The first step in conducting a basic SA is to brainstorm in order to create an exhaustive
list of potential stakeholders. A list of stakeholders generated for the fictitious ‘MFRC
Modernisation’ -- outlined in the introduction -- might include the following:
• researchers working in the Directorate of Quality of Life
• military service providers working in MFRCs across Canada (such as social workers, and
support staff)
• military members
• the ‘loved ones’ or family members of military personnel
• commanding officers
• civilian service providers
• members of the civilian population who are responsible for some aspect of a military
child’s care: for example school teachers, daycare workers
• extended family members
• human resources policy analysts

Once this list is complete, stakeholders can be categorised in a basic stakeholder matrix –
a rudimentary tool used to differentiate between potential stakeholders who are positively or
negatively and directly or indirectly influenced. Imagine that one proposed action in the MFRC
modernization project is the closure of an MFRC. Figure 1 below, presents a sample basic
stakeholder matrix that has been filled out accordingly.


Proposed Action: The Closure of a Military Family Resource Centre (fictitious)

Directly-Affected Stakeholders
• Positively affected: Local daycare owner (will gain business)
• Negatively affected: MFRC staff – social workers, administrators, etc. (who will be out of work)

Indirectly-Affected Stakeholders
• Positively affected: DND Budgeting Officers (will have more money to spend on other items)
• Negatively affected: Commanding Officers (will have more requests for family leave); Local School Teachers (will see more behavioural problems in children of deployed CF members); Extended Family Members (will be called upon to take care of children)

Figure 1: Basic Stakeholder Matrix
(Adapted from Chevalier, 2001)
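
For analysts who want to keep the brainstormed list in machine-readable form, a minimal data structure mirroring the basic matrix might look like the sketch below; the field names are our own, and the stakeholder entries are the fictitious MFRC-closure examples from Figure 1.

from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Stakeholder:
    name: str
    directly_affected: bool    # directly vs. indirectly affected
    positively_affected: bool  # positively vs. negatively affected

def basic_matrix(stakeholders):
    """Group stakeholder names into the four cells of the basic matrix."""
    cells = defaultdict(list)
    for s in stakeholders:
        row = "directly" if s.directly_affected else "indirectly"
        col = "positively" if s.positively_affected else "negatively"
        cells[(row, col)].append(s.name)
    return dict(cells)

closure = [
    Stakeholder("Local daycare owner", True, True),
    Stakeholder("MFRC staff", True, False),
    Stakeholder("DND Budgeting Officers", False, True),
    Stakeholder("Commanding Officers", False, False),
]
# basic_matrix(closure) reproduces the four cells of Figure 1.
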


2.2 Stakeholder Influence and Importance

The next step in a basic SA is to determine whether each stakeholder, or stakeholder
group, ranks high or low on a scale of importance and influence, according to the following
definitions: Important stakeholders are critical to project success, in other words, the project is
meant to serve their interests in some way. For example, clients or military families or patients
are important stakeholders in a health care reform project. Influential stakeholders, on the other
hand, have the means to exert their will and influence project outcomes. MFRC administrators,
budget allocation officers, or commanding officers are all examples of influential stakeholders in
a MFRC modernisation. The influence and importance of diverse stakeholder groups in the
MFRC modernization project, can be mapped out as shown in Figure 2.

Proposed Action: Modernising the MFRC (fictitious)

• High Importance / Low Influence: CF Members and their Loved Ones
• High Importance / High Influence: MFRC Service Providers
• Low Importance / Low Influence: Civilian Social Workers
• Low Importance / High Influence: MFRC Administrators; DND Budget Allocators

Figure 2: Stakeholder Importance and Influence Matrix
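
As with the basic matrix, the importance/influence judgements can be recorded and mapped to quadrants programmatically. The sketch below assumes each stakeholder carries two analyst-supplied high/low ratings; the values shown mirror the fictitious example in Figure 2.

def quadrant(importance_high: bool, influence_high: bool) -> str:
    imp = "high importance" if importance_high else "low importance"
    inf = "high influence" if influence_high else "low influence"
    return imp + " / " + inf

ratings = {
    "CF Members and their Loved Ones": (True, False),
    "MFRC Service Providers": (True, True),
    "Civilian Social Workers": (False, False),
    "MFRC Administrators": (False, True),
    "DND Budget Allocators": (False, True),
}
for name, (imp, inf) in ratings.items():
    print(name + ": " + quadrant(imp, inf))
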

2.3 Stakeholder Salience

SA also allows researchers to map out power differentials and to determine saliency
among stakeholder groups – i.e. the relevance of stakeholder goals and objectives to those of the
research project. According to Chevalier (2001), contemporary understandings of stakeholder
saliency encompass three elements: power, interest (or urgency), and legitimacy. While
‘legitimacy’ in its different forms is an important variable, two other factors must not be ignored
when determining the relevance of stakeholder claims to project objectives. Power, defined as
“the ability to influence the actions of other stakeholders and to bring out the desired
outcomes”, can be actualized “through the use of coercive-physical, material-financial and
normative-symbolic resources at one’s disposal”. The other factor is that of interest, which
relates in part to the ability of stakeholders to impress the critical and pressing character of their
claims or interests. Chevalier remarks that these three attributes are transient and have a
cumulative effect on salience:

“[They] are highly variable; they are socially constructed; and they can be possessed with
or without consciousness and willful exercise. They can also intersect or be combined in
multiple ways, such that stakeholder salience will be positively related to the cumulative
number of attributes effectively possessed” (Chevalier, 2001).

To further assess the various stakeholders’ places in the research process, they can be
categorised into the following groups (a compact encoding of this typology is sketched after the list):
• dormant stakeholders only have power (in the MFRC Modernisation, one example could
be DND Budget Allocators);
• discretionary stakeholders only have legitimacy (in the MFRC Modernisation, one
example could be the Minister of National Defence);
• demanding stakeholders only have strong interest (in the MFRC Modernisation, one
example of a ‘demanding stakeholder’ could be CF Loved ones who do not fit into
traditional definitions of the ‘military family’);
• dependant stakeholders have legitimacy and interest (in the MFRC Modernisation, one
example of a ‘dependant stakeholder’ could be CF ‘military families’);
• dominant stakeholders have power and legitimacy (in the MFRC Modernisation, one
example of a ‘dominant stakeholder’ could be Commanding Officers);
• dangerous stakeholders have interest and power (in the MFRC Modernisation, one
example of a ‘dangerous stakeholder’ could be civilian social workers); and
• definitive stakeholders have legitimacy, power and urgency (in the MFRC Modernisation,
one example of a ‘definitive stakeholder’ could be DQOL researchers).
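
A compact way to encode the typology is to map the set of attributes a stakeholder effectively possesses to its category, as in the sketch below; the attribute labels and the fallback for an empty set are our own naming choices, not Chevalier's.

SALIENCE_CATEGORIES = {
    frozenset({"power"}): "dormant",
    frozenset({"legitimacy"}): "discretionary",
    frozenset({"interest"}): "demanding",
    frozenset({"legitimacy", "interest"}): "dependant",
    frozenset({"power", "legitimacy"}): "dominant",
    frozenset({"power", "interest"}): "dangerous",
    frozenset({"power", "legitimacy", "interest"}): "definitive",
}

def salience(attributes) -> str:
    """Return the salience category for the set of possessed attributes."""
    return SALIENCE_CATEGORIES.get(frozenset(attributes), "non-stakeholder")

# e.g. salience({"power", "interest"}) -> "dangerous"
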


Figure 3 graphically represents stakeholder saliency, the relevance of stakeholders’ claims to
overall project success, in the MFRC Modernisation project.

Figure 3: Representing Stakeholder Saliency (Adapted from Chevalier, 2001).
Proposed Action: Modernising the MFRC (fictitious).

In sum, SA is done to identify all groups or individuals affected by, or involved in, a
research process or knowledge-sharing activity. It encourages project managers to draw out and
identify the interests of stakeholder groups in relation to the project objectives, to assess
stakeholder saliency, to identify potential conflicts of interest between stakeholder groups, to
build upon existing relationships and form new networks between stakeholders, and to assess the
appropriate type of participation by different stakeholders at successive stages of the research
cycle (DFID, 1995: 3). Taking SA at its core, the SIS allows researchers to select the appropriate
participatory techniques to use, depending upon the project objective.

3.0 Overview of The SIS

3.1 What is the ‘Stakeholder Information System’?

The SIS is an International Development Research Centre of Canada (IDRC)-funded
initiative. It is important to note that the system is presently under development. First and
foremost, the SIS is a research process management system, allowing researchers to identify and
to complete a variety of research objectives, while informing or enlisting all relevant
stakeholders in the tasks at hand. The SIS contains a comprehensive listing of over seventy-five
participatory research techniques – surveys, focus groups, steering committees, value-structured
referendums, open house sessions, visioning sessions, and historical timelines – that are all
defined in-depth, and accompanied by suggestions for their use, links to Internet resources and
readings, and organised according to the research objective to which they pertain. The creators
of the SIS define the system as:

“[The SIS] … offers flexible techniques to analyze the social aspects of conflict, problem,
project or policy management activities. The methodology proposes ways of doing social
analysis with the active involvement of the parties concerned, i.e., actors, groups,
constituencies or institutions that can affect or be affected (adversely or positively) by a
given set of problems or interventions” (Chevalier, and de la Gorgendière, 2003)12.

12 The SIS will soon be made publicly available through a user-friendly CD-ROM and an
interactive web site, both of which are presently under construction and being field-tested with
development workers in Africa and in South America. The SIS can now be acquired through
workshops, and a prototype of the SIS is available online for free. This web-based version is
non-interactive, but provides researchers with comprehensive definitions of participatory
techniques in downloadable PDF files, and links to a wide array of Internet resources. Interested
parties can visit the web site to become familiar with a wide array of participatory techniques:
http://www.carleton.ca/~jchevali/STAKEH.html


The SIS is a ‘research process management’ system; as such, it is used to guide
researchers from the ‘problem definition’ stage of a project, to the dissemination of results. The
system allows project leaders to outline objectives, define their methodological approaches,
identify the stakeholders, and plan out their use of resources. Once this initial planning stage is
completed, researchers can proceed through various stages of the project, identifying objectives
and selecting from the appropriate research techniques offered. The system itself can serve as a
knowledge-sharing tool, and can provide a record of the various research or knowledge-sharing
activities accomplished throughout the research process. The creators of the SIS state that: “the
SIS techniques are divided into interlocking modules (Pointers, Problems, Players, Profiles,
Positions, Paths) designed to be ‘scaled up’ or ‘scaled down’ according to project needs”
(Chevalier, and de la Gorgendière, 2003). In this case, “scaling up” or “scaling down” the
techniques refers to adding simplicity or complexity, depending upon the stakeholders involved
and level of analysis required in any given research activity.

3.2 Using the SIS

The SIS is designed to lead researchers through a process of identifying project
objectives, outlining the resources available to complete various stages of the research process
and then to select from a variety of participatory techniques. Researchers and stakeholders can
keep track of progress made in various areas of the research project by referring back to the
system. Figure 4 below, gives the reader a sense of the organisation of the system in its current
state. It shows the display of nodes, modules and technique files that appear on the SIS website.


Figure 4: The SIS Website
(http://www.carleton.ca/~jchevali/STAKEH.html)


3.2.1 The SIS Online

As already mentioned, the online version of the SIS is only a prototype, and is not yet
interactive. However, researchers involved in participatory research or knowledge-sharing
activities can now visit the SIS website online to gain information on a wide range of
participatory techniques. For example, a number of alternatives to the traditional meeting are
described in depth, including: caucusing; citizens’ jury; planning cell; consensus conference;
deliberative polling; open space meeting; workshop; roundtable; public hearing; open house;
visioning session; working group; and multi-stakeholder forum. For each of the above listed
techniques or meeting strategies, as for every technique included on the SIS, users are provided
with a description of the technique; its strengths and weaknesses; some recommendations for its
use; and a list of readings and Internet links. The SIS also defines the various levels of
participation that might be required from different stakeholders who are involved in a research or
knowledge-sharing activity. These levels of participation are defined as follows (and are collected in the sketch after the list):
• persuasion: using techniques to change stakeholder attitudes without raising
expectations of involvement;
• education: distributing information to promote awareness of project activities and
related issues and goals;
• survey and information extraction: seeking information to assist in the pursuit of project
activities and goals;
• consultation: promoting information flow between an organization and stakeholders
invited to express their own views and positions;
• participation for material incentives: primary stakeholders provide resources such as
labour in exchange for material incentives;
• functional participation: active engagement of primary stakeholders to meet
predetermined objectives without concrete incentives and without involvement in the
decision-making process;
• joint planning or shared decision-making: active primary stakeholder representation in
the decision-making process at all stages of project cycle and with voting and decision-
making authority;
• delegated authority: transferring responsibilities normally associated with the
organization, to primary stakeholders; and
• self-mobilisation: primary stakeholders taking and pursuing their own project initiative
with limited external intervention.
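
One assumed encoding of these levels treats them as a roughly ordered scale of increasing stakeholder involvement, so that a project plan can record the level chosen for each stakeholder group; the sketch below is illustrative only.

from enum import IntEnum

class Participation(IntEnum):
    PERSUASION = 1
    EDUCATION = 2
    SURVEY_AND_EXTRACTION = 3
    CONSULTATION = 4
    MATERIAL_INCENTIVES = 5
    FUNCTIONAL = 6
    JOINT_PLANNING = 7
    DELEGATED_AUTHORITY = 8
    SELF_MOBILISATION = 9

plan = {
    "MFRC Service Providers": Participation.JOINT_PLANNING,
    "CF Members and their Loved Ones": Participation.CONSULTATION,
}
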


In sum, the SIS provides researchers and human resource analysts with detailed
descriptions of seventy-five techniques that can be used to realise a participatory research
project.

4.0 Discussion: Potential Applications of the SIS in Human Resource Analysis

SA and the SIS have a number of potential applications in the domain of human resource
analysis. Potential applications are discussed in the following paragraphs.

4.1 Strategic Human Resources

The SIS responds directly to two of the core strategic objectives of HR 2020. First, as
noted above, the system provides users with approximately seventy-five different techniques that
can be used to achieve one of the key objectives, to “ensure that a continuous, effective internal
communications network is established… [and] continually improve the effectiveness of our
internal communication strategy to ensure that all members are aware of HR issues” (DND,
2002: 20). Second, the SIS applies directly to the strategic objective of: “maintaining an
effective consultation process that shares expertise within the department, nationally and
internationally” (DND 2002: 21).

The SIS can be put to use in human resource analysis as a means to share information
between a variety of stakeholders and knowledge partners. The system directly addresses one of
the key difficulties faced by those engaged in this type of work: the fact that everyone cannot
always be present at every meeting, all of the time. Hence the transfer and sharing of
information is of critical importance. The SIS can be maintained and updated so that a number
of parties can remain informed of progress in any activity or area. Moreover, the SIS provides
numerous alternatives to traditional methods of information sharing. For example, for more
efficient and interesting use of meeting time, the SIS provides at least five alternatives to the
traditional meeting style. Strategic human resource analysis is largely an exercise in knowledge
sharing, since much of the material to be ‘analysed’ comes from a variety of disparate sources:
meeting notes, briefings, presentations, conferences, media reports, etc. A knowledge-sharing
system can therefore be a very useful tool for strategic human resource analysis.

4.2 Quality of Life Research

The Directorate of Quality of Life (DQOL) has a history of involving various stakeholder
groups in its research initiatives. Previous DQOL projects have enlisted the participation of
former CF members, ‘loved ones’ of CF members, CF service providers, and CF members in a
variety of operational theatres. SA can serve to improve researchers’ knowledge of the diversity
of stakeholders that are involved in, or impacted by, QOL activities. The SIS can further assist
researchers in finding appropriate sampling techniques to obtain representation of the diversity of
stakeholder groups. Furthermore, the system can be used to decide appropriate levels of
participation for each stakeholder group, from consultation to the creation of full partnerships.
Overall, use of the SIS would provide a more holistic conception of the numerous issues that
DQOL must deal with in its research program. For example, SA and the SIS could
be used to develop accommodation strategies that fit the diversity of CF family needs; to
improve civilian-military relations in theatres and in domestic settings; to address CF family
issues; and to address many other quality of life issues. Furthermore, the SIS offers project
managers the necessary techniques to address problem areas in research, such as minimising
the repetition of tasks and reducing ‘respondent burnout’.

Strategic human resource analysis and quality of life research are just two among the many
knowledge-sharing and research activities in which SA and the SIS can be put to use. The
possible applications of the SIS within DND are numerous.

5.0 Conclusion

The Department of National Defence, Canada, has a history of consulting with its
members and with key stakeholders in order to create policies and programs to address their
needs. SA and the SIS provide the methodological techniques required to create effective
consultation and communication strategies, and thus to enhance knowledge sharing throughout
the department. We frequently hear of ‘respondent fatigue’ and the need to be concerned with
research ethics. We also speak about the need for horizontal integration in our organisations and
try to conceive of strategies for doing away with a top-down or ‘stove-pipe’ way of doing
business. SA and the SIS allow us to achieve these ends. By using SA and the SIS and
approaching all research activities as the formation of partnerships, we learn how best to share
knowledge, thus minimising the repetition of tasks, reducing respondent fatigue and building
social and cultural capital throughout our organisations.


References

Chief of the Defence Staff (CDS). (2002). Annual Report 2001-2002.
http://cds.mil.ca/pubs/anrpt2002/intro_e.asp last accessed on March 11, 2003.

Chevalier, J. (2001). Stakeholder Analysis and Natural Resource Management. Carleton
University, Ottawa. www.carleton.ca/~jchevali/stakeh2.html last accessed on February
05, 2003.

Chevalier, J. and L. de la Gorgendière. (2003). The Stakeholder/Social Information System.
Carleton University, Ottawa. www.carleton.ca/~jchevali/stakeh.html last accessed on
February 05, 2003.

Christians, C. (2000). “Ethics and Politics in Qualitative Research.” In Norman Denzin and
Yvonne Lincoln, eds. The Handbook of Qualitative Research. Sage, Thousand Oaks.

Department for International Development (DFID). (1995). Guidance Note on How to do
Stakeholder Analysis of Aid Programs. DFID, London. www.dfid.gov.uk/ last accessed
on March 01, 2002.

Department of National Defence (DND). (2002). Military HR Strategy 2020: Facing the
People Challenges of the Future. Ottawa, Canada.

Ford, K. (2001). Using Participatory Action Research (PAR) in Quality of Life Measurement
Among CF Personnel and Their Loved Ones. DSHRC Research Note RN 08/01.
Department of National Defence, Ottawa, Canada. http://www.dnd.ca/qol/pdf/par_e.pdf
last accessed on Feb 05, 2003.

Flemming, S. (2000). CF PERSTEMPO and Human Dimensions of Deployments Project:
Research Plan Measurement Concepts and Indicators. Department of National Defence,
PMO QOL/DSHRC, Ottawa, Canada.

Fine, M., L. Weis, S. Weseen, and L. Wong (2000). “For Whom? Qualitative Research,
Representations and Social Responsibilities.” In Norman Denzin and Yvonne Lincoln,
eds. The Handbook of Qualitative Research. Sage, Thousand Oaks.


DESIGNING A NEW HR SYSTEM FOR NIMA


Brian J. O’Connell, Ph.D.

Principal Research Scientist


American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
boconnell@air.org

Jeffrey M. Beaubien, Ph.D.

Senior Research Scientist


American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
jbeaubien@air.org

Michael J. Keeney, Ph.D.

Research Scientist
American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
mkeeney@air.org

Thomas A. Stetz, Ph.D.

Industrial and Organizational Psychologist


Human Resources Department
U.S. National Imagery and Mapping Agency
4600 Sangamore Road - MS: D-18
Bethesda, MD 20816
stetzt@nima.mil

INTRODUCTION
The U.S. National Imagery and Mapping Agency (NIMA) was formed in 1996 by
consolidating employees from several Federal agencies. These include the Defense Mapping
Agency (DMA), the Central Imagery Office (CIO), the Defense Dissemination Program Office
(DDPO), and the National Photographic Interpretation Center (NPIC), as well as the imagery
exploitation and dissemination elements of the Defense Intelligence Agency (DIA), the National
Reconnaissance Office (NRO), the Defense Airborne Reconnaissance Office (DARO), and the
Central Intelligence Agency (CIA). NIMA’s mission is to provide geospatial intelligence and
related services to policy makers, military commanders, and civilian agencies in support of all
national security initiatives (for additional information, see www.nima.mil).
From the outset, NIMA’s management faced a complex, high-pressure situation that
involved a critical – but predictable – set of issues. These included the need to develop a shared
vision, the need to integrate work processes, and the need to meld multiple cultures. An
additional challenge was the significant societal pressure to continuously improve efficiency and
quality (GAO/Comptroller General of the United States, 1996). Because each legacy
organization had its own unique human resources (HR) system, the newly-formed NIMA
workforce was organized into approximately 600 unique position titles. To make effective
personnel decisions, NIMA would first have to describe its work requirements. Unfortunately,
traditional job descriptions – which list the primary duties and tasks to be performed within the
position – are costly to develop, often lack the required precision, and have been criticized as
static snapshots of dynamic jobs (Cascio, 1995). Therefore, NIMA’s management decided to
forgo traditional job descriptions in favor of dynamic work roles.
Work roles are distinct from job descriptions in that they define not only the job tasks but
also describe the competencies that are required to perform those tasks (Mulqueen, Stetz,
Beaubien, & O’Connell, 2003). Each work role includes a title, a brief description of the work, a
list of core job-related competencies, a list of relevant license and education requirements, and a
description of the typical physical and environmental demands. In essence, work roles define
different kinds of work – such as secretary or security guard – that require unique competency
sets. By extension, work roles also define sets of employees who are essentially interchangeable.
Any employee in a given work role should be able to perform the duties of any other employee
in that same work role with at least minimal proficiency within 90 days (Mulqueen et al., 2003).
Recognizing the need to strategically manage their human capital and to promote a single
organizational identity, NIMA management decided to create a new, integrated HR management
system that was based on the work roles concept. The new HR system was designed to be
strategically oriented, person-based, and broad-banded to leverage the flexibility that was
provided by NIMA’s exemption from many civilian personnel regulations under the DoD
Authorization Act of 1996. Further, basing the new HR system on skills rather than tasks would
provide the capability to support all HR initiatives, such as recruitment, selection, manpower
planning, compensation, promotion, training, and individual career path planning.
To achieve this goal, NIMA contracted with the American Institutes for Research (AIR)
to develop a competency-based HR management system. This system – which was based on the
O*Net Occupational Information Network (Peterson, Mumford, Borman, Jeanneret, &
Fleishman, 1999) – will eventually serve as the basis for all HR functions at NIMA. The O*Net
model evolved from a thorough review of previous work-related taxonomies, and represents
state-of-the-art thinking about the world of work. Unlike many other models, it conceptualizes
both the general and the specific aspects of the work domain in an integrated fashion and was
thus ideally suited to NIMA’s needs. For example, O*Net’s Basic and Cross-Functional Skill
(BCFS) taxonomy was used to capture broad competencies that cut across NIMA jobs, so that
these jobs could be compared and grouped to create a skills-based occupational structure.
Similarly, O*Net’s Generalized Work Activities (GWA) taxonomy was used to ensure
homogeneity of work activities within each occupation. Finally, O*Net’s Occupational Skills
taxonomy was used to characterize the specific requirements of particular NIMA occupations.
METHODOLOGY
NIMA’s new personnel system was developed through a four-step process. The first step
involved grouping legacy jobs into a parsimonious occupational structure and collecting data to
statistically examine the quality and comprehensiveness of these groupings. The second step
involved developing work roles within each occupation, again using empirical data to examine
quality and comprehensiveness. The third step involved populating the newly-developed skills
database with employee competency data. The fourth and final step involved periodically
reviewing and updating the work roles to account for new developments in the ever-changing
world of geospatial intelligence analysis.
Grouping Legacy Jobs into Occupations
Our initial goal was to create 20 to 30 broad occupations of employees who used similar
skills in performing similar activities. We first assembled a comprehensive list of the Agency’s
legacy position descriptions. A panel of Industrial and Organizational Psychologists then
grouped the position descriptions into a smaller, yet comprehensive list of composite titles. We
then created a description for each composite which summarized the aggregate duties and
principal differences among the positions within the composite. Subject Matter Experts (SMEs)
then reviewed the descriptions and eliminated or combined composites to reflect near-term
outsourcing or other changes that were expected to occur within NIMA. The result was a set of
approximately 125 composite descriptions that summarized all of the work performed at NIMA.
We used statistical data to guide our decisions regarding the most appropriate
occupational structure. To statistically group the composite positions into a smaller number of
relatively broad occupations, it was necessary to create profiles on a common set of descriptors.
We collected SME ratings using O*Net BCFSs and cluster analyzed the profile ratings.
Hierarchical agglomerative procedures were used to iteratively build increasingly larger clusters
until all of the composites had been clustered together into a single group (Everitt, 1993;
Kaufman & Rousseeuw, 1990). At the end, a dendrogram graphically displayed the order in
which the composites were combined and indicated the magnitude of the differences between the
composites being combined at each step. We reviewed each dendrogram, along with the matrix of
distances between composites. Based on this information, we identified approximately 30
clusters – or potential occupations – that appeared to be both statistically and practically viable.
We then presented the results to NIMA managers and other critical stakeholders to elicit their
reactions, concerns, and approval. The results were considered along with a host of practical and
political organizational factors to arrive at a final structure of about 25 broad occupations.
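To make the clustering step concrete, the sketch below shows how hierarchical agglomerative clustering of competency profiles might look in code. It is a minimal illustration only: the composite profiles, ratings, distance metric, linkage method, and cut height are all hypothetical, not NIMA's actual data or parameters.

```python
# Minimal sketch of the clustering step (all data hypothetical).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Rows = composite positions, columns = O*Net BCFS descriptor ratings.
profiles = np.array([
    [5.0, 3.0, 6.0, 2.0],   # composite A
    [5.0, 4.0, 6.0, 2.0],   # composite B (similar to A)
    [2.0, 6.0, 1.0, 5.0],   # composite C (dissimilar)
])

# Condensed matrix of pairwise distances between profiles.
distances = pdist(profiles, metric="euclidean")

# Hierarchical agglomerative clustering: iteratively merge the two
# closest clusters until one group remains; 'tree' encodes the
# dendrogram (merge order and merge distances).
tree = linkage(distances, method="average")

# Cutting the dendrogram at a chosen height yields candidate occupations.
occupations = fcluster(tree, t=3.0, criterion="distance")
print(occupations)  # e.g., array([1, 1, 2]): A and B group together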
Defining Work Roles within Occupations
We began to define work roles within each occupation by assembling panels of SMEs
from each occupation to offer guidance regarding the preliminary work roles. These panels
identified meaningful distinctions among the jobs within their occupation, and developed
preliminary titles and general descriptions for each work role. At the conclusion of the SME
panels, each occupation had a defined set of work roles, with a total of about 200 preliminary
work roles throughout the Agency. SMEs also identified a sample of up to 12 prototypical
employee representatives for each work role. These representatives were chosen because their
current work duties indicated that they were working in one of the approximately 200
preliminary work roles. Each work role representative completed a competency profile of the
skills, knowledge, and tools that he or she currently used to perform the work.
After collecting the employee competency profiles, we used the Jaccard similarity
coefficient to assess the degree to which pairs of work roles had the same competencies
identified as being essential for performing the work they describe. The Jaccard coefficient ranges
from 0.0 (indicating no overlap) to 1.0 (indicating complete overlap). The Jaccard coefficient is
based upon binary yes or no coding, which in this case indicated whether an employee used each
competency in their work role. This statistic has a significant advantage in reflecting only the
presence of the characteristics in either one or both work roles (Kaufman & Rousseeuw, 1990); it
does not reflect the mutual absence of a competency from both work roles in a pair. We used the
representatives’ competency data to create a profile for each work role. The profile listed each
knowledge, skill, and tool that was identified by representatives as currently used in the work
role, as well as the number of representatives that reported using each competency. We then
assessed the degree of overlap within each work role and across work roles within each
occupation (Mulqueen et al., 2003). Each occupation required three Jaccard analyses. The first
computed the degree of similarity among representatives within each work role. This produced a
measure of agreement among representatives regarding their individual work role requirements.
The second Jaccard analysis computed the degree of similarity between each individual work
role and the pool of all other work roles within the occupation. The third and final Jaccard
analysis computed the degree of competency similarity between each pair of work roles within
the occupation.
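As a concrete illustration of the similarity measure, the short sketch below computes a Jaccard coefficient from binary competency codings; the competency names are invented for the example.

```python
# Minimal sketch of the Jaccard coefficient on binary competency data
# (competency names hypothetical).
def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; mutual absences contribute nothing."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

role_a = {"imagery analysis", "report writing", "mentoring"}
role_b = {"imagery analysis", "report writing", "geodesy"}

print(jaccard(role_a, role_b))  # 2 shared of 4 distinct -> 0.5
```

Note that competencies absent from both roles never enter the calculation, which is exactly the property described above.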
Next, we organized a second set of SME panels to evaluate the work role competency
profiles and to create the final set of work roles. The SMEs reviewed the Jaccard similarity
matrices to determine whether there was unusual redundancy of competency requirements
among work roles. For these determinations, we found the pairwise similarity matrix to be more
helpful than the pooled matrix. A high degree of overlap among 2 or more work roles might
indicate that the roles were too similar to be considered separate, suggesting that they should be
combined into a single work role. We established a criterion of 40% or greater similarity for the
SMEs to discuss the affected work roles and their requirements. Factors other than competency
similarity – such as critical mission function and staffing requirements – were also used to
determine whether work roles should be combined or remain separate. The SMEs made their
final determinations based on their knowledge of the role requirements and any other reasons for
keeping similar work roles separate. During the SME panels, the number of work roles was
reduced from around 200 to under 170.
In addition to the Jaccard similarity indices, each work role profile contained the list of
associated competencies, organized according to the number of representatives who indicated
that they currently use each. If a majority of representatives used a competency, it was identified
as a core competency. The SMEs reached consensus on final lists of competencies for each work
role by noting which competencies were core competencies and if any competencies were
redundant or missing. Once again, this was a matter of expert judgment that was guided by
empirical information from the work role analyses. As a final step in work role refinement, the
SMEs reviewed each work role description. At this time, they indicated whether any specific
education or licensure requirements were necessary for the work role, and if there were any
special environmental or physical requirements for performing the work.
Populating the Skills Database with Employee Competency Data
Next, NIMA employees assessed their proficiency levels on the work role competencies
(i.e., knowledges, skills, and tools) that were identified during the previous phase, and entered
this information into a database using the Skill Inventory Library (SKIL). SKIL is a Microsoft
Access database that was equipped with a special user interface for gathering proficiency ratings.
The software – which was designed to require minimal expertise with computers – walked
employees through the process of entering proficiency and current use information about each
skill, tool, and knowledge in the database with which that employee had expertise.
The cross-occupational skills were addressed first; each employee entered proficiency
data for whichever ones he or she considered applicable. Next, the employee provided
proficiency data for the skills, tools, and knowledges of his or her current occupation, along with
those from related occupations, as appropriate. At any stage in the process, employees were
allowed to view the proficiencies that were already entered, and change their proficiency ratings
as necessary. The SKIL software was available in a centrally located computer lab, and at
computer terminals within the employees’ office spaces.
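SKIL itself was a Microsoft Access application; the kind of record it captured can be sketched with an illustrative schema, shown below. The table and field names are hypothetical, since the actual schema is not described in this paper.

```python
# Hypothetical sketch of a SKIL-style proficiency record; the actual
# Access schema is not reproduced here.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE proficiency (
        employee_id    TEXT,
        competency     TEXT,     -- a skill, tool, or knowledge
        proficiency    INTEGER,  -- self-rated proficiency level
        currently_used INTEGER   -- 1 if used in the current work role
    )
""")
con.execute("INSERT INTO proficiency VALUES ('E0001', 'Imagery Analysis', 4, 1)")

# Employees could review and revise their entries at any stage.
con.execute("UPDATE proficiency SET proficiency = 5 WHERE employee_id = 'E0001'")
for row in con.execute("SELECT * FROM proficiency"):
    print(row)
```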
Periodic Review and Update
NIMA’s initial set of work roles became operational during 1998. However, work roles
require periodic review to keep them current. This periodic review, which began during Fall
2002, had six primary goals. The first goal was to add, delete, and merge work roles as needed.
The second goal was to replace obsolete competencies. The third goal was to ensure Agency-
wide consistency in how work is described. The fourth goal was to reduce redundancy at both
the work role and competency level. The fifth goal was to systematically collect importance
ratings for each competency. The sixth and final goal was to update all the work role and
competency changes into NIMA’s new PeopleSoft database.
Panels of SMEs first reviewed each work role. The panels analyzed the existing work
role descriptions in their occupation to determine whether they adequately described the current work
performed. For each work role, they reviewed, revised, added, or replaced any obsolete or
missing competencies and educational, physical, and environmental requirements. HR
representatives reviewed all proposed changes to verify that they conformed to relevant legal
requirements. Finally, the panels identified a “short list” of no more than 60 competencies –
including no more than 20 occupation-specific skills, 10 cross-occupational skills, 20
knowledges, and 10 tools – for each work role. This short list was needed because the SKIL
database had become populated with redundant skills (such as “Human Resources Mentoring,”
“Imagery Analysis Mentoring,” and “Geospatial Analysis Mentoring”).
Each work role’s short list of competencies was converted into a survey and deployed to
a sample of up to 20 of the employees in the work role plus the Professional Advisory Board
(PAB) associated with each work role. When a work role encompassed fewer than 20 employees,
we surveyed all of the employees. Whenever the SME panels created a new work role or merged
existing work roles, they were required to identify a suitable sample of employees to complete
the survey. Each survey gathered respondent background information and ratings of the core
competencies. For each competency, a dichotomous (yes/no) format item asked whether the
competency is used. When the respondent indicated “yes,” the survey asked three additional
questions: the competency’s importance to performing the work, the extent to which it is needed
upon hire, and the amount of time that the respondent spends using it. The importance, hire, and
time items used 5-point Likert scales, with anchors ranging from “Strongly Disagree” (1) to
“Strongly Agree” (5). Because the survey was electronically deployed, validation rules
prevented out-of-range values.
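A minimal sketch of that item flow follows, assuming hypothetical field names (the actual survey instrument is not reproduced here): a yes/no usage gate, then three 1-5 Likert ratings with range validation.

```python
# Minimal sketch of the survey item logic (names hypothetical).
from typing import Optional

def collect_item(uses_competency: bool, importance: int,
                 needed_at_hire: int, time_spent: int) -> Optional[dict]:
    if not uses_competency:
        return None  # the three follow-up ratings are skipped
    for value in (importance, needed_at_hire, time_spent):
        if not 1 <= value <= 5:  # the electronic form's validation rule
            raise ValueError("Likert responses must be integers from 1 to 5.")
    return {"importance": importance, "hire": needed_at_hire, "time": time_spent}

print(collect_item(True, 5, 4, 3))   # {'importance': 5, 'hire': 4, 'time': 3}
print(collect_item(False, 0, 0, 0))  # None: competency not used
```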
Next, we weighted the response data in two steps. The first weighting was made at the
work role level. The purpose of this first weighting was to adjust the PAB ratings to contribute
50 percent of the final weighted response. For each work role, we identified the overall
importance for each competency by calculating the weighted mean and standard deviation of
ratings for need at hire, frequency, and importance across all the incumbent raters for that work
role and its PAB (with the PAB ratings weighted as previously discussed). After calculating the
weighted overall importance ratings, we sorted the competencies within competency type
(knowledge, skill, cross-occupational skill, and tool) in decreasing order of importance (mean)
and increasing consensus about importance (standard deviation). The second weighting, which
was made at the occupation level, was designed to provide greater weight to competencies that
are identified in a large number of roles. Therefore, we weighted each competency by the
number of work roles in which it appeared. For example, if a competency appeared in 5 work
roles, it was weighted by 5; if a competency appeared in only 1 work role, it was not weighted.
This weighting procedure was done to ensure that the most “critical” competencies were in fact
used by a large percentage of the NIMA workforce.
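The arithmetic of the two weighting steps can be illustrated as follows; the rating values, rater counts, and role counts are invented for the example.

```python
# Illustrative sketch of the two-step weighting (all numbers hypothetical).
import numpy as np

incumbent = np.array([4, 5, 3, 4])  # incumbent ratings for one competency
pab = np.array([5, 5])              # Professional Advisory Board ratings

# Step 1 (work role level): weight each PAB rating so the PAB as a whole
# carries 50 percent of the total weight in the weighted mean.
pab_weight = len(incumbent) / len(pab)
weights = np.concatenate([np.ones(len(incumbent)),
                          np.full(len(pab), pab_weight)])
weighted_mean = np.average(np.concatenate([incumbent, pab]), weights=weights)

# Step 2 (occupation level): weight the competency by the number of work
# roles in which it appears, favoring widely used competencies.
n_roles = 5
occupation_weighted = weighted_mean * n_roles
print(weighted_mean, occupation_weighted)  # 4.5 22.5
```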
We then used Cronbach’s alpha to calculate the overall degree of inter-rater reliability
(or consensus) within each work role. Cronbach’s alpha represents the degree to which raters
provided consistent evaluations of the importance of each competency. Cronbach’s alpha is
based on both the number of raters and the similarity of their ratings. Consequently, low values
may be due to small sample size alone, low consensus alone, or both. Values for Cronbach’s alpha range from 0.0
(indicating no consistency) to 1.0 (indicating perfect agreement). Values under 0.50 may suggest
that the raters disagreed about the importance of the competencies they rated, and possibly that
the work role may require additional review. However, low values can also result when the work
role had only a small number of reviewers. Conversely, high values indicate agreement among
the reviewers about the importance of the competencies they rated, suggesting that the work role
more accurately describes the work.
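A small sketch of the alpha computation under this rater-as-item framing is given below, using invented ratings; it is an illustration of the statistic, not the project's actual analysis code.

```python
# Illustrative Cronbach's alpha for inter-rater consensus: raters are
# treated as "items", competencies as "cases" (data hypothetical).
import numpy as np

# Rows = competencies, columns = raters; importance ratings on a 1-5 scale.
ratings = np.array([
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
])

k = ratings.shape[1]                         # number of raters
item_vars = ratings.var(axis=0, ddof=1)      # variance of each rater's ratings
total_var = ratings.sum(axis=1).var(ddof=1)  # variance of summed ratings

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(alpha)  # ~0.96 here; values below ~0.50 may warrant further review
```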
Finally, we used the Jaccard similarity coefficient to indicate the degree to which pairs of
work roles had the same competencies identified as being essential for performing the work they
describe. Specifically, we considered values for the Jaccard coefficient greater than 0.80 as
suggesting that further review should be considered. As before, statistical results were combined
with expert judgments to determine whether or not individual pairs of work roles should be
combined. Before the results data could be integrated into PeopleSoft, they needed to be
reviewed and signed off by NIMA management and other key stakeholders. We
prepared four main types of documentation for the Agency-Level review. These included tables
that displayed the return rate for each work role survey; changes (if any) in the number of work
roles before and after the periodic review; occupation-specific tables that displayed the inter-rater
reliabilities and pairwise similarities for all work roles within that occupation; and tables
that showed the mean ratings for each competency by work role and occupation.
CONCLUSION
As a result of this occupational development effort, NIMA now has: 1) an empirically-
derived structure of broad occupations, each consisting of work roles that apply similar skills
toward the performance of similar work activities; 2) a comprehensive database of the
competencies that are used to perform the full range of work at the Agency; and 3) a database of
employee self-rated proficiencies for over 90% of the NIMA workforce. Competencies have
now become a major part of NIMA’s career development and promotion processes, and have
influenced the development of Agency training programs. Moreover, during recent international
crises, the competency data have been used to “search for the expert” to perform specific, quick-
turnaround geospatial intelligence analysis missions. The possible uses of the data are numerous,
and the work roles will continue to form the basis for other HR initiatives, such as recruitment,
selection, and manpower planning.
REFERENCES
Cascio, W. F. (1995). Whither industrial and organizational psychology in a changing world of
work? American Psychologist, 50, 928-939.
Everitt, B. (1993). Cluster analysis (3rd ed.). New York: Halsted.
General Accounting Office (GAO)/Comptroller General of the United States (1996). Effectively
implementing the Government Performance and Results Act (GAO/GGD-96-118).
Washington, DC: Author.
Kaufman, L. & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster
analysis. New York: Wiley-Interscience.
Mulqueen, C. M., Stetz, T. A., Beaubien, J. M., & O’Connell, B. J. (2003, April). Developing
dynamic work roles using Jaccard similarity indices of employee competency data. Paper
presented at the 18th Annual Conference of the Society for Industrial and Organizational
Psychology, Orlando, FL.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., & Fleishman, E. A. (1999).
An occupational information system for the 21st century: The development of O*Net.
Washington, DC: American Psychological Association.


Personnel Security Investigations: Improving the Quality of Subject and Workplace Interviews
Jeremy F. Peck
Northrop Grumman Mission Systems /
Defense Personnel Security Research Center
Introduction
Keeping our nation’s security secrets is a great challenge. Each year thousands of
individuals are evaluated for initial or continuing access to sensitive or classified
information. For Top Secret access eligibility, the principal method for initially screening
and periodically reevaluating individuals is the Single Scope Background Investigation
(SSBI) and SSBI Periodic Reinvestigation (SSBI-PR). These personnel security
investigations (PSIs) gather information about an individual’s background using a variety
of sources. These sources include interviews (e.g., subject, supervisor, coworker,
reference, neighbor, ex-spouse, medical), self-report background questionnaires, record
checks (e.g., national agency checks, local agency checks, public records checks, credit
checks), and in some agencies, polygraphs. Of these sources, subject interviews are
among the most productive in terms of providing information relevant to the clearance
adjudication process. Workplace interviews, on the other hand, are less productive than
subject interviews in gathering information of security relevance. Possible reasons for
this are that coworkers and supervisors are not aware of the information, are concerned
about defying social norms against disclosing negative information about their coworkers or
subordinates, or fear legal recourse or retaliation. Both interviews are among
the most expensive components of the PSI process.
Determining what makes one interview method superior to another is vital for
meaningful monitoring of investigative effectiveness as well as for establishing system
improvements. The underlying challenge, therefore, is defining, measuring, and
improving interviewing quality and effectiveness.
Purpose and Overview
Empirical evidence suggests that structured interviews provide more valid
information than unstructured interviews. This study was conducted to examine ways to
improve interviewing practices of the subject and workplace references using a set of
structured questions enhanced to cover each area of security concern outlined in the
Adjudicative Guidelines.1
The objectives of, and the specific methods used by, investigators in conducting
interviews are integral to the overall quality of personnel security investigations. Although
the subject interview does a good job of obtaining issue-relevant information,2 there are
important procedural and methodological characteristics that require attention in
conducting both subject and workplace interviews.
1 Guidelines used by defense and intelligence community adjudicative agencies in making
determinations about whether to grant a security clearance.


Observations from a review of 1,200 SSBIs conducted by the Defense Security Service
(DSS) provide anecdotal evidence suggesting that the problematic procedural
components include:
• A lack of coordination between the interviews and other investigative components
(e.g., issue-relevant information that is developed through other sources may not be
followed up with the subject)
• Investigators tend not to acquire the names of additional references from the
subject or workplace sources during the interview; such names, if acquired, could be
passed on to other investigators conducting reference interviews
• Investigators with less experience may not know why they are asking certain
questions
Interview Objectives
The DSS investigative manual asserts that the subject interview has two basic
objectives:
1. To ensure that all information listed by the subject on his or her security forms
is accurate and complete.
2. To resolve issue or discrepant information by offering the subject an
opportunity to explain, refute, amplify, or mitigate information developed by
an investigation or known at the onset of the investigation.
The first objective consists of reviewing and validating the responses the subject
provides to each item on the security form (SF 86, SF 85P, SF 85PS, EPSQ, etc.). While
it is important to ensure the information on the forms is accurate and complete, questions
remain as to the costs and the benefits of the form validation portion of the interview.
One question is whether the time spent on validating forms could be better spent
uncovering issue-relevant information related to substantive security concerns.
Investigators who conduct the SSBI and the SSBI-PR are tasked with gathering as
much information about the subject as possible under time and resource constraints.
Validating the subject’s security form can be done relatively quickly, whereas
conducting a more thorough interview takes longer. Such time pressure
discourages investigators and review personnel (supervisors, case analysts, and
adjudicators) from being thorough. The emphasis placed on high production (e.g.,
conducting as many investigations as possible under severe time constraints) creates the
risk of compromising quality. Such quality concerns have become particularly important
because of the current trend towards outsourcing PSIs and efforts in making clearance
eligibility determinations reciprocal across federal agencies. Private contractors
increasingly conduct personnel security investigations for the government, yet there is no
standard method of conducting interviews or of assessing their quality.
2 Information relevant to establishing that an issue is of potential current security concern
and/or information that an adjudicator would want to review in making a clearance
decision.


Anecdotal evidence suggests more could be done to improve the quality of
interviews with regard to the amount of information adjudicators require to make well-
informed clearance determinations. Existing evidence suggests that much of the pertinent
information that is either volunteered by the subject or is provided in response to specific
questions is not followed-up with probing questions. Without the appropriate level of
follow-up questions, the Report of Investigations (ROI) completed by investigators and
forwarded to adjudicators provides less information on which adjudicators can base their
clearance decisions. Interviews conducted in a standardized manner that involve asking
specific follow-up questions of security relevance as well as questions that mitigate3
security concerns can go a long way in assisting adjudicators in making clearance
decisions by providing them with additional and necessary information. Without this
information, adjudicators often require investigators to conduct a second interview to
obtain the missing information before making a clearance determination. However,
because of time pressures to close cases (make a clearance determination), adjudicators
might be reluctant to send investigative cases back into the field for follow-up interviews.
Method
The development of the enhanced interview questions consisted of the following
four steps.
1. Reviewing the available research literature on the interviewing methods used
across industries. Key points of this research include: longer questions tend to
produce longer responses and more reporting of undesirable behaviors; probe
questions that obtain additional information are effective in motivating
respondents to reveal personal information; and structured interviews are more
valid than unstructured interviews.
2. Reviewing the investigative manuals of several federal agencies that are based
on Executive Order (EO) 12968, Access to Classified Information (1995)
which sets the standard for eligibility to classified information. Each agency
applies EO 12968 standards and guidelines to their respective investigative
manuals differently based on the particular mission of the agency. This review
was conducted in order to determine: a) what, if any, recommendations exist
on what questions should be asked when interviewing the subject and
workplace sources and b) to provide the basis for making the enhanced
interview questions applicable to each agency.
3 Information that explains, refutes, moderates, or lessens issue-relevant information.


3. Generating the specific questions. For this, the Adjudicative Guidelines were
relied upon to ensure the questions coincided with these guidelines. These
guidelines include the conditions that could raise a security concern which may
be disqualifying factors in obtaining a clearance. They also include the
conditions that could mitigate such security concerns.
4. Obtaining input from individuals with expertise in conducting background
investigations for several federal agencies. These experts reviewed drafts of the
enhanced interview questions and provided specific feedback which was
compiled and integrated into the final enhanced interview questions document.
Once a standard set of questions was developed, a pilot test was initiated to test
the value of the enhanced questioning protocol. The test sample consisted of
approximately 150 SSBI-PR subject and workplace interviews. Each interview was
conducted using the enhanced interview protocol. In addition, approximately 150
baseline interviews were conducted using the traditional line of questioning. The
structured format of the enhanced interview questions provides investigators with specific
question ordering and wording, guidance on when to use closed versus open-ended
questions, and specific follow-up questions to ask based on the subject’s responses to the
initial question.
Therefore, the structured format provides greater coverage of information that may be of
potential security relevance. Limiting the “holes” in coverage will prevent the adjudicator
from having to fill in those holes by either having to speculate as to what information
should have been obtained or by having to request a follow-up interview to obtain
missing information.
One federal government agency was chosen to participate. Investigators from this
agency were trained on how to use the protocol. An Interview Preparation Guide was
developed as well and given to investigators to use prior to and in conjunction with
conducting the interviews. During the training, emphasis was placed on having
investigators ask the appropriate follow-up questions listed on the document. These
follow-up questions are an integral part of the enhancements made to the line of
questioning currently used. For example, on a question related to a subject’s financial
situation the investigator currently might ask questions associated only with validating
what has been provided on the subject’s security form. An enhanced question on the
same topic asks: “Do you have any difficulties paying your bills or concerns about your
current level of debt?” If the subject answers “yes,” the investigator is instructed to get
details and the circumstances surrounding the issue.
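One hypothetical way to represent such a branching, structured item in code is sketched below; the wording and follow-up probes are illustrative, not the actual protocol text.

```python
# Hypothetical sketch of an enhanced, structured question with
# conditional follow-ups (wording illustrative).
from dataclasses import dataclass, field

@dataclass
class StructuredQuestion:
    text: str
    open_ended: bool = False
    followups_if_yes: list = field(default_factory=list)

financial_item = StructuredQuestion(
    text=("Do you have any difficulties paying your bills or concerns "
          "about your current level of debt?"),
    followups_if_yes=[
        StructuredQuestion("Please describe the circumstances.", open_ended=True),
        StructuredQuestion("What steps have you taken to address this?", open_ended=True),
    ],
)

def conduct(question: StructuredQuestion, answer_is_yes: bool):
    print(question.text)
    if answer_is_yes:  # branch into the scripted probes
        for follow in question.followups_if_yes:
            print("  Follow-up:", follow.text)

conduct(financial_item, answer_is_yes=True)
```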
Results
Rating forms were provided to investigators of the federal agency used in the pilot
test to rate the quality of the enhanced interviews compared to the traditional line of
questioning used. Investigators were instructed to indicate on the form the extent to
which they agreed or disagreed with several statements. An example of an item on the
investigators’ rating form is “The enhanced interview provided more complete coverage
of disqualifying factors for this case.” Different rating forms were provided to
adjudicators at this agency to rate the extent to which the Report of Investigation for each
of the enhanced interviews contained unresolved issues or inadequate information for
each of the required elements of the Investigative Standards (required by EO 12968).
Preliminary data analysis suggests that while the enhanced subject interviews on
average take longer to complete, the structured format provides more coverage,
whether or not that coverage surfaces information of security concern. For this study, the
the rating forms are expected to provide the bulk of the data needed for a thorough
analysis as to the overall effectiveness of the enhanced interview questions. However, at
the time of this writing, the sample cases have yet to be adjudicated. Once the rating
forms have been completed, comparisons will be made between the adjudicator rating
forms for the baseline sample and the adjudicator rating forms for the enhanced interview
sample. The rating forms completed by the investigators will also be analyzed to
determine how well the enhanced interview protocol performed in comparison to the
traditional approach to conducting interviews.
Conclusion
There is an inherent tension in the PSI program between validating the
information provided on the subject’s security form (a non-confrontational approach)
and probing deeper into areas of the subject’s life that may be of security concern (an
investigative approach). This study intends to reconcile some of that tension, as well as
the tension between conducting effective, thorough interviews and completing and
adjudicating them in a timely manner.
A number of important findings are expected to emerge from this study. In
determining what makes one interview method superior to another, the rating data will
provide meaningful information on improving interview and overall investigative
effectiveness. Because the relative productivity of the subject and workplace interviews is
similar across federal agencies, the findings from this study could have implications for
improving interviewing techniques across all the agencies that conduct security
background investigations.
References
Bosshardt, M.J., & Lang, E.L. (2002, November). Improving the subject and employment
interview processes: A review of research and practice. Minneapolis, MN: Personnel
Decisions Research Institutes, Inc.
Carney, R.M. (1996, March). SSBI source yield: An examination of sources contacted
during the SSBI. (PERS-TR-96-001). Monterey, CA: Defense Personnel Security
Research Center.
Defense Security Service. (2001, September 10). DSS Investigations Manual.
Director of Central Intelligence. (1998). Adjudicative Guidelines for Determining
Eligibility for Access to Classified Information (DCID 6/4, Annex C, Jul. 2, 1998).
Washington, D.C.: Author.
Executive Order 12968, “Access to Classified Information,” August 2, 1995.
Kramer, L.A., Crawford, K.S., Heuer, R.J., Jr., & Hagen, R.R. (2001, August). SSBI-PR
source yield: An examination of sources contacted during the SSBI-PR (PERS-TR-01-6).
Monterey, CA: Defense Personnel Security Research Center.
Defense Personnel Security Research Center. (2003). Unpublished analyses. Monterey,
CA: Author.
Privacy Act, 5 U.S.C. 552a (1974).


Strategies for Increased Reporting of Security-Relevant Behavior*


Kent S. Crawford
Defense Personnel Security Research Center
Suzanne Wood
Consultant to Northrop Grumman Mission Systems
Introduction
United States federal policies and those of the Department of Defense (DoD) are
designed to ensure that the cleared workforce is reliable, trustworthy, and loyal. One of
the requirements of such policies is that supervisors and coworkers who work in
classified environments report to security managers any behavior they observe among
workplace colleagues that may be of security concern. In essence, supervisors and
coworkers, working closely with each other as they do, can observe or become aware of
behaviors that might suggest a risk to national security.
These policy requirements are in place as one means to prevent Americans from
committing espionage. However, espionage is a rare event. One is not likely ever to
encounter a spy, much less observe an act of espionage. So while catching spies may be
an ultimate goal for these policies and the reason to report counterintelligence- (CI) and
security-related behavior, the policies are also designed to regularly evaluate the millions
of ordinary people in the workplace who have access to classified information, people
who have no intention of committing espionage but are likely—over a period of time or
in changing contexts—to develop personal problems that may call into question their
reliability and trustworthiness. The philosophy behind current policies is that the
government is not just trying to root out spies: by asking employees to report, the
government is not only identifying potential security risks but actually helping employees
take care of the kinds of human problems that plague us all from time to time. However,
reporting policy—based on the adjudicative guidelines—has mixed together the kinds of
behavior that should always be reported. CI- and security-related behaviors are mixed
with reliability and suitability problems. This has led to confusion among supervisors and
coworkers about what behaviors are the most important to report. It is this confusion that
often paralyses employees: if they are not sure exactly what to report, they simply report
nothing.
Evidence gathered during a recent PERSEREC study (Wood & Marshall-Mies,
2003) shows that the reporting rate is in fact very low. Supervisors and coworkers are
reluctant to inform security managers about many behaviors that they observe in the
workplace because they believe them to be too personal to report. It is ironic that the very
behaviors that the government wants people to report—in order to be able to help them—
are the very ones that supervisors and coworkers are loath to share with authorities.
Employees are, however, more willing to report behaviors that are egregious and appear
to have a more direct connection with national security.
_______________________________________________________________________
*The views expressed in this paper are those of the authors and do not necessarily reflect
those of the United States Department of Defense.


The present paper, based on the 2003 PERSEREC report mentioned above,
examines current reporting policies, discusses research on reporting, and describes
supervisor and employee confusion about what to report. It recommends ways to reduce
the disconnect between the requirements to report and the actual reporting of
security-relevant behavior in DoD. The aim of this paper is to recommend changes in
organizational practice that might lead to the establishment of conditions under which
employees would be more likely to report egregious security-relevant behaviors.
Methodology
The research methodology consisted of four steps: (1) reviewing policies related
to supervisor and coworker reporting; (2) conducting literature reviews of commission
studies and other research to learn about the willingness of people in general to report on
colleagues; (3) interviewing military service, DoD, and non-DoD security and other
management personnel to determine the frequency of reporting and to gather
recommendations for improving reporting policy and its implementation; and (4)
conducting focus groups with supervisors and coworkers in the field to discuss their
reporting responsibilities, willingness to report, and recommendations.
Results
Review of Policies
The key policy documents that concern the reporting of CI- and security-related
behaviors were compared and contrasted, exploring areas of overlap, degree of
specificity, and whether one policy superseded another.
Executive Order. In August 1995, Executive Order 12968, Access to Classified
Information, addressed the subject of employee reporting responsibilities. The order
states, inter alia, that employees are expected to “report any information that raises
doubts as to whether another employee’s continued eligibility for access to classified
information is clearly consistent with the national security.” The order also expands
recommended prevention, treatment, and rehabilitation programs beyond drug and
alcohol abuse and emphasizes retaining personnel while they deal with a wide range of
problems through counseling or the development of appropriate life skills.
Directive for Sensitive Compartmented Information (SCI). Director of Central
Intelligence Directive (DCID) 6/4, Personnel Security Standards and Procedures
Governing Eligibility for Access to Sensitive Compartmented Information (July 2, 1998),
covering individuals with SCI access, requires that security awareness programs be
established for supervisors that provide practical guidance on indicators that may signal
matters of security concern. DCID 6/4 discusses individuals’ responsibilities for reporting
activities by anyone, including their coworkers, that could conflict with those individuals’
ability to protect highly classified information.
DoD Directives and Regulations. DoD Directive 5200.2-R, Personnel Security
Program (January, 1987, amended 1996 and soon to be completely revised), implements
the personnel security requirements of various executive orders. The directive outlines
personnel security policies and procedures, including categories of behavior to be
reported and provisions for helping troubled employees. The categories of behavior
which serve as adjudicative guidelines and are to be reported are allegiance to the U.S.;
foreign influence; foreign preference; sexual behavior; personal conduct; financial
considerations; alcohol consumption; drug involvement; emotional, mental and
personality disorders; criminal conduct; security violations; outside activities; and misuse
of information technology systems. While the directive requires that infractions of all the
above categories of behavior be reported, it is, like the other formal documents described
above, vague on definitions of these behaviors, and it mixes CI, security, and personal
problems together in a single list. It requires that supervisors and coworkers report all
relevant personnel security behaviors.
Literature Review
Sarbin (2001), in exploring the psychological literature, suggested that lack of
reporting in the workplace is due to cultural prohibitions against informing on one’s
colleagues and friends, especially for behaviors that are not strictly violations of security
rules. Except in cases where the behavior is egregious, Sarbin questioned the
effectiveness of current DoD policy that requires employees to inform on their fellow
workers. Reviewing proxy measures of reporting in different fields such as whistle-
blowing, Giacalone (2001) also found that supervisors and coworkers in the general
workplace report only a small percentage of the questionable behaviors they observe. In
spite of the low rate of reporting and the cultural injunction against informing on others,
Giacalone recommended several interventions to help increase the rate of reporting.
These interventions were designed to make policies clearer and more transparent and to
train supervisors and workers on these policies, the behaviors of concern, and the nexus
between these behaviors and national security.
A review of commission studies (Joint Security Commission, 1994; Joint Security
Commission II, 1999) and related research (Bosshardt, DuBois, & Crawford, 1991;
Kramer, Crawford, Heuer, & Hagen, 2001; Fischer & Morgan, 2002; Wood, 2001;
Erdreich, Parks, & Amador, 1993) confirmed Sarbin and Giacalone’s findings that few
individuals report security-related issues. Supervisors provide more security-relevant
information than do coworkers, but neither is a very productive source. For example, in a
PERSEREC study of four federal agencies, supervisors and coworkers provided very
little information compared to sources such as subject interviews, personal history
questionnaires (SF-86/SPHS), and credit reports.

Percentage of SSBI-PRs Where Source Yielded Issue Information

Source                    DoD          OPM          CIA        NRO
                          (n = 1,611)  (n = 1,332)  (n = 855)  (n = 923)
Subject Interview         15%          18%          23%        25%
SF-86/SPHS                11%          15%           5%        10%
Credit Reports            11%          18%          10%         9%
Supervisor Interviews      3%           5%           5%         2%
Coworker Interviews        1%           3%           3%         1%

Interviews with Security Managers


For the 2003 PERSEREC study, interviews were conducted with 45 security
managers and management personnel at 20 federal agencies, including intelligence
agencies, military organizations, the Department of Energy, the State Department, and the
Federal Bureau of Investigation, among others. These personnel supported the notion that
supervisors report more often than coworkers, but neither group reports much. They offered a
series of reasons why this should be so, and made some recommendations to improve the
situation. Security managers suggested the following reasons why people may not report:
• Cultural resistance.
• Negative perceptions of reporting and its consequences (to the reporter and to
the person reported).
• Lack of knowledge and experience of security officers, supervisors, and the
workforce with reporting requirements.
• Unclear relationships between security, employee assistance programs, and
other functions in the organization.
Focus Groups
Supervisors and coworkers in focus groups supported the managers’ estimates on
the frequency of supervisor and coworker reports. They noted their own reluctance to
report on their colleagues. Reasons for not reporting included:
• Cannot see the nexus between certain reportable behaviors and national
security.
• Fear they will cause people problems.
• Fear that reported colleagues will be harmed because the system may not be
fair to them.
• Fear that they will lose control once the report has been made to Security.
• Fear of negative repercussions to themselves for reporting.
However, they are not resistant to reporting serious infractions. It is simply that
the DoD Directive 5200.2-R reporting requirements are perceived as being too broad and
amorphous and, thus, very difficult to implement. The regulation requires that supervisors
be trained in recognizing “indicators that may signal matters of personnel security
concern” and that supervisors and coworkers report “information with potentially serious
security significance.” While these phrases may have been clear to the original framers of
the directive, they are far from obvious to personnel in the field. As one supervisor noted, “We
need clearer rules about what should be reported up the chain.” However, even in the
absence of such guidance, supervisors and coworkers do intuitively distinguish between
behaviors that are directly related to national security (which they say they have no
problem reporting) and behaviors that are associated with reliability and suitability for
employment (which they are hesitant to report).
The single most important reason employees gave for seldom reporting is that
they personally cannot see the precise connection—the nexus—between certain
behaviors and national security. They said that they do not know where to draw the line
between egregious security-related behaviors and gray-area suitability or personal
behaviors—the kinds of problems that, while important, are seen as less critical in terms
of security risk management and are not directly linked in people’s minds with the
compromise of security or with espionage.
If the connection were made apparent, they said they would be more motivated to
report in order to protect their country and national security. In response to this concern,
PERSEREC subsequently developed a brochure that lists behaviors that always should be
reported if they become known. Reporting these behaviors requires no judgment calls.
The brochure is called Counterintelligence Reporting Essentials (CORE): A Practical
Guide for Reporting Counterintelligence and Security Indicators and will be distributed
to all CI and other security organizations for use in the field as part of security awareness
presentations and CI briefings.
Conclusions
Findings from the PERSEREC study show that there will always be some tension
between the rules requiring reporting and our cultural values against informing on colleagues.
This is especially likely in cases where the “infraction” is not perceived to be an illegal
activity or security violation but a common, and often transient, personal problem. Yet,
provided they understand the nexus, study participants had no objection to reporting
serious security-related behaviors so long as it is made clear what constitutes such
behaviors. They believe that temporary personal problems may be better handled in a
different manner, perhaps by the supervisor through referral to employee assistance
programs or to other kinds of monitored treatment programs.
The PERSEREC study points to the need to increase the reporting of critical and
obvious security-related behaviors, which employees say they are willing to report. It
suggests drawing a clearer distinction between the reporting, and consequences, of
egregious security-related behaviors and suitability-type behaviors of a more personal
nature that realistically are not likely to be reported. By clearly communicating this
distinction to supervisors and coworkers, through use of PERSEREC’s CORE brochure,
and by encouraging supervisors to become more proactive in addressing suitability
issues, the rate of reporting of truly serious security infractions may well be increased.
References
Bosshardt, M.J., DuBois, D.A., & Crawford, K.S. (1991). Continuing assessment of
cleared personnel in the military services, Reports 1-4 (PERS-TR-91-1 through 4).
Monterey, CA: Defense Personnel Security Research Center.
Erdreich, B.L., Parks, J.L., & Amador, A.C. (1993). Whistleblowing in the government:
An update. Washington, DC: U.S. Merit Systems Protection Board.
Fischer, L.F. & Morgan, R.W. (2002). Sources of information and issues leading to
clearance revocations. Monterey, CA: Defense Personnel Security Research Center.
Giacalone, R.A. (2001, April). Coworker and supervisor disclosure of reportable
behavior: A review of proxy literature and programs. Paper presented at a colloquium
on Obtaining Information from the Workplace: Supervisor and Coworker Reporting.
Monterey, CA: Defense Personnel Security Research Center.
Joint Security Commission (1994). Redefining security: A report to the Secretary of
Defense and the Director of Central Intelligence. Washington, DC: Author.
Joint Security Commission (1999). A report by the Joint Security Commission II.
Washington, DC: Author.
Kramer, L.A., Crawford, K.S., Heuer, R.J., & Hagen, R.R. (2001). Single-Scope
Background Investigation-Periodic Reinvestigation (SSBI-PR) source yield: An
examination of sources contacted during the SSBI-PR (TR-01-5). Monterey, CA:
Defense Personnel Security Research Center.
Sarbin, T.R. (2001, April). Moral resistance to informing on coworkers. Paper presented
at a colloquium on Obtaining Information from the Workplace: Coworker and
Supervisor Reporting. Monterey, CA: Defense Personnel Security Research Center.


Wood, S. (2001). Public opinion of selected national security issues: 1994-2000 (MR-01-
04). Monterey, CA: Defense Personnel Security Research Center.
Wood, S., & Marshall-Mies, J.C. (2003). Improving supervisor and coworker reporting
of information of security concern. Monterey, CA: Defense Personnel Security
Research Center.


CHARACTERIZING INFORMATION SYSTEMS INSIDER OFFENDERS
Lynn F. Fischer
Defense Personnel Security Research Center
Introduction
The development of a database to track trends and common characteristics of
information systems insider offenses in the Department of Defense has been underway at
PERSEREC for over 3 years. An early analysis of data drawn from the Insider Events
Database was presented to IMTA at the Edinburgh meeting in 2000. Since then,
additional information has been obtained from the investigative agencies of the military
services, and we can now more clearly define the common characteristics and motivations of
offenders, as well as the types of offenses they commit against defense systems. Insiders
are defined here as individuals holding a position of trust and given authorized access to a
defense information system. As military services throughout the world are increasingly
dependent on computer systems, internal networks, and the Internet, it is probable that
scenarios such as those described in this report will be repeated in the military
organizations of other countries.
While many approaches to detecting and preventing cyber-offenses committed by
insiders focus on technical countermeasures, we at PERSEREC are persuaded that the
insider threat is essentially a trust-betrayal issue. That is, the insider threat is a human
problem related to the selection and monitoring of persons who have use of or
administrative control of our critical networks. Another important factor working against
misuse or abuse of systems is adequate security education. We have found that
many offenders did not know, or had not been informed, what constituted unacceptable
behavior and what its consequences were for the integrity and operability of their systems.
Who Are the People Doing This and Why Do They Do It?
Based on a review of over 80 insider events of varying seriousness in the
database, most offenses were committed by younger service members
or by information technology (IT) professionals under contract to a defense facility.
Forty-seven percent were attributed to misuse by uniformed service members. Of 33
events for which the rank of the offender is known, 22 involved junior enlisted personnel,
nine were committed by non-commissioned officers, and two by commissioned officers.
With few exceptions, the service members, whether IT professionals or not, knew a great
deal more about computer systems than was required by their job. Several engaged in
hacking or computer-related private enterprise from home during off-duty hours.
This paper reviews several significant findings or generalizations, based on data
available to date, that are emerging from the analysis. However, a much better
understanding of situational factors, motivations, and contributing causes can be gained
from in-depth case studies of those events that resulted in significant consequences for
the organization. Therefore, each of the general observations will be illustrated by a case
study that demonstrates the importance of situational factors at the place of employment,
interpersonal and social interrelationships (often hostile and vindictive), and the attitudes
of the offenders. The following observations are clearly emerging from the data acquired
from sources of record to date.


Findings
Almost all insider offenders were system administrators or had some level
of administrative access to the system being abused.
Of the identified offenders, 20% were system administrators, 34% were assistant
administrators, and another 41% had limited administrative access beyond that of a
normal user. Among the many events that could illustrate this point, one stands out as
unique, that of a Private First Class (PFC) who had helped to develop the U.S. Army’s
database for enlisted records. This junior service member, whose later conduct was
particularly egregious, was responsible for three events, each separated by several
months.1
Age 22 at the time of these events, the service member was an information
systems operator at Ft. Benjamin Harrison, Indiana. He claimed to have been interested in
computers from an early age and had received advanced systems training while in the
Army. He also operated a small computer business from his home. When first arriving at
Ft. Harrison in 1995 he reported to a Captain who depended upon his computer skills and
gave him considerable freedom on the job. The PFC’s work position was information
systems operator and software analyst, and he was assigned to the Information Support
Agency Enlisted Records and Evaluation Center (EREC). A subsequent branch chief,
however, exercised greater authority over his activities and in fact ordered him to remove
unauthorized personal files from the system server. The soldier’s resistance to this policy
resulted in a non-judicial punishment in November 1998.
After continued animosity between him and his new branch chief, the soldier
apparently attempted to get even by disabling the system users’ accounts in April, 1999,
resulting in a shutdown of the EREC database system for about 3.5 hours. The PFC was
accused of damaging computer information and of unauthorized computer access. A
decision was made not to undertake a court martial against him. Action against the
service member resulted in another non-judicial punishment by which he was reduced in
rank, fined, and removed from all systems administrator level work-related duties.
But he was still intent upon getting even. With the assistance of a chat room
acquaintance located in Jamaica, he was able to steal passwords and infect several of the
workstations on his organization’s system with a Trojan horse program (BO2K) that gave him
remote control of these workstations. He then proceeded to delete over 1,000 work-
related files of systems users. The culprit was not difficult for special investigators to
identify. In September, 1999, the service member was arrested and his residence
searched for evidence. Later that month, an unlawful intrusion was detected on a U.S.
Army computer network in Indianapolis and traced to someone attacking from Montego
Bay, Jamaica.
For this final attack on the Army system, the service member was formally
prosecuted. In June, 2000, he appeared before a general court martial convened at Ft.
Knox, Kentucky, and pleaded guilty to all charges. He was sentenced to a reduction to the
lowest enlisted rank, loss of all benefits and pensions, and 4 months of criminal
confinement to be followed by a Bad Conduct Discharge.

1 Information in this case summary is based on interviews with personnel at the scene of
the offense, interviews with case agents, and transcripts from the court martial
proceedings.


In about 60% of these recorded events the offender demonstrated malicious
or criminal intent.
Notwithstanding this fact, nearly 30% of offenders were not motivated by malice. Too often, as
in the previous case, an offender has a sense of ownership of a system that leads him to
believe that he can use it for personal gain or convenience and, in doing so, will neither
be criticized nor jeopardize the integrity of the government system. In these cases there
appears to be no intent to damage or destroy the system or to seek revenge on another
employee. Approximately a third of the events in the Insider Events Database fall into
this category. At one U.S. facility in Korea, for example, four service members actually
set up their own business web site on the government server, apparently thinking that no
harm would be done.
However, in some cases non-malicious behavior can result in serious trouble for
the offender. One example is the case of Michael Scott Moody, who at the time of the
offense was an Airman First Class with the 90th Fighter Squadron, Elmendorf, Alaska.2 In
November, 1998, information was received from the Air Force Communication
Emergency Response Team (AFCERT) that two computers at Elmendorf Air Force Base
had been illegally accessed. The intrusion was traced to a home computer owned by
Moody, who had set up NetBus3 on two workstations in his office so that he could operate
them remotely. A search of his personal computer revealed evidence of hacking, software
piracy, and possession of child pornography. In May, 1999, Moody was discharged from
the service so that he could face charges in a Federal court. He pleaded guilty to illegal
access to government computers and possession of child pornography. Moody was
sentenced to 10 months in prison.
According to a media account of the trial, Moody told the judge, “Honestly, at the
time, I didn’t consider it hacking. I thought of it more as a prank…I was curious to know
if I could access the computer at work. Being a government computer, I considered it a
challenge. It worked. I didn’t mean to hurt no one” (“Hacker gets time,” 1999).
As is often the case, the accused claimed to have been misled by chat-
room acquaintances who not only sent him child pornography but provided him with
NetBus software over the Internet. However, once Moody installed the software,
it allowed anyone with knowledge of NetBus to access the Elmendorf AFB
computers, which contained personnel records and maintenance records for an F-15
squadron.

2 This case summary has been developed from numerous media reports published at the
time of Moody’s arrest.
3 NetBus is one of several software packages that permit control of a workstation from a
remote location. It may have legitimate uses, but its illicit use as a Trojan horse requires
the loading of a program on a targeted workstation, usually via an attachment to an email
message. The unwitting user of the workstation is unaware that by opening the
attachment NetBus is loaded and the system is made available to the remote intruder.
Hackers usually send NetBus to unsuspecting computer owners by e-mail, disguising it
as an attachment containing a computer game or graphic file.


Over 60% of these offenses resulted in serious damage or compromise to the system.
Because our primary concern is the unimpaired operability of defense information
systems, particularly in the event of armed conflict, the worst consequences of the
actions of a malicious insider are denial of service to authorized users and the
destruction of official files and software. Data obtained to date show that 23% of the
events involved denial of service attacks, while 11% resulted in the destruction of official
files or software. An additional 29% resulted from the introduction of unauthorized
software into a government system. Some of this software was of the malicious or
damaging type. The first case discussed in this paper is the best example of this type of
offense; however, there are several others. Another prosecuted case involved a
disgruntled government civilian employee of the U.S. Coast Guard.
In early 1998, Shakuntla Singla, a civilian employee and systems administrator for
the U.S. Coast Guard in Washington, DC, used the password and identification of another
employee to gain access to the Coast Guard system from her home after resigning from
the organization. Singla was reported to be angry over the fact
that the organization had ignored her reports about improper conduct by an IT contractor
employee. She had in fact filed a complaint with the Equal Employment Opportunity
Commission claiming that she was subject to a hostile working environment.
Two months later, employees noticed that critical files had been deleted from the
Coast Guard nation-wide personnel database, causing the system to shut down.
According to a news report, “The July crash wiped out almost two weeks’ worth of
personnel data used to determine promotions, transfers, assignments and disability claim
reviews for Coast Guard personnel nationwide” (“Woman gets five months,” 1998). The
prosecuting Assistant U.S. Attorney stated, “It took 115 Coast Guard employees across
the country working more than 1,800 hours to recover and reenter the data, at a cost of
more than $40,000.”
It was clear that, because of the precision with which the hacking was
accomplished, the culprit was an insider or had inside information. Singla was linked to
the crime by the FBI through computer and phone records and the fact that she had used
an access code, known only to a few people, to enter the system. Singla had helped to
build the personnel database she later attacked.
While claiming that she had not intended the computer system to crash, Singla did
plead guilty to unauthorized access and deletion of files. She was sentenced to 5 months
in prison, ordered to pay $35,000 of restitution to the Coast Guard, and placed on several
months of home detention. Singla stated to a media reporter, “I wanted to get even with
them. I was frustrated and depressed because no one listened to my complaints of sexual
harassment in the workplace. I did delete information, but I did not crash the system”
(“Coast Guard,” 1998).
Many offenses resulted from unauthorized use of a defense system for
personal convenience.
There are several accounts in the Insider Database of government systems being
used by service members or employees for personal pleasure or convenience; however,
few of the offenders had malicious or criminal intent. In at least seven events, system
administrators or their assistants set up unauthorized storage directories on government
servers for bootlegged game software, music, or pornography collections. In the
following example, however, a cadet at the U.S. Air Force Academy was accused not
only of misusing the academy system for personal chat room activity, but also of using it as a
platform from which to launch a criminal attack on companies in the private sector
(“Academy Jurors,” 1999).
A second-year cadet, Christopher Wiest, along with other cadets, was ordered to
stop using Internet chat rooms out of security concerns. Several months later Wiest
resumed active participation in chat rooms with the assistance of several cyber-friends
not connected to the Air Force. He in fact set up an Internet Relay Chat (IRC) server
on his PC that was connected to the USAF network. Unfortunately, his “friends” were
engaged in extensive hacking around the Internet and involved Wiest in their activities.
Wiest claimed that at the time he had no idea what these people were doing. In
November 1997, the Air Force Office of Special Investigations searched Wiest’s room
and seized his computer. Wiest was initially charged with using the Air Force system to
illegally enter three companies and cause $80,000 in damage. Two of these charges were
later dropped. Prosecutors argued that Wiest used the Air Force platform to connect to
the Internet, and then hacked into company systems, erased data, and planted destructive
programs.
In March, 1999, Wiest was found guilty by court martial of using an Air Force system to
break into and damage a private company’s computer system, causing $6,000 in damage.
He was dismissed from the academy and the service (“Air Force Academy,” 1999).
An unexpectedly high frequency of offenses can be categorized as inside
hacking.
While computer hacking is generally not thought of as an insider offense
committed by trusted employees, the data revealed an unexpectedly high frequency of
insider hacking, that is, the use of a government platform to gain unauthorized access to
either another defense system or a system outside of government. Sixteen percent of the
events concerned the former, and 11% involved hacking into a private-sector system. A
particularly serious example of this type of case was reported in the press as attempted
espionage; however, the court convicted the offender only of unauthorized use of
government property.
PFC Eric Jenott, assigned to duty as a communication switch operator at Ft.
Bragg, North Carolina, since June 26, 1996, was charged on August 21st of that year with
espionage, damaging military property, larceny and unauthorized access to government
computer systems (“Court-martial,” 1996). Specifically, Jenott was accused of providing
a classified system password to a Chinese citizen located at Oak Ridge, Tennessee.
Prosecutors contended that Jenott was attempting to gain favor with the Chinese
government because he wanted to defect to China. According to the accused soldier, the
password was not classified as secret, and the charges related to his penetration of defense
computer systems stemmed from his attempt to be helpful when he discovered a weakness in
an encoded Army system. However, he did admit to having been an active hacker for
several years before joining the Army and during that period broke into Navy, Air Force,
and the Defense Secretary’s systems.
On January 3, 1997, the court found Jenott not guilty of espionage, but guilty of
damaging the security of an Army encoded system, exceeding authorized access to a
government system, and transmitting a code with the intent to damage. He was given a
Bad Conduct Discharge, reduced to the lowest enlisted rank, and sentenced to 3 years
imprisonment, less the 6 months already served in pre-trial confinement (“Jury finds,”
1996).
Few offenders were aware that their unauthorized activities could be easily
monitored.
Systems abuse, particularly where it involves electronic communication across
organizations, is frequently detected by network monitoring systems that alert law
enforcement services to anomalies. These monitoring systems, or computer emergency
response teams (CERTs), routinely monitor traffic in and out of defense networks for
intrusions and attempted intrusions. Thirty-three percent of the events recorded in the
database resulted from CERT notification. Twenty-three percent were detected by
internal monitoring of a network.4
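As a purely illustrative aside, the kind of monitoring described here can be pictured as simple rules applied to event streams. The sketch below is a toy example in that spirit; the thresholds, field names, and example event are hypothetical assumptions, not the logic of any actual CERT or ASIMS deployment.

```python
# Toy sketch, not actual CERT/ASIMS logic: flag outbound transfers that
# deviate sharply from a per-user baseline. Thresholds, field names, and
# the example event are hypothetical.

from collections import defaultdict

# Hypothetical per-user baseline: typical daily outbound transfer volume (MB)
BASELINE_MB = defaultdict(lambda: 50.0)
ALERT_FACTOR = 10.0  # flag transfers an order of magnitude above baseline


def check_transfer(user: str, volume_mb: float, destination: str) -> list[str]:
    """Return alert messages for a single outbound file-transfer event."""
    alerts = []
    if volume_mb > BASELINE_MB[user] * ALERT_FACTOR:
        alerts.append(f"{user}: {volume_mb} MB transfer far exceeds baseline")
    if not destination.endswith(".mil"):  # hypothetical trust rule
        alerts.append(f"{user}: transfer to external host {destination}")
    return alerts


# Example event: a large transfer to an external host triggers both rules
print(check_transfer("jdoe", 1200.0, "ftp.example.com"))
```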
One of the most disconcerting aspects of the misuse of defense information
systems is seen in several cases in which access to a government system was given to
unauthorized persons by an insider. In the following example, there is little evidence that
the offender had malicious intent against his organization or the system itself, but wanted
to use the government server for personal convenience.
In May, 1999, the Air Force CERT detected suspicious connections from Israel
and the Netherlands into a computer located at a U.S. Air Force Base. The recipient of
these communications was identified as a contractor employed as the system
administrator. The administrator admitted that he reconfigured the computer and created
accounts for two unknown individuals so that they could trade pirated computer gaming
software and that he copied game software to a compact disk using his government
computer. He also stored unauthorized software on system media.
This was not a case of ignorance of regulations. The employee
acknowledged that he had completed the required USAF Computer Security and
Awareness Training Program and was aware that these activities were not official. His
access to USAF information systems was removed by his employer and he was
dismissed.
Of particular concern is the frequency of cases in which the insider offender
was a contracted IT professional, and sometimes a foreign national.
The outsourcing of IT support to manage critical defense networks is increasingly
common due to the scarcity of service members having the necessary technical training.
The downside to this trend is that unless the employee requires access to classified
information, he or she is unlikely to receive any type of personnel security vetting prior to
employment. The government may have little or no control of who is employed by a
primary contractor under a sub-contract to service or maintain a sensitive information
system. Twenty percent of the offenders identified in the database were civilian
employees under contract to a Defense organization.

4 Automated Security Insider Monitoring Systems (ASIMS) alert systems managers to
anomalous actions by users such as unusual file transfers and hacking within the system.


The most significant event of this type is that of the compromise of a highly
sensitive but unclassified Air Force aircraft maintenance and inventory database that took
place in 1996 (Caruso, 2003).5 In December of that year, a system administrator at Wright
Patterson Air Force Base discovered a security breach in the operations side of the Air
Force Reliability and Maintainability Information System (REMIS) that tracks all aircraft
and weapon systems. The breach was soon traced to Zhangyi “Steven” Liu, a sub-
contractor employee, one of 11 young Chinese nationals who had recently been brought
over to work on the software development side of the database system. Somehow Liu
was able to access a “super super password file” that gave him access to the operational
database and the power to change or delete any file in the system. He and two other
coworkers proceeded to download unauthorized files to a personal directory that could
have been accessed by Internet users. It was never established whether these data were
transmitted outside the country or what his true motivations were in breaking into the
system. The prime contractor was forced to spend $350,000 to examine the code and
database to ensure that no malicious code had been installed by Liu or his coworkers.
In March, 1997, Liu pleaded guilty to two counts of gaining illegal access to the
$148 million REMIS. He later withdrew his plea and was found guilty on two
misdemeanor counts by a jury. He received a sentence of 4 months confinement, 1 year
of supervised work-release and a fine of $4,000 (“Chinese national,” 2000).
Summary and Conclusions
The following conclusions are based on the analysis of information in the Insider
Database and case studies described above. These have been reinforced by a parallel
study of insider events in the private sector sponsored by PERSEREC that has provided a
number of additional insights into the patterns of activity associated with these attacks on
sensitive information technology resources.6
• Technical security measures offer minimal protection from abuse when the
offender is a systems administrator or has some level of administrative access
to the system.
• Interpersonal relations within the workplace and the organization’s climate are
very important for understanding IT systems misuse. In almost a quarter of the
cases there was evidence of prior hostility in the workplace involving the
offender and usually a supervisor.
• Some of these events could have been avoided by better security education.
Personnel need to know what the rules are concerning the use of the system,
what is acceptable and not acceptable use of that system, and what the
consequences are for stepping over the line.
• Both enhanced personnel security and technical deterrents should be applied to
minimize the threat posed by angry or indifferent personnel who have legitimate
access to defense information systems.

5 This case summary is based on information contained in a recent thesis by Lt. Valerie L. Caruso
at the Air Force Institute of Technology and on media reports of that time.
6 Undertaking this project for PERSEREC is Dr. Eric Shaw, Consulting & Clinical Psychology,
Ltd. Shaw is focusing on prosecuted cases in which an insider has attacked a corporate system
that is related to the critical national infrastructure. A report on this work is forthcoming in 2004.


• Many offenses occurred after discharge or transfer to a new duty station—within
60 days after separation—indicating the need for greater attention to discharge
security and personnel planning.
• Several attacks involved employee remote access to the corporate system,
indicating a need for a review of safeguards covering this practice.
• In several cases examined, a lack of personnel and/or security policies can be
cited as having contributed to the event.
• In some cases, evidence of disgruntlement or performance problems was visible
to management well in advance of an attack. Delay in intervening in the
underlying personnel problem contributed to the episode or failed to divert the
subject from his destructive path.
The results of these studies of insider events thus far indicate that there may be
significant “gaps” in policies and practices designed to reduce the risk of insider events or
detect and manage this risk when it exists. While work continues in this area, it is not too
early to offer specific recommendations that might prevent the inoperability or
impairment of a defense information system at a critical time. The importance of
continuous network monitoring cannot be stressed enough. This should be reinforced
with the articulation and enforcement of clear policies regarding the use or misuse of
information systems. On the non-technical side, both administrators and end-users require
security awareness training that is appropriate to their use of the information system. And
lastly, while supervisors and managers must deal with disgruntlement and interpersonal
conflict in the workplace in a timely fashion, it is essential for defense organizations to
vet or screen IT job applicants for evidence of past systems abuse, hacking, or illegal
behavior.
References
Academy Jurors get Lesson in Hacking During cadet’s trial. (March 16, 1999). Colorado Springs
Gazette Telegraph.
Air Force Academy Dismisses Cadet for Hacking into Computer. (March 14, 1999). Chicago
Tribune.
Caruso, Valerie L. (2003). Outsourcing Information Technology and the Insider Threat. Dayton
OH: Graduate School of Engineering and Management, Air Force Institute of Technology.
Chinese National gets sentence of 4 months. (January 15, 2000). Dayton Daily News.
Coast Guard beefs up security after hack. (July 20, 1998). Computer World.
Court-martial to begin in computer spy case. (December 9, 1996). San Diego Union-Tribune.
Fischer, Lynn F., Riedel, James A., & Wiskoff, Martin F. (2000). A New Personnel Security
Issue: Trustworthiness of Defense Information Systems Insiders. Proceedings of the 2000
IMTA Conference, International Military Testing Association.
Hacker Gets Time in Prison; Former Airman Downloaded Porn. (July 2, 1999). Anchorage
Daily News.
Jury finds Ft. Bragg Soldier innocent of espionage. Computer fraud, property damage charges
draw 3-year sentence. (December 23, 1996). Durham Herald-Sun.
Woman gets five months for hacking; Tampering Ruined Coast Guard Files. (June 20, 1998).
Washington Post.


Ten Technological, Social, and Economic Trends That Are Increasing U.S. Vulnerability to Insider Espionage
Lisa A. Kramer
Defense Personnel Security Research Center
Richards J. Heuer, Jr.
RJH Research/Defense Personnel Security Research Center
Kent S. Crawford
Defense Personnel Security Research Center
Introduction
Permanent and temporary employees, vendors, contractors, suppliers, ex-
employees, and other types of “insiders” are among those who are most capable of
exploiting organizational assets at greatest expense to U.S. interests. Due to their
knowledge of the public agencies and private companies that employ them, their
familiarity with computer systems that contain classified and proprietary information, and
their awareness of the value of protected information in the global market, insiders
constitute a significant area of vulnerability for national security (Fialka, 1997; Freeh,
1996; Nockels, 2001; Shaw, Ruby, & Post, 1998; Thurman, 1999; Venzke, 2002). An
estimated 2.4 million insiders currently have access to classified information and, while
difficult to approximate, insiders with access to proprietary and sensitive technological
information likely number in the tens of millions (National Security Institute, June
2002). While the deliberate compromise of classified or proprietary information to
foreign entities is a relatively rare crime, even one case of insider espionage can cause
extraordinary damage to national security.
Because espionage is a secret activity, we cannot know how many undiscovered
spies are currently active in American organizations, or what the future will bring in
terms of discovered espionage cases. Nonetheless, we are not entirely in the dark when
assessing the magnitude of the insider espionage threat. We can draw inferences from
relevant changes in technology, society, and the international environment that affect
opportunity and motivation for espionage.
In exploring current and future prevalence of insider espionage this study employs
a methodology similar to that used in epidemiological research where scientists explain
or forecast changes in the prevalence of certain diseases within various populations. The
medical researcher knows that heart disease is associated with age, weight, amount of
exercise, blood pressure, diet, stress, genetics, and other factors, and can thus estimate
prevalence of heart disease by analyzing changes in these variables. Similarly, because
we know that certain factors influence the likelihood that insider espionage will occur, we
can forecast changes in the prevalence of insider espionage by examining changes in
these factors. This study examines U.S. vulnerability to insider espionage by exploring
ten technological, social, and economic trends that affect opportunity and motivation for
spying.
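The authors do not present a formal model, but the epidemiological analogy can be made concrete with a toy calculation. In the sketch below, every factor name, trend value, and weight is an invented placeholder for illustration, not an estimate from this study; the point is only the mechanics of forecasting a risk index from changes in known risk factors.

```python
# Toy sketch of the epidemiological analogy: combine hypothetical changes in
# known risk factors into a single relative risk index. Every name, trend
# value, and weight below is an invented placeholder, not a study estimate.

# Relative change in each risk factor (1.0 = no change from the base period)
factor_trends = {
    "information_access": 1.3,  # easier storage/retrieval of protected data
    "foreign_contact": 1.2,     # more travel, internationalized commerce
    "financial_stress": 1.1,    # consumer debt, gambling prevalence
    "org_loyalty": 0.9,         # declining loyalty (a weaker deterrent)
}

# Hypothetical weights for how strongly each factor drives espionage risk;
# loyalty deters, so it enters with a negative weight.
weights = {
    "information_access": 0.35,
    "foreign_contact": 0.25,
    "financial_stress": 0.25,
    "org_loyalty": -0.15,
}

# Weighted index: values above 1.0 suggest rising vulnerability
risk_index = 1.0 + sum(weights[f] * (factor_trends[f] - 1.0) for f in weights)
print(f"Relative insider-espionage risk index: {risk_index:.2f}")  # about 1.20
```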
Opportunity for espionage consists of access to classified or proprietary
information that can be exchanged for money or other benefits, access to foreign entities
interested in obtaining this information, and means for transferring this information to
foreign recipients. Motivation, broadly defined, is a feeling or state of mind that
influences one's choices and actions. While motivation for espionage results from a
complex interaction between personality characteristics and situational factors (Crawford
& Bosshardt, 1993; Eoyang, 1994; Sarbin, 1994; Parker & Wiskoff, 1991; Shaw, Ruby &
Post, 1998; Timm, 1991), this study focuses primarily on the latter.
Findings of this study suggest that the information revolution, global economic
competition, the emergence of new and non-traditional intelligence adversaries, and
other changes in the domestic and international environment have converged to create
unusually fertile ground for insider espionage. Greater numbers of insiders have the
opportunity to commit espionage, and they are more often encountering situations that
can provide motivation for doing so.
1. Technological advances in information storage and retrieval are dramatically
improving insiders’ ability to access and steal classified and proprietary information.
2. The global market for protected U.S. information is expanding. American insiders
can sell more types of information to a broader range of foreign buyers than ever
before.
3. The internationalization of science and commerce is placing more employees in a
strategic position to establish contact with foreign scientists, businesspersons, and
intelligence collectors, and to transfer scientific and technological material to them.
4. The increasing frequency of international travel is creating new opportunities for
motivated sellers of information to establish contact with, and transfer information to,
foreign entities. Foreign buyers have greater opportunity to contact and assess the
vulnerabilities of American personnel with access to valuable information.
5. Global Internet expansion is providing new opportunities for insider espionage.
The Internet allows sellers and seekers of information to remain anonymous and
provides means by which massive amounts of digitized material can be transmitted
to foreign parties in a secure manner.
6. Americans are more vulnerable to experiencing severe financial crisis due to
aggressive consumer spending habits and other factors. Financial problems are one of
the primary sources of motivation for insider espionage.
7. The increasing popularity of gambling and the prevalence of gambling disorders
suggest that greater numbers of insiders will commit workplace crimes such as
espionage to pay off debts and to sustain gambling activities.
8. Changing conditions in the American workplace suggest that greater numbers of
insiders may become motivated to steal information from employers to exact revenge
for perceived mistreatment. Because organizational loyalty is diminishing, fewer
employees may be deterred from committing espionage due to a sense of obligation
to the agencies and companies that employ them.
9. More insiders now have ethnic ties to other countries, communicate with friends
and family abroad, and interact with foreign businesspersons and governments.
Foreign connections provide insiders with opportunities to transfer information
outside the U.S. and foreign ties can provide motivation to do so.
10. More Americans view human society as an evolving system of ethnically and
ideologically diverse, interdependent persons and groups. While this is obviously
beneficial, it is also possible that some insiders with a global orientation to world
affairs will view espionage as morally justifiable if they feel that sharing information
will benefit the “world community” or prevent armed conflict.
Despite the significance of individual characteristics in determining which
specific insiders will commit espionage, if more insiders are encountering situations that
can provide motivation and opportunity for espionage – as the findings of this study
suggest – it is likely that the crime of insider espionage will occur more frequently. In our
research we were unable to identify a single countervailing trend that would make insider
espionage more difficult or less likely in the immediate future. Findings of this study
suggest that increased investment of government resources to counteract the insider
espionage threat is warranted.
References

Crawford, K., & Bosshardt, M. (1993). Assessment of position factors that increase
vulnerability to espionage. Monterey, CA: Defense Personnel Security Research Center.
Eoyang, C. (1994). Models of espionage. In T.R. Sarbin, R.M. Carney, & C. Eoyang
(Eds.), Citizen espionage: Studies in trust and betrayal (pp. 69-91). Westport, CT: Praeger.
Fialka, J. (1997). War by other means: Economic espionage in America. New York:
W.W. Norton and Company.
Freeh, L. (1996). Statement of Louis J. Freeh, Director, Federal Bureau of Investigation,
before the House Judiciary Committee Subcommittee on Crime. Retrieved December
2002 from http://www.fas.org/irp/congress/1996_hr/h/h960509f.htm
National Security Institute (June 2002). U.S. security managers warned to brace for more
terrorism, espionage. National Security Institute Advisory, June 2002.
Nockels, J. (2001). Changing security issues for government.
(http://www.law.gov.au/SIG/papers/nockels.html)
Parker, J., & Wiskoff, M. (1991). Temperament constructs related to betrayal of trust.
Monterey, CA: Defense Personnel Security Research Center.
Sarbin, T., Carney, R., & Eoyang, C. (Eds.) (1994). Citizen espionage: Studies in trust
and betrayal. Westport, CT: Praeger.
Shaw, E., Ruby, K., & Post, J. (1998). The insider threat to information systems. Security
Awareness Bulletin, 2-98. Department of Defense Security Institute.
Thurman, J. (1999). Spying on America: It’s a growth industry. Christian Science
Monitor, 80, 1.
Timm, H. (1991). Who will spy? Five conditions must be met before an employee
commits espionage. Here they are. Forewarned is forearmed. Security Management,
49-53.
Venzke, B. (2002). Economic/industrial espionage. Retrieved June 19, 2002 from
http://www.infowar.com/class


Development of a Windows Based Computer-Administered Personnel Security Screening Questionnaire1
Martin F. Wiskoff
Northrop Grumman Mission Systems/
Defense Personnel Security Research Center
Introduction
The Defense Personnel Security Research Center (PERSEREC) developed a computer-
administered questionnaire for screening enlisted applicants to sensitive Navy
occupations that has been operationally used by the U.S. Navy Recruiting Command
since 1996. The goals of the questionnaire, called the Military Applicant Security
Screening (MASS 3.0), are to:

1. Reduce the number of Navy enlisted applicants who are processed for security
clearances and subsequently found ineligible.
2. Identify these applicants early in the accessioning process - at the Military
Entrance Processing Stations (MEPS) - before they are accepted into high security Navy
jobs.
3. Reduce the number of unfilled school seats and jobs due to the later
ineligibility of enlisted personnel assigned to these occupations.
4. Develop a more flexible mode for administering personnel security screening
items and collecting more detailed information.

A detailed description of the MASS system, including development of the questionnaire
and the manner in which it is operationally administered, is contained in Wiskoff et al.
(1996) and Wiskoff and Zimmerman (1994). As stated in Wiskoff et al., “the MASS
questionnaire inquires about the following areas of security concern: (1) alcohol
consumption; (2) allegiance; (3) drug involvement; (4) emotional and mental health; (5)
financial responsibility; (6) foreign travel and connections; (7) law violations; (8)
personal conduct; and (9) security issues. These areas, and the specific questions within
the areas, were developed by reviewing DoD security guidelines, evaluating existing
paper and pencil security questionnaires and discussing specific issues to be included
with security and legal professionals.”

“Each applicant for a sensitive rating is individually administered the MASS
questionnaire by a Navy classifier. The system includes a decision aid that automatically
informs the classifier whether the information provided by the applicant is disqualifying
or potentially disqualifying for the rating being considered, or whether it requires that a
waiver be obtained to allow the applicant to enter the Navy. This decision aid, appearing
as a flag, is triggered whenever an applicant response meets criteria for one or more of
these situations. The rules for the decision aid were established by linking all possible
responses to MASS questions to criteria contained in the Navy Recruiting Manual
concerning acceptance into ratings and into the Navy.”
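To make the flag mechanism concrete, the sketch below shows one plausible way such a rule-based decision aid could be structured. It is a toy illustration under assumed rules, question identifiers, and flag labels; the actual MASS rules were built by linking every possible response to Navy Recruiting Manual criteria and are not reproduced here.

```python
# Toy sketch of a rule-based decision aid in the spirit described above.
# The rule set, question identifiers, and flag labels are hypothetical;
# the real rules map responses to Navy Recruiting Manual criteria.

DISQUALIFYING = "disqualifying"
POTENTIAL = "potentially disqualifying"
WAIVER = "waiver required"

# (question_id, predicate on the response, flag raised when it matches)
RULES = [
    ("drug_other_use", lambda r: r is True, DISQUALIFYING),
    ("alcohol_dui_count", lambda r: r >= 2, POTENTIAL),
    ("alcohol_dui_count", lambda r: r == 1, WAIVER),
]


def evaluate(responses: dict) -> list[tuple[str, str]]:
    """Return (question_id, flag) pairs for every rule a response triggers."""
    flags = []
    for qid, predicate, flag in RULES:
        if qid in responses and predicate(responses[qid]):
            flags.append((qid, flag))
    return flags


# Example: one DUI raises a waiver flag for the classifier to act on
print(evaluate({"alcohol_dui_count": 1, "drug_other_use": False}))
```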

1 MASS 4.0 was demonstrated as part of this presentation.


Requirement
Inspection of MASS questionnaires completed since 1996 indicates that considerable
numbers of applicants self-disclose information that needs to be reviewed before a
decision can be reached on whether to accept them into a high security occupation. In
addition, according to Navy classification personnel who administer MASS, its very
presence makes potential applicants self-select out of these sensitive positions if there is
serious derogatory information in their backgrounds.
However, in recent years the program has begun to show its age. MASS 3.0 was
programmed in Turbo Pascal and designed for the IBM 286 computers that were
available at the MEPS in 1996. As newer computers have replaced the 286s, some
incompatibility issues have arisen in running MASS that have required temporary
fixes. Difficulties have arisen at some MEPS locations in printing MASS interview
summaries, and there has been dissatisfaction with the inability to store and retrieve the
results of previous applicant interviews.
In the summer of 2000 a survey was conducted of MASS classifiers at the MEPS to
determine changes that would facilitate its use (Reed, 2000). The primary
recommendation was the need to upgrade the platform to a Windows-based one. Other
desirable features, according to those who responded, were:
1. Quicker MASS completion time…MASS 3.0 takes at least 20 minutes even for
applicants with little to report. It can take as long as 45 minutes to an hour.
2. Error-reducing features such as drop-down lists.
3. Online help such as pop-up definitions.
4. Flexibility in ability to designate Navy ratings when the system matches
applicant responses to Navy Recruiting Manual criteria.
5. Increased detail in the printed interview report such as including the full
question asked along with the responses.
6. Storage and easy retrieval of previous MASS interviews to facilitate re-
interviewing applicants when they return to the MEPS for final processing from the
Delayed Entry Program.
7. Enabling printing of the interview report from a network printer.
8. Enabling electronic forwarding of the interview report to the office responsible
for providing guidance whether to continue processing the applicant.
Based on the results of the survey and subsequent discussions, a request was received
from Navy Recruiting Command in February 2001 to develop a Windows-based version
of MASS that would incorporate the field recommendations and add other features that
would enhance the screening process.
Development of MASS 4.0
The MASS 4.0 design addressed all of the field recommendations. The questionnaire
administration was conceptualized as a three-stage procedure. This resulted in a
streamlined questionnaire that contains a set of 20 first level questions that cover the 9
areas of security concern mentioned in the introduction as being included in MASS 3.0,
plus a newer area of “information technology systems.” The number of first level
questions in each of the security areas is shown in column 2 of Table 1.


Table 1
MASS 4.0 Areas of Inquiry

Security Area                   First Level      Second Level
                                Questions (N)    Questions (N)
Alcohol consumption                   1                7
Allegiance
  Espionage                           1                3
  Other                               1                6
Drug involvement
  Marijuana                           1                1
  Other                               1               11
Emotional/mental health
  Treatment                           1                4
  Suicide                             1                2
Financial responsibility
  Problems                            1               18
  Debts                               1                2
Foreign travel/connections
  Contact                             1               15
  Other                               1                8
Information technology                1                8
Law violations
  Arrested/charged                    1               53
  Vehicle related                     1               18
  Detained                            1                1
  Civil                               1                3
Personal conduct
  School/job                          1                8
  Other                               1                4
Security issues
  Denied clearance                    1                2
  Problems                            1                4

For example, the first level question in the Law violations (Vehicle related) area is:

• Have you ever been cited, arrested or charged by civilian or military law
enforcement officials for any vehicle related violations (e.g., improperly licensed
or unregistered vehicle, operating an unsafe vehicle, driving without a license,
speeding, hit and run, DUI)?

If a positive response is received to that question, the program presents 18 second
level questions to determine the nature of the violation, e.g., hit and run or DUI. The
number of second level questions by security area is displayed in column 3 of Table 1.


Finally, third level questions would be asked to obtain details of the incident, such as the
following for “hit and run”:

How many times have you been cited, arrested or charged for hit and run?
Why were you cited, arrested or charged?
Where and when did this occur?
Was the offense a felony?
What was the final outcome of the case?
Were you given a jail or prison sentence?...length of sentence
Were you given any other punishment?...nature of punishment
When did your jail term or other punishment end?

Upon completion of all questions, the classifier selects the Navy rating being considered
for the applicant, and flags are shown indicating possible issues that might arise during a
background investigation. The MASS 4.0 program, in addition to containing the MASS
3.0 linkage of applicant responses to the Navy Recruiting Manual guidelines, relates the
responses to the Adjudicative Guidelines for Determining Eligibility for Access to
Classified Information (Director of Central Intelligence, 1998).
These Guidelines are used by defense and intelligence community adjudicative agencies
in determining whether to grant a security clearance. The classifier is advised,
depending on the nature of the flags, to contact the appropriate authorities for permission
to proceed with processing the applicant. A final step is to print a form that documents
the results of the interview, which is then signed by both the classifier and the applicant.
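The three-stage procedure amounts to a shallow question tree in which deeper levels are asked only after an affirmative answer. The sketch below is a toy rendering of that structure, with abbreviated question texts and a hypothetical data layout; it is not the actual instrument.

```python
# Toy rendering of the three-level branching: follow-up questions are asked
# only after an affirmative answer. Question texts are abbreviated and the
# structure is hypothetical, not the actual MASS 4.0 instrument.

from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    followups: list["Question"] = field(default_factory=list)


# First level: one broad screening question for a topic in a security area
vehicle_related = Question(
    "Ever cited, arrested, or charged for any vehicle-related violation?",
    followups=[
        # Second level: pin down the specific violation
        Question("Was the violation hit and run?", followups=[
            # Third level: details of the incident
            Question("How many times were you cited, arrested, or charged?"),
            Question("Was the offense a felony?"),
            Question("What was the final outcome of the case?"),
        ]),
        Question("Was the violation DUI?"),
    ],
)


def administer(q: Question, depth: int = 0) -> None:
    """Walk the tree, descending only after an affirmative answer."""
    answer = input("  " * depth + q.text + " [y/n] ")
    if answer.strip().lower().startswith("y"):
        for followup in q.followups:
            administer(followup, depth + 1)


# administer(vehicle_related)  # interactive; uncomment to run
```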
MASS Evaluation and Implementation
MASS 4.0 was tested and evaluated at 4 MEPS during May and June 2003 as a
replacement for MASS 3.0. All classifiers who used the program with applicants agreed
that MASS 4.0 should be made operational at all MEPS. Following some
additional minor programming changes, the system was delivered to Navy Recruiting
Command and implemented nationwide in September 2003.
Within the next year we plan to improve some of the MASS 4.0 screens without changing
the basic nature of the program. Perhaps the most important future modification will be
the capability to electronically capture applicant responses for analysis. This will permit
us to establish a database of responses that could be related to future personnel actions
such as whether the applicant was found not acceptable during a security interview at
recruit training, or did not receive a security clearance after being processed for a
background investigation.
References
Director of Central Intelligence. (1998). Adjudicative guidelines for determining
eligibility for access to classified information (DCID 6/4, Annex C, Jul. 2, 1998).
Washington, D.C.: Author.
Reed, S. C. (2000). Unpublished analyses. Monterey, CA: Defense Personnel Security
Research Center.
Wiskoff, M. F., Zimmerman, R. A., & Moore, C. V. (1996). Developing and
implementing a computer-administered personnel security screening questionnaire.
Paper in symposium, Personnel Security in the Post-Cold War Era. Proceedings of the
38th Annual Military Testing Association Meeting. San Antonio, TX.
Wiskoff, M. F., & Zimmerman, R. A. (1994). Military Applicant Security Screening
(MASS): Systems development and evaluation (PERSEREC Technical Report 94-
004). Monterey, CA: Defense Personnel Security Research Center.


OCCUPATIONAL ANALYSIS APPLIED FOR THE PURPOSE OF DEFINING SELECTION CRITERIA FOR NEW MILITARY OCCUPATIONAL SPECIALTIES IN THE ARMED FORCES OF THE REPUBLIC OF CROATIA
Tomislav Filjak, Ingrid Cippico, Nada Debač, Goran Tišlarić, Krešimir Zebec
MINISTRY OF DEFENCE OF THE REPUBLIC OF CROATIA
Stančićeva 6, 10 000 Zagreb, Croatia

ABSTRACT

The decision on the all-inclusive re-organisation and downsizing of the Armed Forces of
the Republic of Croatia, made in early 2000, also envisaged a new military specialties
structure. The demands included a reduced number of specialties, less specialised duties
and NATO compatibility. Within each branch and service, one expert was assigned to the
new classification of specialties, which he performed in consultation with the branch (service)
specialists. Each expert was assisted by a psychologist and a physician, who could conduct
occupational analysis to serve as a background for a more radical modification of previous
specialties, if so required.
The definition of the new specialty structure was followed by an occupational analysis aimed at
defining the entry criteria (psychological, physical, medical) for each individual specialty.
The analysis was based on the qualitative and quantitative analysis of data collected by means
of a questionnaire administered to a group of experts. The version of the questionnaire used
was one adapted and tested through previous job-analysis assignments, and it comprised the
psychological, physical and medical aspects of a duty. One military psychologist per
branch (service) was tasked with the administration of the questionnaire and with the data
analysis for all specialties within it, in co-operation with the specialty expert and the
physician. They had to “defend” the analysis results before the team leading the project. The
project “bequeathed” us a “manual” containing the job analysis results, which will be
transposed into future selection regulations and similar military documents.



INTRODUCTION

The Croatian armed forces came into existence with Croatia’s fight for independence in
1991; following the war, this war-time military no longer corresponded with the new
security environment of the Republic of Croatia. Thus, in 1996 the Croatian Armed Forces
underwent a first re-organisation, which still did not accommodate the new exigencies.
Therefore, early in 2000 a new, radical reform, entailing major cuts, was launched and is
still under way. Its extent is best illustrated by the reduction figures: from 50,000 members
in late 2002 to a projected 22,000 active personnel (plus 4,000 conscripts and 3,000 civilian
employees) by the year 2005. The Armed Forces, as envisaged, are to be manned largely by
career personnel (as much as 80%). Moreover, a new national security strategy and defence
strategy envisage new missions for the Croatian Armed Forces, among them participation in
international operations (Croatian observers in UN missions in Sierra Leone, Ethiopia and
Eritrea, Western Sahara and Kashmir, and in the ISAF).
The re-organisation and re-assignment entail a new military specialties structure to
match the reduced manpower, the career military and the altered military duties.

EXIGENCIES FOR NEW SPECIALTIES

The war-time Croatian military was a large force compared to the overall population, and
at one moment comprised 240,000 members. It had been organised for traditional warfare,
and its very diverse specialties structure (e.g., 260 soldier specialties) corresponded to that
aim. It could not allow for a career military and the development of new capabilities. A
number of previous specialties disappeared naturally, and others altered as a result of the
changing military even before the re-organisation project. In order to match the specialties
system with the military exigencies, well-defined criteria have been set for the new specialties.
The exigencies for the new system were as follows:
- a reduced number of specialties (compared to the prior situation)
- entry to a specialty is achieved in enlisted soldier or officer status (NCO status
is excluded, as NCOs develop from enlisted soldiers)
- 14-week hands-on training (enlisted soldiers)
- less specialised duties (an increased number of duties contained in a single specialty)
- compatibility with the NATO system of specialties and classification


CLASSIFICATION OF DUTIES INTO NEW SPECIALTIES

The first step in the new specialties system was to define branches and services at the
Armed Forces level, and to define specialties within each individual branch and service. This
was executed without prior empirical studies, in view of the brief term allowed for the
development of the new system.
The inventory of branches and services was agreed at the senior-authority level, and was
mostly based on the existing system and planned reforms. This was followed by the
appointment of a Commission authorised to propose changes to the inventory and entrusted
with co-ordination of the new specialties system. The Commission designated an expert for
each branch and service, mostly an experienced and respected officer in the branch/service,
who was then tasked with preparing a new inventory of specialties for the respective
branch/service. He was encouraged to consult other experts of the branch. His “pool” also
included a psychologist and a physician, who were counted on to perform job analyses at his
request in ambiguous situations and for radical changes, to enable decision-making.
While for some of the branches the job was performed quickly and unambiguously,
for others it took quite a long time; this was mostly the case with new duties. Prior to a
decision on the specialties structure, additional clarifications were requested regarding the
projected scope, content and modality of operation. As the debate on how to organise new
domains may take some time and involves a number of subjects, the job is still not finished
for some domains.

ENTRY CRITERIA - REQUIRED JOB ANALYSES

Once the new specialties structure was defined, a job analysis was conducted for each
specialty considered.

Analysis objective
The objective of the analysis was to define the entry criteria (psychological, physical and
medical) that candidates for a given specialty are expected to meet.

Analysis modality
As mentioned above, each branch/service was assigned a task group made up of the
respective branch/service leader-expert, a psychologist and a physician – all of them, as a rule,
with lengthy service in the branch/service and experience with the duties analysed.
The analysis was based on the quantitative and qualitative processing of data
compiled by means of a questionnaire administered to a group of experts.
Military psychologists were tasked with the questionnaire administration and the
quantitative analysis of the results. The final step was the qualitative analysis of the data,
done by all the members of the group, and the drafting of a final report.

Questionnaire
The job analysis used an adapted and previously tested questionnaire, the
“VSSp-1”, employed previously for purposes of this kind. It combines the
psychological, physical and medical aspects of duties. In addition to basic data (bio
data, unit, date), the questionnaire includes the following units:


- duty description, list of tools/instruments utilised, protective gear used
- general features (physical strain, exposure of particular senses, frequency of
particular movements, working conditions)
- relevance of particular senses, use of aids
- social working conditions (co-workers)
- relevance rating for 24 different abilities
- relevance rating for 20 personality traits
- main cause of underperformance in a specialty
- major incidence of injuries, accidents and occupational diseases in the specialty

The questionnaire was based on different schemes for job analysis employed in
Croatia over the past 40 years. The list of abilities and personality traits rated for relevance
was composed from descriptions derived from classical theories in the field (Thurstone,
Burt, Vernon and Cattell for abilities; Cattell, Eysenck and the Big Five for personality).
The original questionnaire emerged in 1993 and, having been used several times for specific
purposes, underwent significant structural and substantive revisions and adaptations. In
substance, the questionnaire is now a combination of job-oriented (behaviour) and
psychologically oriented (required characteristics) approaches. Previous testing demonstrated
the intelligibility of the questionnaire to raters and the consistency of the data obtained.

Administration of the questionnaire


As stated above, each branch/service had its own psychologist, who administered
the questionnaire and quantitatively analysed the results for each specialty within the
branch/service, consulting the branch expert-leaders and physicians.
No fewer than 30 experts were assigned to each specialty – one third officers and two
thirds NCOs and enlisted soldiers. Respondents were selected on the following criteria:
a) years of experience in the duty,
b) respectability (as judged by the expert-leaders and psychologists), and
c) the respondents' assessed ability to provide credible answers to the
questionnaire's items (as judged by psychologists).
The respondents were clearly competent to rate the new specialties, as these had been
derived from the previous ones (through repetition, merging or separation), with which
they were thoroughly familiar.
Once the respondents had been selected, the psychologists compiled the ratings
obtained in individual and group administrations (groups of up to 10 respondents).

Analysis results
The job analysis report for each specialty was submitted to the Commission in
charge of the project. The groups then presented and discussed the analysis results before
the Commission, and the conclusions for certain specialties were harmonised and elaborated
on the basis of that discussion. Once agreed, the conclusions were integrated into a final
report unifying the psychological, physical and medical entry criteria for all the specialties
considered.
To some extent the analysis also affected the job/duty structure within specialties, as
some results revealed that the proposed job/duty structure within a branch/service was
unacceptable and needed modification.


CONCLUSIONS

The procedure described produced a "manual" containing the conclusions of the job
analysis, which will be incorporated into the Armed Forces' new selection regulations and
related documents. The results obtained enable a radical and swift re-organisation. However,
the pace at which the procedure was conducted came partly at the expense of its quality.
We expect that in the future these results, and the lessons learned, will serve as the
basis for continuous job analysis leading to detailed harmonisation of job content, training
and selection for the respective specialties.



USING DECISION TREE METHODOLOGY TO PREDICT ATTRITION WITH THE AIM

Wayne C. Lee
Department of Psychology
University of Illinois at Urbana-Champaign
603 East Daniel Street
Champaign, IL 61820
wlee@s.psych.uiuc.edu

Dr. Fritz Drasgow


Department of Psychology
University of Illinois at Urbana-Champaign
603 East Daniel Street
Champaign, IL 61820
fdrasgow@s.psych.uiuc.edu

This paper describes the Assessment of Individual Motivation (AIM) and efforts to use it
for predicting first-term attrition in the United States Army. This description provides a context
for the three papers presented in this symposium, including this one where the results of one
investigation are presented. In this first investigation, a non-linear, “configural” approach to
prediction is applied to examine whether we can improve on linear methods used to determine
the predictive validity of the AIM with respect to attrition in a 12-month time interval.

The AIM
Attrition is among the most studied organizationally relevant outcomes in personnel
research. One estimate as early as 1980 put the number of articles and book
chapters devoted to attrition between 1500 and 2000 (Muchinsky & Morrow, 1980). Certainly,
the popularity of this topic is due in part to the high cost associated with attrition in
organizations. Earlier research conducted by the U.S. Army Research Institute for the
Behavioral and Social Sciences (ARI) with the Assessment of Background and Life Experiences
(ABLE) suggested that temperament measures might indeed be good predictors of attrition in the
U.S. Army. Unfortunately, as with many temperament measures, concerns regarding the
potential effects of faking and coaching restricted the implementation of the ABLE in new
recruit selection (White, Young, & Rumsey, 2001).
Beginning in the mid-1990’s, this line of research was continued formally by ARI with
the development of the AIM. This new measure was designed specifically to target first-term
attrition while also being less susceptible to faking and coaching. The AIM consists of 27
items in a forced-choice format measuring 6 constructs. While 27 items may seem a small
number for a measure of multiple constructs, each item comprises four descriptive stems.
Each of these stems (108 in all) could easily be presented as a single item in Likert-type
format. For each item in the AIM, two stems are worded negatively and two are worded
positively, representing low and high levels of a particular construct, respectively, if a
respondent endorses them.


For each of the 27 items, the respondent is asked to indicate which of the four stems is
"Most like me" and which stem is "Least like me." This item format results in four
quasi-ipsative measurement opportunities, with each stem receiving a particular score. Each item
is constructed such that each of its four component stems measures a separate construct. One of
the primary reasons behind this item format and scoring scheme is to provide a measure that is
less transparent, and thus less susceptible to faking and coaching. Research with this measure
suggests that the goals of the AIM (predicting first-term attrition and being resistant to faking
and coaching) are indeed met (White & Young, 1998).
On the strength of this evidence, the AIM has been used operationally since early 2000 in a
3-year pilot program for non-high school diploma graduate recruits. These candidates are tested
with the AIM and, if they do not have a General Education Development (GED, high-school
equivalency) certificate, are sponsored to complete a GED program and are then processed under
the U.S. Army's Delayed Entry Program (DEP). This pilot program allows additional recruits,
from a labor market that might otherwise be inaccessible, to enlist in the U.S. Army, while at
the same time screening out potential recruits likely to leave the military soon after entry.
This screening of potential recruits is, in part, based on cutoffs associated with an “AIM
Adaptability Composite Score,” comprised of a portion of the items across the 6 content scales.
Unfortunately, we cannot describe or discuss the six content scales or the Adaptability
Composite any further without compromising the AIM and/or its usage. As such, we will refer
to the six content scales as Scales A through F.
While the evidence from the research mentioned above is enough to justify the use of
the AIM in selection, research based on traditional statistical approaches (e.g., the development
of the Adaptability Composite) may suffer from potentially limiting characteristics. For example,
with approaches such as linear and logistic regression, the terms of the equation, or the weights
used to determine a composite score, act identically across cases (i.e., respondents). Only a
small handful of prediction or classification approaches are sensitive to the profile of scale
scores associated with each case – that is, approaches that can delineate separate mechanisms
leading to the same or similar outcomes. One such non-linear, "configural" method is decision
tree methodology.

Classification and regression trees


Decision tree methods can be used to predict any number of outcomes based on a set of
predictor variables. The foundation of these methods lies in decision rules using predictor
variables, arranged hierarchically, that split the data into smaller and smaller groups of
increasing homogeneity with respect to some outcome. The hierarchical arrangement of splitting
rules can be depicted graphically with an inverted tree, where:
• The “root” of the tree represents an initial split (first node) of the entire dataset based
on a cutoff score associated with one variable
• “Branches” (internal nodes) depict additional splitting rules that further define the
underlying relationship between the variables and the criterion by increasing the
homogeneity of resulting nodes
• “Leaves” (terminal nodes) represent predicted outcomes or levels of a criterion
variable (e.g., those who stay with the U.S. Army and those who do not)
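
The root/branch/leaf structure just described maps directly onto a small recursive data type. The following Python sketch is purely illustrative (the field names are ours, not CART's):

# Minimal sketch of the root/branch/leaf structure; all names illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None    # splitting variable (internal nodes only)
    cutoff: Optional[float] = None   # rule: go left if case[feature] < cutoff
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None      # predicted class (terminal nodes only)

def classify(node: Node, case: dict) -> str:
    """Follow splitting rules from the root until a terminal node is reached."""
    while node.label is None:
        node = node.left if case[node.feature] < node.cutoff else node.right
    return node.label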


Classification and regression trees (CART; Breiman, Friedman, Olshen, & Stone, 1984)
is one algorithm associated with this approach. Through brute force, CART examines all
possible binary splits of the data (answers to “yes/no” questions) based on all of the predictor
variables. It places the best split at the root of the tree and continues this process until no further
splits can be made. Later, the resulting decision tree is “pruned” according to misclassification
rates, user-determined preferences (e.g., permitted number of cases in a terminal node), or by
eliminating redundant nodes. Additionally, competing trees may develop depending on the
nature of the data.
To assess classification accuracy, CART uses "v-fold cross-validation": the sample is
divided into v subsamples, a decision tree is grown on the v-1 combined subsamples, and its
classification accuracy is assessed on the held-out subsample. This process is iterated so that
each of the v subsamples serves as the hold-out sample exactly once. Classification accuracy is
estimated as the average classification accuracy across the v hold-out subsamples.
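
This scheme is straightforward to reproduce. As a minimal illustration (not the CART 4.0 package used in this study), the following Python sketch grows a CART-style tree with scikit-learn and estimates accuracy with 10-fold cross-validation; the data and column layout are placeholders:

# Illustrative v-fold cross-validation (v = 10) with a CART-style tree,
# using scikit-learn rather than CART 4.0; the data are random placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))        # stand-in for the six AIM scale scores
y = rng.integers(0, 2, size=1000)     # stand-in stayer/leaver labels

tree = DecisionTreeClassifier(min_samples_leaf=50)   # simple pruning control
fold_accuracy = cross_val_score(tree, X, y, cv=10)   # one score per hold-out fold
print(fold_accuracy.mean())           # average accuracy across the 10 folds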
As mentioned earlier, CART may delineate separate "paths" that lead to the same
outcome, thereby identifying configural relationships in the data. CART may also reuse a
variable in separate parts of the tree and thus capture non-linear relationships. Below we
describe an investigation that examined whether we could improve upon the Adaptability
Composite in predicting attrition with the AIM.

Applying Decision Tree Methodology to AIM Data

Sample, data, and software


A file was created containing the AIM scale scores and a 12-month retention variable
for 22,328 enlisted U.S. Army personnel. The data came from the AIM Grand Research
Database, managed by ARI and a contractor, the Human Resources Research Organization
(HumRRO). This database is the source of much of the recent research on the AIM; for this
database, the AIM was administered to these personnel between 1998 and 1999 for research
purposes only (Knapp, Heggestad, & Young, in preparation). The 12-month time interval was
selected because it provided the CART 4.0 software package (Breiman, Friedman, Olshen, &
Stone, 1997) with a sufficient number of respondents with which to "grow" a tree. The six
content scales were used as input (predictor variables) and the 12-month attrition criterion
was treated dichotomously (i.e., "stayers" vs. "leavers").

Results
The analysis yielded 39 trees ranging in complexity from two terminal nodes to a tree
with 2802 terminal nodes and a depth of 51 levels or tiers. However, the larger trees exhibited
high rates of misclassification among the stayers (as much as 60 percent). Of particular interest
were five trees resulting from this analysis. Table 1 summarizes the misclassification rates for
these five “best” trees using the v-fold cross-validation approach (v=10).


Table 1: Misclassification rates for five classification trees

Number of          "False positives"                  "Hits"
terminal nodes     (misclassification of stayers, %)  (correct classification of leavers, %)
      3                  31.14                             45.32
      6                  34.23                             48.19
      7                  33.64                             47.40
     11                  33.40                             47.13
     18                  32.09                             45.68

The first and third of these trees are depicted in Figures 1 and 2, respectively (where left
branches indicate a “yes” response, while right branches indicate a “no” response to the decision
rule in the parent node). For example, Figure 1 shows that the root (i.e., initial) node splits the
sample on the basis of Scale D scores; individuals with scores on Scale D of less than 8.5 are
predicted to be leavers and individuals with scores greater than 8.5 are branched to another node.
In this internal node, individuals with relatively high scores on Scale D (i.e., greater than 8.5) but
low scores on Scale B (less than 14.89) are predicted to leave. Only individuals with high
scores on both Scales D and B are predicted to be stayers.

Figure 1: Classification tree with 3 terminal nodes
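
Read as code, the tree in Figure 1 amounts to the nested IF statements discussed in the conclusions below. A minimal Python sketch using the cut scores quoted above (the key names are hypothetical):

def predict_12_month_status(case):
    """The Figure 1 tree written as nested IF statements."""
    if case["scale_D"] < 8.5:         # root split on Scale D
        return "leaver"
    if case["scale_B"] < 14.89:       # internal node: split on Scale B
        return "leaver"
    return "stayer"                   # high on both Scales D and B

print(predict_12_month_status({"scale_D": 9.0, "scale_B": 16.0}))  # "stayer"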

CART also rank-orders the relative importance of the predictor variables. In this
analysis, CART identified Scales D and B as the two best predictors. Scales A, E, and C played a
smaller role in these classification trees, whereas Scale F played a nearly insignificant role. To
compare the performance of these trees with the Adaptability Composite, we turn to a receiver
operating characteristic (ROC) curve (Figure 3) depicting the hit and false positive rates
associated with separate cut-off scores for the Adaptability Composite and with the
classification tree with 7 terminal nodes. A similar pattern was found in comparing this tree
against results from logistic regression with all six content scales as predictors.
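
Such a comparison is easy to sketch. The following hedged Python example (scikit-learn again, with random placeholder data rather than the AIM database) traces the logistic-regression ROC curve and the tree's single operating point:

# Sketch of the ROC comparison: a logistic-regression composite traces a full
# curve, while a fixed tree contributes one operating point (cf. Figure 3).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))                  # stand-ins for Scales A-F
y = rng.integers(0, 2, size=1000)               # stayer (0) / leaver (1)

logit = LogisticRegression().fit(X, y)
fpr, hit, _ = roc_curve(y, logit.predict_proba(X)[:, 1])  # full ROC curve

tree = DecisionTreeClassifier(max_leaf_nodes=7).fit(X, y)
pred = tree.predict(X)
tree_fpr = np.mean(pred[y == 0] == 1)           # misclassified stayers
tree_hit = np.mean(pred[y == 1] == 1)           # correctly classified leavers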


Figure 2: Classification tree with 7 terminal nodes

Discussion and Conclusions

One goal of this investigation was to determine whether decision tree methodology,
specifically the CART algorithm, could improve upon the prediction of attrition based on the
Adaptability Composite and logistic regression. These results, particularly the ROC curve in
Figure 3, suggest that CART can indeed produce a selection algorithm that outperforms the
Adaptability Composite and logistic regression. However, a number of caveats should be noted.
First, CART produces trees that are "discrete" in their rates of hits and false positives (e.g., the
single point on the ROC associated with the 7-terminal-node tree). In practice, it may be
preferable to set the cut-off score based on a target false-positive rate, which may not be
available from the CART output. Second, implementing the decision scheme associated with a
decision tree may be computationally complex, depending on the depth and width of the tree, as
a number of nested IF statements would have to be used. Third, in this case, the difference in
performance between the two approaches is not large – a few percentage points with respect to
the hit and false positive rates. Whether this is a meaningful or significant difference is better
determined by examining all of the possible costs and benefits associated with choosing one
approach over the other. This might include factors such as economic conditions, recruitment
and staffing goals, or even changes within the labor market.
Finally, in examining Figure 2 we do indeed see evidence of non-linear, configural
relationships in the data (i.e., different cut-scores associated with the same variable and the same
outcome described by separate paths within the tree). This characteristic of decision tree
methodology has proven to be of tremendous use in fields as diverse as biology, mechanical
engineering and finance (Breiman et al., 1997). In addition to improving prediction, decision
tree methodology may also prove to be a valuable tool in developing theories for many content
domains within personnel research and organizational science.

Figure 3: ROC curve depicting the Adaptability Composite and one CART tree

Acknowledgements

We wish to thank the U.S. Army Research Institute for access to the AIM data and for
supporting this research. Assistance from the Human Resources Research Organization
(HumRRO) was particularly helpful with data management and recordkeeping. The Consortium
of Universities of the Washington Metropolitan Area was also helpful in securing research funds.
All statements expressed in this document are those of the authors and do not necessarily reflect
the official opinions or policies of the U.S. Army Research Institute, the U.S. Army, the
Department of Defense, HumRRO, or the Consortium of Universities.

References

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression
trees. Pacific Grove, CA: Wadsworth.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1997). CART (Version 4.0)
[Computer program & documentation]. San Diego, CA: Salford Systems.
Knapp, D.J., Heggestad, E.D., & Young, M.C. (Eds.). (In preparation). Understanding
and improving the Assessment of Individual Motivation (AIM) in the Army's GED Plus Program
(ARI Study Note). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.
Muchinsky, P.M., & Morrow, P.C. (1980). A multidisciplinary model of voluntary
employee turnover. Journal of Vocational Behavior, 17, 263-290.
White, L.A., & Young, M.C. (1998, August). Development and validation of the
Assessment of Individual Motivation (AIM). Paper presented at the Annual Meeting of the
American Psychological Association, San Francisco, CA.
White, L.A., Young, M.C., & Rumsey, M.G. (2001). ABLE implementation issues and
related research. In J.P. Campbell & D.J. Knapp (Eds.), Exploring the limits in personnel
selection and classification (pp. 525-558). Mahwah, NJ: Erlbaum.
Young, M.C., Heggestad, E.D., Rumsey, M.G., & White, L.A. (2000, August). Army
pre-implementation research findings on the Assessment of Individual Motivation (AIM). Paper
presented at the Annual Meeting of the American Psychological Association, San Francisco, CA.


PREDICTING ATTRITION OF ARMY RECRUITS USING OPTIMAL APPROPRIATENESS MEASUREMENT

Dr. Oleksandr S. Chernyshenko


Department of Psychology
University of Canterbury
Private Bag 4800
Christchurch, New Zealand
sasha.chernyshenko@canterbury.ac.nz

Dr. Stephen E. Stark


Department of Psychology
University of South Florida
4202 E. Fowler Ave.
Tampa, FL 33620
sstark@cas.usf.edu

Dr. Fritz Drasgow


Department of Psychology
University of Illinois at Urbana-Champaign
603 E. Daniel St.
Champaign, IL 61820
fdrasgow@s.psych.uiuc.edu

The purpose of this research was to determine if item response theory (IRT) optimal
appropriateness measurement methods could improve the prediction of attrition for the six
content scales of the AIM. Optimal appropriateness measurement (OAM) provides the
statistically most powerful methods for classifying examinees into two groups, such as
"stayers" and "leavers." If the item response model is correctly specified for each group
studied, the Neyman-Pearson lemma guarantees that no other method applied to the same
data can provide more accurate classification; the procedures are therefore said to be
optimal (Levine & Drasgow, 1988).
Our application of OAM methodology for predicting attrition involved a three-step
process: 1) calibration of AIM scales with an appropriate IRT model, 2) examination of
model-data fit, and 3) classification via optimal appropriateness measurement. A detailed
description of each of these steps is presented below.

Sample

We used the 22,666 active Army cases contained in the Army AIM Grand Research
database. The subjects were enlisted applicants for the Army’s GED Plus Program who
completed the AIM during the 2000-2002 period. The majority (92%) had applied to join
the Regular Army, while the remainder applied for the Army Reserves. At the time of
application, 92% had a GED certificate, while the remainder had neither a high school
diploma nor an alternative high school credential. For every subject, retention status was
available at 12 months after enlistment.


Calibration of the AIM Content Scales


Unidimensionality. All analyses were conducted using the AIM trichotomous item
scoring (2, 1, and 0). Most IRT models suitable for describing trichotomous AIM scoring
require both that the data be essentially unidimensional and, relatedly, that the item
responses be locally independent. Both assumptions are satisfied when factor analysis of
the data reveals the presence of one dominant dimension (e.g., Hulin, Drasgow, &
Parsons, 1983).
To investigate dimensionality, classical test statistics were first computed for all of
the stems of the six AIM content scales: Physical Conditioning, Leadership, Work
Orientation, Adjustment, Agreeableness, and Dependability. Two stems had a negative
corrected item-total correlation (the first stem in both the Agreeableness and Dependability
scales), so they were removed from their respective scales and the statistics were
re-computed. The resulting coefficient alphas varied between .57 and .70.
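
For concreteness, these classical statistics can be sketched in a few lines of Python (the data below are random placeholders, not AIM responses):

# Sketch of the classical statistics reported here (hypothetical 0/1/2 data).
import numpy as np

def corrected_item_totals(scores):
    """Correlate each stem with the total of the remaining stems."""
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
                     for i in range(scores.shape[1])])

def cronbach_alpha(scores):
    """Coefficient alpha from an (examinees x stems) score matrix."""
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

stems = np.random.default_rng(1).integers(0, 3, size=(500, 18))
print(corrected_item_totals(stems).round(2), round(cronbach_alpha(stems), 2))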
Next, factor analyses were carried out on the stems of each of the six content scales
separately. The results indicated the presence of a relatively strong dominant factor for
each scale. In examining the eigenvalues, a sharp elbow was apparent in each case (see
Fig. 1 below). Consequently, a unidimensional IRT model may be suitable for describing
responding to the AIM at the stem level.

Figure 1. Scree plots following factor analysis of each of the six content scales
(eigenvalues plotted against factor number for Physical Conditioning, Leadership,
Work Orientation, Adjustment, Agreeableness, and Dependability)

Description of SGR model. Because a single dominant dimension was found to
underlie each of the six AIM content scales and the response data were scored
polytomously with options arranged in an increasing order (i.e., 0, 1, 2), Samejima’s
Graded Response (SGR) model was selected for item parameter estimation. For the SGR
model, the probability of endorsing a response option, or category, depends on the
discriminating power of the item and the location of the threshold parameter for that option
on the latent trait (theta) continuum. The mathematical form of the SGR model is


P(v_i = j \mid \theta = t) = \frac{1}{1 + \exp[-1.7\, a_i (t - b_{i,j})]} - \frac{1}{1 + \exp[-1.7\, a_i (t - b_{i,j+1})]},

where v_i denotes the response to polytomously scored item i; j is the particular
option selected by the respondent (j = 1, ..., J, where J is the number of options for
item i); a_i is the item discrimination parameter and is assumed to be the same for each
option within a particular item; and b_{i,j} is the extremity parameter, which varies from
option to option subject to the constraints b_{i,j-1} < b_{i,j} < b_{i,j+1}, with b_{i,J} taken as +∞.
For stems having three options, as in the AIM scales, three parameters are estimated
for each stem: one discrimination parameter that reflects the steepness of the option
response function (ORF) and two location parameters that reflect the positions of the ORFs
along the horizontal axis.
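
As a worked illustration of this formula (with hypothetical parameter values; MULTILOG, not this sketch, performed the actual estimation):

# Sketch of SGR option probabilities for a trichotomous stem; illustrative only.
import numpy as np

def sgr_category_prob(t, a, b, j):
    """P(v = j | theta = t); b = [b_1, b_2] thresholds for categories 0/1/2."""
    def p_star(threshold):             # probability of reaching a boundary
        return 1.0 / (1.0 + np.exp(-1.7 * a * (t - threshold)))
    lower = p_star(b[j - 1]) if j >= 1 else 1.0
    upper = p_star(b[j]) if j < len(b) else 0.0
    return lower - upper

probs = [sgr_category_prob(0.5, a=1.2, b=[-1.0, 0.8], j=j) for j in range(3)]
print(probs, sum(probs))               # the three ORFs sum to 1 at any theta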
Item parameter estimation. Item parameters for the SGR model were estimated
separately for the total samples of stayers (N = 18016) and leavers (N = 4521) using the
MULTILOG computer program (Thissen, 1991). Space limitations prohibit the
presentation of the resulting item parameters for the six AIM content scales.

Examining Model-Data Fit


Graphical and statistical methods were used to examine the fit of the SGR model to
AIM content scale data for both stayers and leavers. This required that the total samples be
split into calibration and validation subsamples. Item parameters were reestimated for the
calibration subsamples using MULTILOG. The validation subsamples were used for
computing empirical response functions and chi-square fit statistics.
Fit plots and chi-square statistics were computed using the MODFIT computer
program (Stark, 2001; see Drasgow, Levine, Tsien, Williams, and Mead [1995] for a
detailed description of the methods). Fit plots provide a graphical method of evaluating
model-data fit: a theoretical option response function, computed using the parameters
estimated from the calibration subsample, is compared to the empirical response function
computed from the cross-validation subsample. A close correspondence between the two
functions indicates good fit.
One fit plot was produced for each response option. In each plot, there was a close
correspondence between the theoretical and empirical response functions, which suggests
that the SGR model fit the data well.
Model-data fit was also examined using chi-square statistics. These statistics were
computed from the expected and observed frequencies for each individual stem (one-way
chi-square tables) and for combinations of pairs and triples of stems (two-way and three-way
tables). The latter were computed to detect violations of local independence and forms of
misfit that are often missed at the level of single items. The chi-squares were also adjusted
to a sample size of 3,000 and divided by their degrees of freedom to facilitate comparisons
across samples of different sizes. According to Drasgow et al. (1995), adjusted chi-square
to degrees of freedom ratios of 3 or less indicate good model-data fit.
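
A one-function sketch of this adjustment follows; note that the exact rescaling formula is our reading of the Drasgow et al. (1995) procedure, not a quotation from it.

# Sketch of the sample-size adjustment as we read it from Drasgow et al.
# (1995); the precise rescaling formula here is our assumption.
def adjusted_chi2_to_df(chi2, df, n, n_ref=3000.0):
    """Rescale chi-square to a reference N of 3,000, then divide by df."""
    return ((n_ref / n) * (chi2 - df) + df) / df

print(adjusted_chi2_to_df(chi2=95.0, df=20, n=18016))  # ratio < 3 => good fit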
Relatively small χ2/df statistics were obtained for stem singles, doubles
and triples on all AIM content scales. The average adjusted χ2/df for single
items ranged from 0.6 to 2.2; the average for doublets ranged from 2.4 to 3.7; for triplets
the range was from 2.4 to 3.3. These results, in conjunction with the fit plots, indicate that
the SGR model fit the AIM data well and could be used for classification of respondents
based on OAM methods.

Classification Via Optimal Appropriateness Measurement


Optimal appropriateness measurement (OAM) was used to classify respondents into
groups of “stayers” or “leavers” based on the value of their likelihood ratio statistic.
Specifically, the likelihood ratio statistic for each respondent was computed by dividing the
marginal likelihood for being a leaver by the marginal likelihood of being a stayer. In this
situation, we assumed that the same process underlies responding for stayers and leavers, so
the same marginal likelihood equation can be used for both groups. The only difference
lies in the estimated item parameters used in the marginal likelihood equation shown below
\mathrm{Prob}(v^{*}) = \int \Big\{ \prod_{i=1}^{n} \sum_{j=1}^{J} \delta_j(v_i^{*})\, P(v_i = j \mid t) \Big\}\, f(t)\, dt .

In the equation above, n is the number of items in an AIM scale, t is an individual's
standing on the latent trait, and J is the number of response options for item i;
δ_j(v_i*) = 1 if option j was endorsed, and 0 otherwise; P(v_i = j | t) is the probability
of choosing option j given t (computed using the parameters for either stayers or
leavers); and f(t) is the normal density.
As an example of the OAM procedure, consider responses to, say, the Physical
Conditioning scale. First, compute the marginal probability of a respondent's Physical
Conditioning responses using the SGR item parameters for leavers. Second, compute the
probability of the same responses using the parameters for stayers. Third, compute the
ratio of these two probabilities. Finally, if the ratio is large (i.e., the responses are better
described by the model for leavers), predict that the respondent will be a leaver;
otherwise, predict that the respondent will be a stayer.
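
Those four steps can be sketched in Python under stated assumptions: hypothetical item parameters for a three-stem scale, the SGR function from the earlier sketch, and simple grid integration over a standard normal density in place of the authors' OAM program.

# Sketch of the likelihood-ratio classification; all parameters hypothetical.
import numpy as np
from scipy.stats import norm

def sgr_prob(t, a, b, j):              # SGR category probability (as above)
    p = lambda thr: 1.0 / (1.0 + np.exp(-1.7 * a * (t - thr)))
    return (p(b[j - 1]) if j >= 1 else 1.0) - (p(b[j]) if j < len(b) else 0.0)

def marginal_likelihood(responses, params, grid=np.linspace(-4.0, 4.0, 81)):
    """Prob(v*): pattern likelihood averaged over the normal density on t."""
    like = np.ones_like(grid)
    for v, (a, b) in zip(responses, params):
        like *= np.array([sgr_prob(t, a, b, v) for t in grid])
    return float(np.sum(like * norm.pdf(grid)) * (grid[1] - grid[0]))

stayer = [(1.2, [-1.0, 0.8]), (0.9, [-0.5, 1.1]), (1.5, [-1.3, 0.4])]
leaver = [(1.0, [-1.4, 0.2]), (0.8, [-0.9, 0.6]), (1.1, [-1.6, 0.0])]
pattern = [2, 1, 0]                    # one respondent's stem scores

lr = marginal_likelihood(pattern, leaver) / marginal_likelihood(pattern, stayer)
print("predict leaver" if lr > 1.0 else "predict stayer")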
Six likelihood ratio statistics were computed for each respondent (one per AIM
content scale) using Stark’s OAM computer program (Stark, 2000). Once all the likelihood
ratios were obtained, logistic regression was used to determine the best linearly weighted
sum of LR values for predicting the dichotomous stayer/leaver outcome. Receiver
operating characteristic (ROC) curves were then generated for each AIM content scale and
the logistic regression composite to examine how well the OAM procedure differentiated
between groups of stayers and leavers. Fig. 2 presents an example ROC curve for one of
the AIM scales.
Figure 2. ROC curve based on likelihood ratio values for one AIM content scale
(hit rate plotted against false positive rate)

It can be seen that for this AIM scale, the OAM procedure differentiated stayers and leavers
to a moderate degree: at a 20% false positive rate, 33% of leavers were correctly
identified. Note that because the AIM is currently used operationally to predict attrition,
we do not present results that would identify which AIM content scales worked best.
The results also indicated that the LR composite provided the highest hit rates among
the seven decision variables. It correctly identified 22% of leavers at a 10% false positive
rate, 35% at a 20% false positive rate, 47% at 30%, 56% at 40%, and 65% at 50%. The
success of the LR composite indicates that the AIM content scales provide incremental
validity in the prediction of attrition and thus should be used collectively.
It is important to note that the OAM methodology provided an improvement over
the current application of the Adaptability score in predicting attrition. For instance, at
about a 20 percent false positive rate, the current Adaptability score yields a 27 percent
correct identification rate for those who leave the service, while the OAM composite yields
a 33 percent rate. A graphical comparison of the ROC curves for the two procedures (see
Fig. 3) showed that the OAM method performed better than the Adaptability score at every
level of the false positive rate. Thus, based on these results, we recommend using the
OAM-based LR statistic, rather than the Adaptability composite, to predict the likelihood
of attrition with the AIM.
Fig. 3. ROC curves for the OAM and Adaptability composites
(hit rate plotted against false positive rate)



Acknowledgements

We wish to thank the U.S. Army Research Institute for access to the AIM data and for
supporting this research. Assistance from the Human Resources Research Organization
(HumRRO) was particularly helpful with data management and recordkeeping. We also
thank the Consortium of Universities of the Washington Metropolitan Area. All statements
expressed in this document are those of the authors and do not necessarily reflect the
official opinions or policies of the U.S. Army Research Institute, the U.S. Army, the
Department of Defense, HumRRO, or the Consortium of Universities.

References

Drasgow, F., Levine M.V., Tsien, S., Williams B.A., & Mead, A.D. (1995). Fitting
polytomous item response theory models to multiple-choice tests. Applied Psychological
Measurement, 19, 143-165.
Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory:
Applications to psychological measurement. Homewood, IL: Dow Jones Irwin.
Levine, M. V., & Drasgow, F. (1988). Optimal appropriateness measurement.
Psychometrika, 53, 161 – 176.
Stark, S. (2001). MODFIT: Computer program for examining model-data fit using
fit plots and chi-square statistics. University of Illinois at Urbana-Champaign.
Stark, S. (2000). OAM_SGR: Computer program for optimal appropriateness
measurement. University of Illinois at Urbana-Champaign.
Thissen, D. (1991). MULTILOG user’s guide (Version 6.0). Mooresville, IN:
Scientific Software.


A NEW APPROACH TO CONSTRUCTING AND SCORING FAKE-RESISTANT PERSONALITY MEASURES
Dr. Stephen E. Stark
Department of Psychology
University of South Florida
4202 E. Fowler Ave.
Tampa, FL 33620
sstark@cas.usf.edu

Dr. Oleksandr S. Chernyshenko


Department of Psychology
University of Canterbury
Private Bag 4800
Christchurch, New Zealand
sasha.chernyshenko@canterbury.ac.nz

Dr. Fritz Drasgow


Department of Psychology
University of Illinois at Urbana-Champaign
603 E. Daniel St.
Champaign, IL 61820
fdrasgow@s.psych.uiuc.edu

Because of concerns about faking, defined as intentional response distortion, many
researchers have begun exploring methods for constructing and scoring personality tests that are
fake-resistant. At the forefront of this effort are the Army researchers who developed the
Assessment of Individual Motivation (AIM; White & Young, 1998) inventory, which assesses
the temperament of Army recruits. The AIM is composed of items involving tetrads of
statements that are similar in social desirability but represent different dimensions. A
respondent's task is to choose the statement in each tetrad that is "most like me" and the one
that is "least like me." Preliminary examinations of AIM data, collected under research
conditions where scores were not being used operationally, suggest that this multidimensional
format reduces score inflation due to faking to as little as one tenth of a standard deviation
(White & Young, 1998), compared with differences of 1 SD observed with traditional,
single-stimulus (statement) items (see White, Nord, Mael, & Young, 1993).
In this paper, we build on the idea of the AIM to address the general problem of faking in
personality assessment. Specifically, we propose a new item response theory (IRT) approach to
constructing and scoring multidimensional personality tests that are, in principle, fake-resistant.
Rather than focusing on tetrads, however, we create fake-resistant pairwise preference items by
pairing similarly desirable statements representing different dimensions. Using a simulation
study, we show that by scaling stimuli (individual statements) and persons in separate steps,
using different IRT models, it is possible to recover known latent trait scores, representing
different personality dimensions, with a high degree of accuracy, meaning that interindividual
comparisons are possible.

An IRT Approach to Constructing and Scoring Pairwise Preference Items


In his dissertation, Stark (2002) proposed a general IRT approach for constructing and
scoring pairwise preference items involving statements on different dimensions. A multi-step
procedure is required:
1) Develop a large number of statements representing different personality dimensions.
2) Administer the statements to a group of respondents instructed to indicate how well
each statement describes them on, say, a scale of 1 to 5. Also administer the statements to a
separate group of judges instructed to rate the desirability of each statement on a similar scale.
3) Estimate stimulus parameters for the individual statements representing each
dimension separately, using a unidimensional IRT model that provides good model-data fit;
one possibility is the Generalized Graded Unfolding Model (GGUM; Roberts, Donoghue, &
Laughlin, 2000a).
4) Create fake-resistant items by pairing statements that are similar in desirability but
represent different dimensions; also create a small proportion of unidimensional items by
pairing statements that are similar in desirability but have different stimulus location
parameters. These pairings constitute the fake-resistant test.
5) Administer the resulting test to respondents, instructed to choose the statement in
each pair that better describes them.
6) Score the pairwise preference data using a Bayes modal latent trait estimation
procedure, based on the following general model:

P_{(s>t)i}(\theta_{d_s}, \theta_{d_t}) = \frac{P_{st}\{1,0\}}{P_{st}\{1,0\} + P_{st}\{0,1\}} \approx \frac{P_s\{1\}\, P_t\{0\}}{P_s\{1\}\, P_t\{0\} + P_s\{0\}\, P_t\{1\}} ,   (1)

where:
i = index for items (pairings), i = 1 to I,
d = index for dimensions, d = 1, ..., D,
s, t = indices for the first and second stimuli, respectively, in a pairing,
\theta_{d_s}, \theta_{d_t} = latent trait values for a respondent on dimensions d_s and d_t, respectively,
P_s\{1\}, P_s\{0\} = probability of endorsing/not endorsing stimulus s at \theta_{d_s},
P_t\{1\}, P_t\{0\} = probability of endorsing/not endorsing stimulus t at \theta_{d_t},
P_{st}\{1,0\} = joint probability of endorsing stimulus s and not endorsing stimulus t at (\theta_{d_s}, \theta_{d_t}),
P_{st}\{0,1\} = joint probability of not endorsing stimulus s and endorsing stimulus t at (\theta_{d_s}, \theta_{d_t}), and
P_{(s>t)i}(\theta_{d_s}, \theta_{d_t}) = probability of a respondent preferring stimulus s to stimulus t in pairing i.

In essence, the model above assumes that when a respondent is presented with a pair of
statements (stimuli), s and t, and is asked to indicate a preference, he/she evaluates each stimulus
separately and makes independent decisions about endorsement. If a respondent endorses both
stimuli, or does not endorse either, he/she must reevaluate the stimuli, independently, until a
preference is reached. A preference is represented by the joint outcome {Agree (1), Disagree (0)}
or {Disagree (0), Agree (1)}. An outcome of {1,0} indicates that stimulus s was preferred to

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
325

stimulus t, and is considered a positive response; an outcome of {0,1} indicates that stimulus t
was preferred to s (a negative response). Thus, the response data for this model are
dichotomous. Note that this model makes no assumption about item dimensionality. The
statements involved in a pair may be on the same or different dimensions and, in fact, a small
number of unidimensional pairings is required to identify the latent trait metric and permit
interindividual comparisons. In addition, because the stimuli in each pair are assumed to be
evaluated independently, stimulus parameters can be estimated, for each dimension separately,
by using software for calibrating unidimensional single stimulus responses, such as the
GGUM2000 computer program (Roberts, Donoghue, & Laughlin, 2000b). Therefore, this model
is not referred to as a multidimensional model, but rather a multi-unidimensional model called
MUPP (Multi-Unidimensional Pairwise Preferences; Stark, 2002).
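
A short numeric sketch of the approximation in Equation 1 follows; a 2PL endorsement curve stands in for the GGUM purely to keep the example compact, and all parameter values are hypothetical.

# Numeric sketch of Equation 1's approximation; illustrative only.
import math

def endorse_prob(theta, a, b):
    """Stand-in single-stimulus endorsement probability (2PL form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mupp_prob(theta_s, theta_t, params_s, params_t):
    """P(s > t): Equation 1, from independent endorsement decisions."""
    ps = endorse_prob(theta_s, *params_s)   # P_s{1} on dimension d_s
    pt = endorse_prob(theta_t, *params_t)   # P_t{1} on dimension d_t
    return ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)

# High standing on s's dimension, low on t's: s should usually be preferred.
print(mupp_prob(1.5, -0.5, (1.2, 0.0), (1.0, 0.3)))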
Scoring respondents. Once the fake-resistant tests have been administered and the
dichotomous pairwise preference data have been collected, a multidimensional Bayes modal
estimation procedure can be used to obtain scores for each respondent on each dimension. This
amounts to maximizing the following equation:

L(\tilde{u}, \tilde{\theta}) = \Big\{ \prod_{i=1}^{n} \big[P_{(s>t)i}\big]^{u_i} \big[1 - P_{(s>t)i}\big]^{1-u_i} \Big\}\, f(\tilde{\theta}) ,   (2)

where \tilde{\theta} = (\theta_{d'=1}, \theta_{d'=2}, ..., \theta_{d'=D}) represents a vector of latent trait values (one for
each dimension), \tilde{u} represents a dichotomous response pattern, P_{(s>t)i}(\theta_{d_s}, \theta_{d_t}) is the
probability of preferring stimulus s to stimulus t in item i, and f(\tilde{\theta}) represents the prior
density, whose dimensions, d' = 1 to D, are assumed uncorrelated.
Equation 2 can be maximized numerically to obtain a vector of latent trait estimates for each
respondent using the subroutine DFPMIN (Press, Flannery, Teukolsky, & Vetterling, 1990) in
conjunction with functions that compute the log likelihood and its first derivatives. DFPMIN
performs a D-dimensional minimization using a Broyden-Fletcher-Goldfarb-Shanno (BFGS)
algorithm, so the log likelihood values and first derivatives must be multiplied by -1 to
maximize the likelihood of a response pattern. The primary advantage of this approach over
Newton-Raphson iterations is that DFPMIN does not require an analytical solution for the
second derivatives of the log likelihood. Instead, it provides an approximation to the inverse
Hessian matrix, from which standard errors of the latent trait estimates can be
obtained by taking the square roots of the diagonal elements.
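
In a modern setting, the same Bayes modal scoring can be sketched with SciPy's BFGS minimizer standing in for DFPMIN; everything below (the two-dimensional test, items, and parameters) is a hypothetical illustration, again with a 2PL curve in place of the GGUM.

# Sketch of Bayes modal scoring; SciPy's BFGS stands in for DFPMIN.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def endorse(theta, a, b):                     # 2PL stand-in, as before
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Each item: (dimension of s, params of s, dimension of t, params of t).
items = [(0, (1.2, 0.0), 1, (1.0, 0.3)),
         (1, (0.9, -0.5), 0, (1.1, 0.2)),
         (0, (1.0, -1.0), 0, (1.3, 0.8))]     # last pairing is unidimensional
u = np.array([1, 0, 1])                       # observed preference pattern

def neg_log_posterior(theta):
    logp = norm.logpdf(theta).sum()           # uncorrelated standard-normal prior
    for (ds, ps, dt, pt), ui in zip(items, u):
        es, et = endorse(theta[ds], *ps), endorse(theta[dt], *pt)
        p = es * (1 - et) / (es * (1 - et) + (1 - es) * et)   # Equation 1
        logp += ui * np.log(p) + (1 - ui) * np.log(1 - p)     # Equation 2 terms
    return -logp                              # minimize -log L, as with DFPMIN

fit = minimize(neg_log_posterior, x0=np.zeros(2), method="BFGS")
print(fit.x, np.sqrt(np.diag(fit.hess_inv)))  # trait estimates and their SEs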

A Monte Carlo Study to Examine MUPP Scoring Accuracy

Constructing Tests for Simulations


To examine the accuracy of latent trait estimation, one- and two-dimensional tests were
constructed using AIM pretest data provided by the U.S. Army Research Institute (ARI) through
Human Resources Research Organization (HumRRO). Specifically, in the early stages of AIM
development, nearly 500 stimuli, representing 6 temperament dimensions, were administered to
738 recruits who were instructed to indicate their level of agreement using a scale of 1 (very
untrue of me) to 6 (very true of me). Of those 738 recruits, 469 were instructed to answer
honestly, and 269 were told to fake good (i.e., answer in a manner that would improve their
score). HumRRO researchers screened those data and flagged six unusual response patterns,
which we excluded from the following analyses.
Based on 465 honest respondents, stimulus parameters were estimated for each of the
six AIM dimensions separately using the GGUM2000 computer program. For all dimensions
except Leadership, the parameter estimation procedure converged after eliminating just a few
stimuli and, overall, good model-data fit was observed. Next, a social desirability rating was
obtained for each statement by computing the mean proportion endorsement score using the
responses of the 267 persons in the fake good condition. Based on the similarity of the
distributions of stimulus parameters and social desirability ratings, the Adjustment and
Agreeableness dimensions were chosen for test construction.
Constructing 1-D tests. To investigate the accuracy of MUPP latent trait estimation as a
function of test length, three conventional tests of 10, 20, and 40 items were created. An effort
was made to construct tests having equal proportions of items that discriminated well at high,
moderate, and low values of theta. However, because few stimuli had large location parameters,
and very few had moderate ones, it was necessary to repeat some stimuli several times to create
items that provided information above theta = 1. Once a final set of 40 items was selected,
the items were ordered to balance extremity and discriminating power across subsets 1 – 10, 11 –
20, and 21 – 40. Items 1 – 10 and 1 – 20 were used for the conventional tests of 10 and 20 items,
respectively; the entire set was used for the 40-item test.
Constructing 2-D tests. The accuracy of MUPP latent trait estimation for multi-
unidimensional tests is most likely influenced by two factors: 1) the number of items involving
each dimension (i.e., test length in the two-dimensional case); and 2) the percentage of items
involving stimuli on the same dimension. (A small proportion of unidimensional pairings,
representing each dimension, is required to identify the metric.) Therefore, these factors were
chosen as independent variables, having 3 levels each.
To implement a fully crossed factorial design, nine conventional tests were constructed
according to the design specifications shown below.

            Percent of Items Involving Stimuli on the Same Dimension

Total Test        10%                   20%                   40%
Length      1-1  2-2  1-2/2-1     1-1  2-2  1-2/2-1     1-1  2-2  1-2/2-1
20           1    1     18         2    2     16         4    4     12
40           2    2     36         4    4     32         8    8     24
80           4    4     72         8    8     64        16   16     48

In the table, the entries in the first row and first column represent the levels of the independent
variables: Column 1 indicates total test length, ranging from 20 to 80, and Row 1 indicates the
percentage of items involving unidimensional pairings, ranging from 10% to 40%. Below them,
the entries in the columns labeled 1-1, 2-2, and 1-2/2-1 represent the required numbers of items
involving stimuli on dimension 1 (Adjustment) only, dimension 2 (Agreeableness) only, and
dimensions 1 and 2. For illustration, a conventional test of 80 items (pairings), 40% of which
are unidimensional, must contain 16 1-1 items, 16 2-2 items, and 48 1-2/2-1 items.
Eighty stimuli representing Adjustment (dimension "1") and 49 stimuli representing
Agreeableness (dimension “2”) were chosen, as in the 1-D case. Multidimensional items (1-2
and 2-1) were created by pairing Adjustment and Agreeableness stimuli that had similar
desirability; unidimensional (1-1 and 2-2) items were created by pairing stimuli, representing the
same dimension, which had different location parameters, but fairly similar desirability ratings.
Once the three 80-item tests were constructed, three 20- and three 40-item tests were created
using the first 20 and 40 items, respectively, of the 80-item test in each condition.

Investigating Parameter Recovery


1-D simulations. To determine if the accuracy of parameter recovery varied across theta
points on the unidimensional trait continuum, each conventional test was administered to 50
simulated examinees (simulees) at 31 points on the interval [-3.0, -2.8, ..., +3.0]. At each grid
point, the average estimated theta and standard error were computed over 50 replications and
used to compute error statistics, which were compared across conditions using MANOVA.
2-D simulations. Each of the nine conventional tests was administered to 50 simulees at
points on a (θ1, θ2) grid, where θd ranged from -3 to +3 in increments of 0.5. As above, error
statistics for the estimated thetas and standard errors were compared using MANOVA.

Results
1-D simulations. Bias and root mean square errors of the latent trait estimates decreased
as test length increased, but accurate parameter recovery was observed across a wide range of
theta even for the short 10-item test. The estimated standard errors were also accurate,
approaching zero at moderate thetas for tests of 20 and 40 items. Overall, the results suggested
that latent trait and standard error estimation was quite accurate. In fact, a follow-up simulation,
examining the correlation between estimated and known thetas for 1000 simulees, sampled from
a standard normal distribution, showed correlations between estimated and known thetas for the
10-, 20-, and 40- item tests of .90, .95, and .97, respectively.
2-D simulations. Two independent variables, test length (TESTLEN) and the percent of
unidimensional pairings (UNIPCT), were fully crossed to produce 9 tests. Error statistics were
computed for each dimension separately and averaged for comparison using MANOVA. As
before, a follow-up simulation was conducted for each test by sampling known thetas for 1000
simulees from independent standard normal distributions and computing the correlations
between the estimated and known thetas.
As in the 1-D study, bias in the latent trait estimates decreased as test length increased,
and the largest bias statistics occurred at the endpoints of the trait continua, where the regression
toward the mean effect was greatest and the items provided the least information. In addition,
while the accuracy of latent trait estimation did increase as the percentage of unidimensional
pairings increased from 10% to 20%, there was little improvement by going from 20% to 40%.
These results were supported by the MANOVA, which showed main effects for both
independent variables, but only a weak linear trend for UNIPCT (eta-squared was about .10).


This finding is important from a substantive perspective, because fewer unidimensional pairings
means a more fake-resistant test. As a final note, the correlations between the estimated and
known thetas ranged from .77 in the most unfavorable situation (a 20-item test with 10%
unidimensional pairings) to .96 in the most favorable (80-item test with 40% unidimensional).
The average correlation was about 0.9 for the 40-item tests, regardless of the percentage of
unidimensional pairings.

Discussion and Conclusions

This paper outlines a method of constructing and scoring fake-resistant multidimensional
pairwise preference items. Individual statements are administered and calibrated using a
unidimensional single-stimulus model. Social desirability ratings are obtained for statements
using, say, a panel of judges, and fake-resistant items are created by pairing similarly desirable
statements representing different dimensions. Tests are created by combining multidimensional
items with a small number of unidimensional pairings needed to identify the latent metric. Trait
scores are then obtained using a multidimensional Bayes modal estimation procedure based on
the MUPP model developed by Stark (2002).
As shown here, the MUPP approach to test construction and scoring provides accurate
parameter recovery in both one- and two-dimensional cases, even with relatively few (say 15%)
unidimensional pairings. Accuracy of this approach generally improves as a function of test
length. Even with nonadaptive tests, good estimates may be attained using only 20 to 30 items
per dimension, meaning that a 5-D test would require 100 to 150 items. If adaptive item
selection were used to improve efficiency, the required number of items might decrease by as
much as 40%. We are currently developing and validating a 5-D inventory, using this approach,
and comparing the scores to those obtained using traditional methods.

Acknowledgements

We wish to thank the U.S. Army Research Institute for access to the AIM data and for supporting
this research. Assistance from the Human Resources Research Organization (HumRRO) was
particularly helpful with data management and recordkeeping. The Consortium of Universities
of the Washington Metropolitan Area was also helpful in securing research funds. All statements
expressed in this document are those of the authors and do not necessarily reflect the official
opinions or policies of the U.S. Army Research Institute, the U.S. Army, the Department of
Defense, HumRRO, or the Consortium of Universities.

References

Press, W.H., Flannery, B.P., Teukolsky, S.A., & Vetterling, W.T. (1990). Numerical
recipes: The art of scientific computing. New York: Cambridge University Press.


Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000a). A general item response
theory model for unfolding unidimensional polytomous responses. Applied Psychological
Measurement, 24, 3 – 32.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000b). GGUM2000: A DOS-based
program for unfolding ideal point responses [Computer program]. Department of
Measurement and Statistics, University of Maryland.
Stark, S. (2002). A new IRT approach to test construction and scoring designed to reduce
the effects of faking in temperament assessment [Doctoral dissertation]. University of Illinois at
Urbana-Champaign.
Stark, S., & Drasgow, F. (2002). An EM approach to parameter estimation for the
Zinnes and Griggs paired comparison ideal point IRT model. Applied Psychological
Measurement, 26, 208 – 227.
White, L. A., Nord, R. D., Mael, F. A., & Young, M. C. (1993). The Assessment of
Background and Life Experiences (ABLE). In T. Trent & J. H. Laurence (Eds.), Adaptability
screening for the armed forces (pp. 101 – 162). Washington, DC: Office of the Assistant
Secretary of Defense (Force Management and Personnel).
White, L. A., & Young, M. C. (1998). Development and validation of the Assessment of
Individual Motivation (AIM). Paper presented at the Annual Meeting of the American
Psychological Association, San Francisco, CA.


U.S. NAVY SAILOR RETENTION: A PROPOSED MODEL OF CONTINUATION BEHAVIOR13

Jessica B. Janega and Murrey G. Olmsted


NAVY PERSONNEL RESEARCH, STUDIES AND TECHNOLOGY
DEPARTMENT
jessica.janega@persnet.navy.mil

Sailor turnover reduces the effectiveness of the Navy. Turnover has declined
significantly since the late 1990s due to the implementation of a variety of retention
programs, including selective re-enlistment bonuses, increased sea pay, changes to the
Basic Allowance for Housing (BAH), and other incentives. In many cases the Navy now
retains adequate numbers of Sailors; however, it faces the problem of retaining the
best and brightest Sailors in active-duty service (Visser, 2001). Changes in employee
values will require that organizations such as the Navy make the necessary changes in their
strategies to retain the most qualified personnel (Withers, 2001). Attention to quality of life
issues is one way in which the military has addressed the changing needs of its members
(Kerce, 1995). One of the most effective ways to assess quality of life in the workplace is
to look at job satisfaction, which represents the culmination of the feelings a Sailor has
toward the Navy. Job satisfaction, in combination with variables like organizational
commitment, can be used to predict employee (i.e., Sailor) retention (for a general
overview see George & Jones, 2002). The purpose of this paper is to explore the
relationships among job satisfaction, organizational commitment, career intentions, and
continuation behavior in the U.S. Navy.

Job Satisfaction
According to Locke (1976), job satisfaction is predicted by satisfaction with
rewards, satisfaction with work, satisfaction with work context (or working conditions),
and satisfaction with other agents. Elements directly related to job satisfaction include
direct satisfaction with the job, action tendencies, career intentions, and organizational
commitment (Locke, 1976). Olmsted and Farmer (2002) replicated a version of Locke's
(1976) model of job satisfaction proposed by Staples and Higgins (1998), applying it to a
Navy sample. Staples and Higgins (1998) proposed that job satisfaction is both a factor
predicted by other factors and an outcome in and of itself. Olmsted and Farmer (2002)
applied the Staples and Higgins (1998) model directly to Navy data from the Navy-wide
Personnel Survey 2000. The paper evaluated two parallel models, which yielded equivalent
results, indicating that a similar version of Locke's model could be successfully applied to
Navy personnel.

Organizational Commitment
Organizational commitment involves feelings and beliefs about entire
organizations (George & Jones, 2002). Typically, organizational commitment can be

13 The opinions expressed are those of the authors. They are not official and do not represent the views of
the U.S. Department of the Navy.


viewed as a combination of two to three components (Allen & Meyer, 1990). The
affective (or attitudinal) component of organizational commitment involves positive
emotional attachment to the organization, while continuance commitment is based on the
potential losses associated with leaving the organization, and normative commitment
involves a commitment to the organization based on a feeling of obligation (Allen &
Meyer, 1990). Commonalities across the affective, normative, and continuance forms of
commitment indicate that each component should affect employees' intentions and final
decisions to continue as members of the organization (Jaros, 1997). The accuracy of these
proposed relationships has implications for turnover reduction because "turnover
intentions is the strongest, most direct precursor of turnover behavior, and mediates the
relationship between attitudes like job satisfaction and organizational commitment and
turnover behavior" (Jaros, 1997, p. 321). This paper primarily addresses affective
commitment, since it has a significantly stronger correlation with turnover intentions than
either continuance or normative commitment (Jaros, 1997).

Career Intentions
Career intentions represent an individual's intended course of action with respect
to continuing in his or her current employment. While a person's intentions are not
always the same as his or her actual behavior, an important assumption is that intentions
represent the basic motivational force, or direction, of the individual's behavior (Jaros,
1997). In general, Jaros (1997) suggests that the combination of organizational
commitment and career intentions is a good approximation of future career decisions
(i.e., whether to stay with or leave the organization).

Purpose
This paper looks at job satisfaction, organizational commitment, career intentions,
and continuation behavior using structural equation modeling. It was hypothesized that
increased job satisfaction would be associated with increased organizational commitment,
which in turn would be positively related to career intentions and increased continuation
behavior (i.e., retention) in the Navy. A direct relationship was also hypothesized to exist
between career intentions and continuation behavior.

METHODS

Participants
The sample used in this study was drawn from a larger Navy quality of work life
study using the Navy-wide Personnel Survey (NPS) from the year 2000. The NPS 2000
was mailed to a stratified random sample of 20,000 active-duty officers and enlisted
Sailors in October 2000. A total of 6,111 useable surveys were returned to the Navy
Personnel Research, Studies, & Technology (NPRST) department of Navy Personnel
Command, a return rate of 33 percent. The current sample consists of a sub-sample of
700 Sailors who provided social security numbers for tracking purposes. Sailors whose
employee records contained a loss code 12 months after the survey were flagged as
having left the Navy (10.4%). Those Sailors who remained on active duty in the
Navy (i.e., those who could be tracked by social security number and did not have a
loss code in their records) were coded as still present in the Navy (87.8%). Sailors
whose status was not clear from their employment records (i.e., those who could not be
tracked by social security number) were retained in the analysis with "unknown status"
(1.8%).
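
For illustration only, the status coding described above can be expressed as a simple
merge of survey responses with personnel records. The sketch below is not the authors'
actual procedure; the file names and column names (ssn, loss_code) are hypothetical,
and it assumes the pandas library.

import pandas as pd

# Merge survey responses with personnel records on a tracking key, then code
# continuation status 12 months after the survey (column names hypothetical).
survey = pd.read_csv("nps2000_responses.csv")
records = pd.read_csv("personnel_records.csv")   # contains a 'loss_code' column
merged = survey.merge(records[["ssn", "loss_code"]], on="ssn", how="left",
                      indicator=True)
merged["status"] = "stayed"                                 # default: still active
merged.loc[merged["loss_code"].notna(), "status"] = "left"  # loss code -> left Navy
merged.loc[merged["_merge"] == "left_only", "status"] = "unknown"  # not trackable
print(merged["status"].value_counts(normalize=True).round(3))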

Materials
The NPS 2000 primarily focuses on issues related to work life and career
development for active-duty personnel in the U.S. Navy. The survey contains 99
questions, many of which include sub-questions, and most questions use a five-point
Likert-type scale.

Analysis Procedures
The sample contained missing data: not every Sailor who returned the NPS 2000
completed it in full. For this reason, Amos 4.0 was chosen as the statistical program for
the structural equation models, because it is better equipped to handle missing data than
most other structural equation modeling programs (Byrne, 2001). Once acceptable factors
were found via data reduction with SPSS 10, the factors and observed variables were
input into Amos 4.0 for structural equation modeling via maximum likelihood estimation
with an EM algorithm (Arbuckle & Wothke, 1999).
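
For readers who want to reproduce this type of analysis with open-source tools rather
than Amos, the hypothesized structure can be written in lavaan-style syntax and fitted
with a Python SEM package. The sketch below is illustrative only: it assumes the
third-party package semopy, placeholder item names, and a hypothetical data file, and it
uses complete cases rather than Amos's EM-based treatment of missing data.

import pandas as pd
from semopy import Model, calc_stats  # third-party SEM package (assumed installed)

# Hypothesized structure from the Purpose section; item names are placeholders.
DESC = """
satisfaction =~ s1 + s2 + s3
commitment   =~ c1 + c2 + c3
intentions   =~ i1 + i2
commitment   ~ satisfaction
intentions   ~ commitment
continuation ~ commitment + intentions
"""

data = pd.read_csv("nps2000_subsample.csv").dropna()  # hypothetical file; complete cases
model = Model(DESC)
model.fit(data)            # maximum likelihood estimation
print(model.inspect())     # parameter estimates (loadings and path weights)
print(calc_stats(model))   # fit statistics (chi-square, CFI, RMSEA, ...)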

RESULTS

Overall, the proposed model ran successfully and fit the data adequately. A
significant chi-square test was obtained for the model, indicating that some variance
remains unaccounted for, χ2(938) = 7637.94, p < .001. However, according to
Byrne (2001), the chi-square test is now largely regarded as sample-size dependent. The
normed fit index (NFI) and the comparative fit index (CFI) were therefore estimated as
fit indices (Byrne, 2001). For adequate fit, the NFI and CFI should be greater than .90
(Bentler & Bonett, 1980). By this criterion, the initial model was adequate, with an NFI
of .90 and a CFI of .91. Finally, the root mean square error of approximation (RMSEA)
was also used as a fit index; an RMSEA of .10 indicated borderline fit
(Browne & Cudeck, 1993). A representation of the model is presented below in Figure 1.
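
As an aside before Figure 1: these indices can be recomputed from the model and
baseline (null-model) chi-square statistics with the standard formulas (NFI from
Bentler & Bonett, 1980; CFI; RMSEA). A minimal sketch follows; the paper reports only
the model chi-square, so the baseline chi-square, its degrees of freedom, and N below
are illustrative placeholders chosen to roughly reproduce the reported values.

from math import sqrt

def fit_indices(chi2_m, df_m, chi2_0, df_0, n):
    """NFI, CFI, and RMSEA from model (m) and baseline/null (0) chi-squares."""
    nfi = (chi2_0 - chi2_m) / chi2_0
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_0 - df_0, chi2_m - df_m, 0.0)
    cfi = 1.0 - num / den
    rmsea = sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    return nfi, cfi, rmsea

# Model chi-square as reported, chi2(938) = 7637.94; baseline values and N are
# placeholders (N = 700 matches the tracked sub-sample).
nfi, cfi, rmsea = fit_indices(7637.94, 938, 76400.0, 1035, 700)
print(f"NFI = {nfi:.2f}, CFI = {cfi:.2f}, RMSEA = {rmsea:.3f}")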


[Path diagram omitted. Recoverable structure: NPS 2000 items (Q47-Q94, each with an
error term) load on satisfaction with rewards, satisfaction with working conditions,
satisfaction with work, and satisfaction with other agents, which in turn load on global
job satisfaction; global job satisfaction predicts organizational commitment (.70),
which predicts career intentions and continuation behavior; career intentions also
predict continuation behavior.]

Figure 1. Exploratory Model


DISCUSSION

This model provides an adequate fit to Navy data for use in relating job
satisfaction and organizational commitment to career intentions and continuation
behavior. Advantages over previously tested models include the use of structural equation
modeling over regression path analysis, and the treatment of job satisfaction and
organizational commitment as separate factors. Several points of interest are apparent in
evaluating the results of the model. First, several factors and observed variables
contributed to global job satisfaction. Satisfaction with work had the largest path
weight to global job satisfaction of any factor (.80). Satisfaction with other
agents was the next largest predictor of global job satisfaction, followed by satisfaction
with working conditions and satisfaction with rewards. Interestingly, the path weight
from satisfaction with rewards to global job satisfaction was essentially zero (-.02).
This suggests either that the rewards listed on this survey are not as important to job
satisfaction as being generally satisfied with the job itself, or that these rewards do
not adequately capture what Sailors value when considering satisfaction with their job.
These results may also reflect the difference between intrinsic and extrinsic rewards as
predictors of job satisfaction; the relationships between variables tapping intrinsic and
extrinsic motivation should be explored further in this model as they pertain to job
satisfaction.

Job satisfaction as modeled here is a good predictor of affective organizational
commitment: the path weight from job satisfaction to organizational commitment is .70
in the exploratory model. Adding a path from global job satisfaction to career intentions
did not add any predictive value to the structural equation model. Here, organizational
commitment mediates the relationship between job satisfaction and career
intentions/continuation behavior. Organizational commitment predicted both career
intentions and continuation behavior adequately in the model. Since the model did not
explain all of the variation present (as evidenced by the significant chi-square
statistic), an unknown third variable may be influencing these relationships; this
possibility should be explored in future work.

The more the Navy understands about Sailor behavior, the better it can target
changes that improve the Navy. The results of this study suggest that job satisfaction
is a primary predictor of organizational commitment and that both play an important role
in predicting career intentions and actual continuation behavior. In addition, the
results suggest that career intentions are a stronger predictor of continuation behavior
than organizational commitment when both are evaluated in the context of all of the
other variables in the model. More research is needed to fully understand these
relationships and to identify the specific contributors to job satisfaction that the
Navy can act on. A validation of this model should be conducted in the future to verify
these relationships. It is already clear, however, that an understanding of Sailor
continuation behavior would be incomplete without measurement of job satisfaction,
organizational commitment, and career intentions.


REFERENCES

Allen, N. J., & Meyer, J. P. (1990). The measurement and antecedents of affective,
continuance, and normative commitment to the organization. Journal of
Occupational Psychology, 63, 1-18.
Arbuckle, J. L., & Wothke, W. (1999). Amos 4.0 user’s guide. Chicago, IL: SmallWaters
Corporation.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the
analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A.
	Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162).
	Newbury Park, CA: Sage.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts,
	applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates.
George, J. M., & Jones, G. R. (2002). Organizational behavior (3rd ed.). New Jersey:
Prentice Hall.
Jaros, S. (1997). An assessment of Meyer and Allen’s (1991) three-component model of
organizational commitment and turnover intentions. Journal of Vocational
Behavior, 51, 319-337.
Kerce, E. W. (1995). Quality of life in the U.S. Marine Corps. (NPRDC TR-95-4). San
Diego: Navy Personnel Research and Development Center.
Locke, E. A. (1976). The nature and causes of job satisfaction. In M. D. Dunnette (Ed.),
	Handbook of industrial and organizational psychology (pp. 1297-1349). New
	York: John Wiley & Sons.
Olmsted, M. G., & Farmer, W. L. (2002, April). A non-multiplicative model of Sailor job
satisfaction. Paper presented at the annual meeting of the Society for Industrial &
Organizational Psychology, Toronto, Canada.
SPSS, Inc. (1999). SPSS 10.0 syntax reference guide. Chicago, IL: SPSS, Inc.
Staples, D. S., & Higgins, C. A. (1998). A study of impact of factor importance
weightings on job satisfaction measures. Journal of Business and Psychology,
13(2), 211-232.
Visser, D. (2001, January 1-2). Navy battling to retain sailors in face of private sector's
	allure. Stars and Stripes. Retrieved March 3, 2003, from
	http://www.pstripes.com/jan01/ed010101a.html
Withers, P. (2001, July). Retention strategies that respond to worker values. Workforce.
	Retrieved September 24, 2003, from
	http://www.findarticles.com/cf_0/m0FXS/7_80/76938893/print.jhtml


PSYCHOMETRIC PROPERTIES
OF THE DUTCH
SOCIAL SKILLS INVENTORY

Drs. Frans V.F. Nederhof
Senior Researcher
Defence Selection Agency
Kattenburgerstraat 7
PO Box 2630
1000 CP Amsterdam
The Netherlands
vf.nederhof@mindef.nl

INTRODUCTION
Social skills are thought to be a valuable asset for all personnel. Still, instruments that enable
personnel selectors to assess social skills in an efficient, reliable, and valid manner are not readily
available in the Netherlands. Roughly speaking, Dutch personnel selectors have two options: to
assess social skills in an interview on the basis of an impression, or to assess these skills in an
assessment centre. Although efficient, and perhaps better than not assessing social skills at all, the
reliability and validity of assessing social skills in an interview may be questioned. The use of an
assessment centre may yield more reliable and valid measures, but at a considerable cost in time
and money. In short, neither option is quite satisfactory for a selection agency faced with the task
of psychologically examining thousands of recruits a year.

An instrument that could be of help is the Social Skills Inventory (Riggio, 1986). The Social Skills
Inventory (SSI) is a commercially available, off-the-shelf, 90-item self-report measure of six basic
social and communication skills. Although the instrument has yet to be tested as an instrument for
selecting military personnel, its properties are promising. Firstly, the skills measured with the SSI
seem highly relevant to military personnel. Secondly, the SSI has excellent psychometric
properties, which puts it on a par with most personality questionnaires. Thirdly, the items that
compose the SSI are relevant to military personnel as well as to civilians, which makes the
instrument suited to selecting persons of both categories. Fourthly, completing the instrument takes
only about 40 minutes. Taken together, the SSI could be the reliable, valid, and efficient instrument
that provides the personnel selector with an informative image of a person's social make-up.

The SSI measures six basic skills in the domains of non-verbal/emotional skills and verbal/social
skills. Within each domain three sub-domains are defined, focusing on expressivity (skills in
sending information), sensitivity (skills in receiving information), and control (skills in regulating
the interaction). Emotional expressivity refers to the ability to send emotionally relevant messages,
such as feelings. Emotional expressivity is thought to be a relevant skill in military teamwork,
especially in the phases in which horizontal cohesion between team members and vertical cohesion
between team members and leaders have to develop. Example items are "I have been told that I
have expressive eyes" and "Quite often I tend to be the life of the party".

Emotional sensitivity is the ability to receive emotional messages, such as overt and hidden body
language. Emotional sensitivity is thought to be a relevant skill in military teamwork in the group-
dynamic phases in which people develop an individual identity in the team, and could help military
leaders in developing trust between leaders and men. Example items are "It is nearly impossible
for people to hide their true feelings from me" and "At parties I can instantly tell when someone is
interested in me".

Emotional control reflects the ability to regulate the sending of emotional and non-verbal signals.
Emotional control is thought to be relevant in leadership situations that require the unbiased
gathering of information and the neutral sending of sensitive information, as well as in stress
situations in which military leaders are expected to maintain a calm exterior. Example items are "I
am very good at maintaining a calm exterior, even when upset" and "When I am really not enjoying
myself at some social function, I can still make myself look as if I am having a good time".

Social expressivity reflects the liveliness of social behaviour, and the initiative of a person in social
situations. In a close-knit society like the armed forces, social expressivity is thought to be of help
in creating a network of interpersonal relationships. Example items are: “At parties I enjoy
speaking to a great number of people”; “When in discussions I find myself doing a large share of
the talking”.

Social sensitivity reflects the ability to receive socially relevant signals, such as norms, and the
awareness of those norms. The skill is thought to help recruits fit into a rule- and role-based
environment like the military, thereby reducing the risk of a mismatch between person and
organisation. Example items are "I often worry that people will misinterpret something that I have
said to them" and "While growing up, my parents were always stressing the importance of good
manners".

Social control reflects the ability to act according to certain roles, including self-presentational
skills. Social control is thought to be an asset to military leaders, especially in situations of
uncertainty, when it comes to convincing people by setting an example. Example items are "I find
it very easy to play different roles at different times" and "When in a group of friends, I am often
the spokesperson for the group".

Each of the six SSI scales is composed of 15 statements. Respondents are requested to indicate on a
five-point Likert scale the degree to which each statement applies to them, choosing between "Not
at all like me", "A little like me", "Like me", "Very much like me", and "Exactly like me". Each of
the ninety statements of the SSI was translated into Dutch, after which the instrument was pre-tested
on a sample of applicants for military leadership functions. The Dutch research findings are
compared with American findings obtained with the original instrument, and conclusions for future
research and selection practice are drawn.

METHOD

Translation
A psychologist and a linguist independently translated the SSI, after which measures of similarity
were taken. Due to structural differences between American English and Dutch, small but
significant differences were found between the translations. Therefore an iterative approach was
chosen, discussing each item in detail until a translation was found that accurately reflected the
meaning of the original item. Four items caused significant discussion among the translators; for
the exact meaning of these items the author of the SSI was contacted.
To ensure that the Dutch translation of the SSI is comprehensible to personnel of all levels of
education, the translation was iteratively presented to a panel of civilian personnel with higher
education and to several panels of military personnel in active service, with ranks ranging from
soldier to captain.

These panels were encouraged to mark any part of the translation they found odd, confusing, or
otherwise inadequate. Markings were then discussed, whereupon some minor changes were made
in the translation. After three iterations no more suggestions were made by the panels.

Participants
In this study data were gathered from 147 persons applying for military service at non-commissioned
officer level (N = 80) or officer level (N = 67).14 Forty-one of the participants applied for NCO
functions with the army, twenty-six applied for NCO functions with the navy, and thirteen applied
for NCO functions with the military police. Sixty-seven participants applied for officer functions
with the military police. Fifteen of the participants were female. The mean age of the participants
was 20 years, with a standard deviation of 4.4. Analysis of variance was used to test for differences
between the subgroups. Perhaps surprisingly, no significant differences between the groups were
found; the sample was therefore considered homogeneous.

14 The onset of this study coincided with a relatively sudden downsizing of the Netherlands armed
forces. As a consequence the number of available participants was reduced.

Measures
Participants were asked to complete the 90-item Dutch translation of the SSI, the 240-item NEO
PI-R five-factor personality inventory, the 64-item WIMAS measure of influence behaviour, and the
frequently used 10-item five-point Likert social desirability scale (Crowne & Marlowe, 1964;
Nederhof, 1981; Nederhof, 1985). Of the NEO PI-R, the five main personality scales neuroticism,
extraversion, openness to experience, altruism, and conscientiousness are used. Of the WIMAS, the
four types of influence behaviour manipulation, directness, diplomacy, and assertiveness are used.

Instructions
Many psychological questionnaires have different sets of norm groups for research purposes and
for selection purposes. In order to probe the usefulness of the Dutch translation of the SSI as a
selection instrument for the Dutch military, participants were induced to believe that completing the
questionnaires was part of the standard psychological examination of applicants for the armed
forces. Under the guidance of a test assistant, the questionnaires were issued and completed in a
classroom setting by applicants for military service at the Defence Selection Agency. The setting
and instructions are thought to have had the expected effect: 50% of the 147 participants had a
score of 37 or higher on the Crowne-Marlowe social desirability scale,15 which is considered quite
high.

15 Cronbach's α = .89; Min = 10; Max = 50; M = 35.3; SD = 10.5.

RESULTS AND DISCUSSION

Reliability
An important question is whether the items that compose the respective scales measure the same
dimension. In a study by Riggio (1986) on a sample of undergraduate students, relatively strong
Cronbach's α coefficients were reported, ranging from .75 for emotional expressivity to .88 for
social expressivity, indicating good internal consistency (see Table 1).

                 EE          ES        EC        SE        SS          SC        SSI-total
Riggio (1986)    .75         .78       .76       .88       .84         .87
                 15 items    15 items  15 items  15 items  15 items    15 items  90 items
Nederhof (2003)  .52         .72       .70       .77       .73         .76       .80
                 13 items16  15 items  15 items  15 items  12 items17  15 items  85 items
Table 1: Internal consistency of the SSI scales in a sample of American undergraduate students,
and a sample of Dutch applicants for military service.

16 Items 25 and 37 were deleted.
17 Items 5, 17 and 53 were deleted.

The internal consistency of most scales of the Dutch translation is also found to be satisfactory,
although somewhat lower than in the research by Riggio. On the emotional expressivity and social
sensitivity scales some items were deleted because they lowered the internal consistency of the
scale too much. For the social sensitivity scale this resulted in an acceptable α coefficient; the
internal consistency of the emotional expressivity scale, however, is still regarded as
unsatisfactory.
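
The coefficients in Table 1 follow the standard Cronbach's α formula, α = k/(k-1) × (1 - Σ(item
variances)/variance of the total score). A minimal sketch, using simulated item scores purely for
illustration:

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Simulated 1-5 scores for a 15-item scale (N = 147, as in the Dutch sample);
# uncorrelated random items like these yield an alpha near zero, whereas real
# scale items should produce values like those in Table 1.
rng = np.random.default_rng(0)
print(round(cronbach_alpha(rng.integers(1, 6, size=(147, 15))), 2))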

Scale correlations
Significant correlations between most SSI scales are found in the American study as well as in the
Dutch study (see Table 2). The signs of the respective correlations between the scales are equal in
the two samples. In the Dutch study we find somewhat higher absolute correlations between
emotional sensitivity, emotional control, social expressivity, social sensitivity, social control, and
the SSI-total score than in the American study. Correlations between emotional expressivity and
the other scales are lower in the Dutch study. This finding may partly be explained by the low
internal consistency of the Dutch emotional expressivity scale.

            EE     ES     EC     SE     SS     SC     SSI-total
EE          1      .44    -.38   .53    .00    .43    .59
ES          .17    1      -.10   .41    .11    .31    .60
EC          -.20   .23    1      .00    -.20   .08    .24
SE          .32    .52    .22    1      -.17   .66    .78
SS          .05    -.04   -.29   -.20   1      -.46   .06
SC          .23    .42    .23    .69    -.45   1      .64
SSI-total   .44    .74    .44    .83    -.01   .71    1
Table 2: Correlations between the SSI scales and the SSI-total scale in research by Riggio (above
the diagonal, N = 629) and Nederhof (below the diagonal, N = 147).

On the one hand, the correlations between the subscales of the SSI could be interpreted as a clear
sign that the scales of the SSI are not independent and should therefore be improved. Had the SSI
been a personality inventory, this interpretation would be very tempting.
On the other hand, the correlations may be the result of a natural process in which the development
of one social skill influences the development of one or more related social skills. Some evidence
for this interpretation is found in the negative correlations between some of the SSI scales. For
instance, a negative correlation is found between emotional expressivity and emotional control,
indicating that some of the more expressive persons find it difficult to control the expression of
their emotions (and vice versa), a notion that appeals to common sense. Also, a negative correlation
is found between social sensitivity and emotional control, indicating that persons who are better at
controlling the expression of their emotions are less able to receive relevant information
concerning social norms. This finding may also appeal to common sense when we realise that
social sensitivity is the 'passive' social skill of interpreting socially relevant signals, which could
be associated more with the listener role in a group, whereas emotional control is a more active
social skill of sending emotionally relevant messages, which could be associated more with the
active sender role.
The negative correlations between social sensitivity on the one hand and social expressivity and
social control on the other again seem to stress the difference between the active and the passive
social skills. We will return to this thought later.
Strong positive correlations are found between the SSI-total score and five of the SSI scales. These
correlations are interpreted as an indication that the SSI-total score may serve as a global indicator
of the development of social skills. This thought, too, will be explored further later in the paper.

Test-retest reliability
An assessment of test-retest reliability with a two-week interval was planned. Due to the
downsizing of the Dutch military that coincided with the study, this assessment was no longer
possible. Riggio (1986) found test-retest reliabilities ranging from .81 for emotional expressivity to
.96 for social expressivity, with .94 for the SSI-total score.

Validity

Social desirability
In personnel selection situations a social desirability bias in test results is generally expected. In
accordance with earlier research findings by Riggio, significant correlations are found between the
Crowne-Marlowe social desirability scale and the social scales of the Social Skills Inventory, but
not the non-verbal/emotional scales (see Table 3), indicating that the latter may be free of a social
desirability bias. A small positive correlation is found between social desirability and social
expressivity, indicating that more socially and verbally skilled persons are more inclined to give
socially desirable answers (r = .21). A positive correlation is also found between social desirability
and social control, indicating that persons with greater skills in managing social situations are
likewise more inclined to give socially desirable responses (r = .25). A remarkable, significant
negative correlation is found between social desirability and social sensitivity, indicating that more
socially sensitive persons are actually less inclined to give socially desirable responses (r = -.30).

            EE     ES     EC     SE     SS     SC     SSI-total
Riggio      -.15   .12    .10    .26    -.31   .48    .04
Nederhof    -.08   .09    .10    .21    -.30   .25    .10
Table 3: Correlations between the Crowne-Marlowe social desirability scale and the SSI scales.

These findings indicate that the social scales of the SSI are prone to socially desirable answering by
some persons. This conclusion is not unique, since the same holds for most instruments used for
selection purposes, as indicated by the significant differences between norm groups for research
purposes and norm groups for selection purposes. The problem with coping with social desirability
through norm groups, however, is that in the same situation some people are more inclined to
socially desirable answering than others (Nederhof, 1985), which increases the chance of false
negatives. As a solution, different norm groups might be devised for people who score low versus
people who score high on social desirability.
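
One way to implement this solution is to compute percentile norms separately for low and high
scorers on the social desirability scale. A minimal sketch, assuming a pandas DataFrame with
hypothetical column names and an illustrative median-based cut:

import numpy as np
import pandas as pd

def split_norms(df, scale_col, sd_col="crowne_marlowe", cut=None):
    """Percentile norms for one SSI scale, computed separately for respondents
    below vs. at or above a cut score on the social desirability scale."""
    cut = df[sd_col].median() if cut is None else cut
    groups = {"low_sd": df[df[sd_col] < cut], "high_sd": df[df[sd_col] >= cut]}
    return {label: np.percentile(g[scale_col], [10, 25, 50, 75, 90])
            for label, g in groups.items()}

# usage (hypothetical data file and columns):
# norms = split_norms(pd.read_csv("ssi_sample.csv"), scale_col="SE")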

Personality


The relationship between behaviour and personality has been a topic of discussion for decades.
Recently, with the increasing popularity of five-factor personality inventories, some consensus is
growing around the notion that personality can influence behaviour in a direct way. As in the work
of Riggio (1986), this study explores the relationship between personality and social skills. The
choice was made to measure personality with a Dutch translation of the NEO PI-R five-factor
personality inventory,18 a relatively new but well-validated personality questionnaire. The flip side
of this choice is that comparing the findings with the work of Riggio (1986, 1989) is more difficult.

As discussed earlier, some of the social skills of the SSI refer to behaviour directed at changing or
influencing the situation, whereas other skills are directed at interpreting the situation without
changing it. In a way, social skills could therefore be considered ways of coping with social
situations, and of coping with oneself. A positive correlation is found between neuroticism and the
'passive' skill social sensitivity (see Table 4). Negative correlations are found between the
neuroticism scale and the active skills emotional control, social expressivity, and social control.
These findings may be seen as support for the claim that the SSI measures relevant behaviour
dimensions. The signs of the correlations stress the difference between the active and the passive
social skills.

            Neuroticism  Extraversion  Openness  Altruism  Conscientiousness
EE          .02          .38           .31       -.01      .06
ES          -.16         .33           .43       .10       .24
EC          -.24         .07           .02       -.04      .28
SE          -.35         .58           .32       .22       .42
SS          .57          -.06          -.04      -.10      -.32
SC          -.50         .44           .28       .15       .49
SSI-total   -.24         .54           .43       .09       .39
Table 4: Correlations between the SSI scales and the NEO PI-R five-factor personality scales.

Extraversion is found to correlate positively with the expressivity scales of the SSI, thereby
supporting the expressivity construct. Also, a positive relationship between extraversion and social
control is found, which is easy to understand since extraverts may be expected to be more inclined
towards influencing a situation than introverts.
Openness to experience refers to a mindset in which new and different ideas, values, feelings and
impressions are welcomed. Most of the active scales of the SSI correlate positively with the
openness scale, as expected. No correlation was found between openness on the one hand and
emotional control and social sensitivity on the other.
The construct of altruism refers to a mindset in which the experiences, stakes and goals of other
people are weighed against one's own experiences, stakes and goals. Most correlations between
altruism and the SSI scales are found to be insignificant (see Table 4). Since altruism refers to a
mindset and social skills refer to behaviour, this finding is not surprising.
Conscientiousness refers to engaging in tasks actively, in an orderly manner, and with stamina. A
positive correlation is found between the active social skills and conscientiousness; a negative
correlation is found with the passive skill social sensitivity.

18 Riggio (1986) used the 16 Personality Factor Test. However, the reliability and validity of the
Dutch translation of the 16 Personality Factor Test are questioned (Evers et al., 2000).

The correlations between the SSI and the personality dimensions generally support the notion that
the SSI measures behaviour dimensions in an expected manner. Thus the validity of the SSI is
supported by these findings.

Influence behavior
The participants in this study applied for leadership functions at non-commissioned officer or
officer level, and it is expected that more socially skilled persons self-select for such jobs. In order
to further judge convergent and discriminant validity, the relationship between SSI scores and a
measure of influence behaviour is explored. The WIMAS measures four styles of influencing
others: manipulating others, influencing others by diplomacy, influencing others by assertive
behaviour, and influencing others by open and direct requests.

            Manipulation  Diplomacy  Assertiveness  Directness
EE          .13           -.03       .18            .19
ES          .13           .30        .16            .29
EC          .21           .25        .12            .18
SE          .01           .25        .37            .40
SS          .15           -.20       -.35           -.43
SC          -.07          .31        .44            .39
SSI-total   .16           .31        .31            .33
Table 5: Correlations between influence behaviour and the SSI scales.

The SSI was originally composed of seven scales, the seventh being named "social manipulation".
With the exception of a correlation with emotional control, social manipulation did not correlate
with the other social skills (Riggio, 1986). The present findings confirm these results, indicating
that manipulation is perhaps more a cognitive skill than a social skill.

Positive correlations are found between the active social skills as measured by the SSI, and the
influence styles assertiveness, diplomacy and directness. These findings further support the claim
that the SSI measures behaviour.

GENERAL DISCUSSION
In this study a Dutch translation of the Social Skills Inventory was pre-tested on a sample of
applicants for military leadership functions. Although the study may have suffered from an
imperfect sample, most results of this pre-test are seen as encouraging and stimulate further
exploration of the instrument. Even so, there are two concerns regarding parts of the instrument.
Firstly, the internal consistency of the Dutch translation of the emotional expressivity scale is found
to be inadequate. This finding could partly be caused by Dutch culture. For instance, items about
touching other people may be prone to culturally different answers, since the Dutch may not be
much inclined to touch others. Also, a self-selection effect may have lowered the internal
consistency of the emotional expressivity scale, because items regarding the expression of emotion
may be less appealing to persons who want to join the armed forces.
Secondly, the social sensitivity scale gives pause because of the strong positive correlation between
social sensitivity and neuroticism, and the low or even negative correlations between social
sensitivity and the other scales. Inspection of the items that compose the scale shows that some
items may touch on the topic of neurotic behaviour.


However, it seems possible to adjust the emotional expressivity scale to Dutch culture, which may
improve the internal consistency of the scale. Improvement of the social sensitivity scale by
replacing some items with others of more neutral content is also thought possible.
Since social skills are relevant skills for military leaders, and an instrument measuring these skills
is useful both for selection purposes and for leadership training purposes, investing in the Social
Skills Inventory is worth considering, provided the predictive validity of the four strong scales
proves encouraging. With the current sample, an indication of the predictive validity of the four
strong scales of the Dutch translation could be obtained in a military training environment in the
coming months.

REFERENCES

Bartone, Paul T. & Faris R. Kirkland (1991) Optimal leadership in small army units. In: Gal,
Reuven & A. David Mangelsdorff (1991) Handbook of military psychology. Chichester: John
Wiley & Sons.

Crowne, D.P. & Marlowe, D. (1964) The approval motive. New York: Wiley.

Evers, A., J.C. van Vliet-Mulder & C.J. Groot (2000) Documentatie van Tests en Testresearch in
Nederland. Assen: Van Gorcum.

Hoekstra, H.A., J. Ormel and F. de Fruyt (1996) NEO PI-R & NEO FFI, Handleiding Big Five
Persoonlijkheidsvragenlijsten. Lisse: Swets & Zeitlinger B.V.

Luteijn, F., J. Starren and H. van Dijk (1985) Nederlandse Persoonlijkheidsvragenlijst. Herziene
uitgave 1985. Lisse: Swets & Zeitlinger B.V.

Nederhof, A.J. (1981) Beter Onderzoek. Leiden: SISWO

Nederhof, A.J. (1985) Methods of coping with social desirability bias: a review. European Journal
of Social Psychology, vol. 15, pp. 263-280.

Riggio, Ronald E. (1986) Assessment of basic social skills. Journal of Personality and Social
Psychology, vol. 51, no. 3, pp. 649-660.

Riggio, Ronald E. (1989) Manual of the Social Skills Inventory, Research Edition. Redwood City:
Mind Garden.

Riggio, Ronald E. & Shelby J. Taylor (2000) Personality and communication skills as predictors of
hospice nurse performance. Journal of Business and Psychology, vol 15, no 2, winter 2000, pp.
351-359

Riggio, Ronald E., Bronston T. Mayes & Deidra J. Schleicher (2003) Using Assessment Center
Methods for Measuring Undergraduate Business Student Outcomes. Journal of Management
Inquiry, vol. 12, no. 1, March 2003.


Deployability of teams
The Dutch Morale Questionnaire,
an instrument for measuring morale during military operations.

Bert Hendriks MA, Cyril van de Ven MA & Daan Stam MA*

Behavioral Science Division
Royal Netherlands Army

ABSTRACT

Unit morale is without doubt an important factor in the deployability and combat power of
military units. Combat power is nowadays considered to depend, in addition to tactical,
logistic, and technical capacity, on the morale of military personnel. Morale is a concept
comprising a number of factors. This paper describes the development of the Dutch Morale
Questionnaire (DMQ). Based on 25 surveys administered to deployed units since 1997, a
theoretical model was constructed. The model itself and its theoretical background are
described. Finally, practical experiences with morale measurement in the Royal Netherlands
Army are discussed.

INTRODUCTION

Throughout the evolution of the art of war, armed forces have paid a great deal of
attention to psychological phenomena such as morale, elan, esprit de corps, and other, similar
concepts which may influence combat power. Combat power is nowadays considered to
depend, in addition to tactical, logistic, and technical capacity, on the morale of military
personnel. That is why any attempt at predicting morale attracts the full attention of
operational commanders. Their attention focuses on all kinds of individual and group
processes which positively affect morale and the resulting resistance to stress. To a large
extent, leaders of military units can influence these factors. The development and application
of a morale measuring instrument therefore not only contributes to restricting or
preventing negative effects on personnel (such as dysfunction, repatriation, and exit), but
also acts as a force multiplier.
Research has shown that high morale reduces the risk of mental collapse in a group of
military personnel. High morale and a high degree of group cohesion have proved to limit the
development of combat stress during operations; they act as a kind of buffer against all kinds
of negative consequences of war (Tibboel, van Tintelen, Swanenberg, & van de Ven, 2001).
Military personnel in highly cohesive sections feel more secure and have greater
self-confidence, less fear, and higher motivation.
If commanders are responsible for improving morale, it is important to have an instrument for
measuring it. In 1997 the Behavioral Sciences Division of the RNLA designed such an
instrument, called 'Deployability of teams' (Tibboel et al., 2001). Since the mission of the
Dutch part of Stabilization Force rotation 7 (November 1999), the Behavioral Sciences
Division has gained a great deal of experience with this instrument and constantly evaluates
and improves the questionnaire.

*
A.B. Hendriks MA and C.P.H.W. van de Ven MA work as researchers at the Behavioral Sciences
Division/Directorate of Personnel and Organisation/RNLA. D. Stam is associated with Leiden University.


Article structure
This paper describes the development and application of the instrument. First, we discuss
morale in relation to military operations. Then the factors used in the instrument are
described, analyzed, and presented in a theoretical model. Finally, we discuss experiences
with the instrument and how it helps commanders improve morale before and during
operations.

Morale in relation to military operations

The following definition of morale is used:
'Morale is a mental attitude held by an individual soldier in a task-oriented
group in relation to achieving the operational objectives for which the group
exists' (van Gelooven, Tibboel, Slagmaat & Flach, 1997).

The following can be derived from this definition:

• morale is a mental attitude, and can therefore not be directly observed;
• morale is a characteristic of an individual within a unit; the morale of a unit comprises
the morale of its members;
• morale is linked to (operational) objectives. Morale may be high (good) or low (bad).
High morale is by definition good: it means, after all, that the individual has a positive
mental attitude towards achieving the given objectives of the unit.

A unit with high morale consistently performs at a high level of efficiency and carries out its
allocated tasks accurately and effectively. In such units, each member makes a willing
contribution, assumes that his or her contribution is worth making and that the other members
of the unit will also make their contribution. If necessary, the members help each other,
without their help having to be requested. The few members who would prefer not to make
their contribution feel pressure to carry it out anyway. Members of such units rate themselves
highly, they often develop a strong sense of identification with each other and they are proud
of their unit. They are aware of the reputation of the unit and take pleasure in showing off
their membership (Shibutani; in Manning, 1991).

Morale analysis
Since commanders are expected to influence morale, it is important to be able to measure this
concept.
The armed forces of, in particular, Israel and the United States use questionnaires on morale.
The Israeli Defense Forces have psychologists, who advise commanders on matters
concerning morale, discipline, sleep management, the prevention of stress and the
deployability of troops in general. For advice with respect to morale and deployability of
troops, the IDF makes use of 'morale and combat readiness questionnaires' (Tibboel, van
Tintelen, Swanenberg, & van de Ven, 2001).
In the Yom Kippur and Lebanese wars these morale questionnaires were actually used and
formed a major source of information for the psychologists advising the brigade and
battalion commanders. For example, units were regularly replaced when it could be
demonstrated that they were too (mentally) exhausted by specific operations to still be
effectively deployable.
In the United States, morale research is used in a similar way. Here, a great deal of research is
done into combat stress, group cohesion, morale and the deployability of personnel from
combat units (Mangelsdorff et al., 1985).


The French army, too, uses a morale questionnaire, called the 'Force Morale' (Centre des
Relations Humaines, 1998), which serves as a reference and aid for commanders.
Almost all armed forces currently recognize the importance of morale, and an effort is made
during training to create a strong group bond, high morale, and high motivation by means of
shared experience.

Development of the Dutch Morale Questionnaire (DMQ)


In the development of the DMQ, international theoretical morale models were used. Van den
Bos, Tibboel, and Willigenburg (1994) developed a theoretical model for the Dutch situation,
based on factors distinguished by Gal (1986) and Mangelsdorff et al. (1985). In 1997 the
questionnaire was analyzed for the first time; questions were rephrased and scales were
added. In 2002, after nearly five years of experience with the questionnaire, a second
modification of the model was carried out. Table 1 presents an overview of the measurements
used in the development of the DMQ.

Table 1. Overview of measurements used

Unit                             Rotation   Year   Measurement
1 (NL) Mechbat SFOR in Bosnia    7          1999   1, 2, 3
                                 8          2000   1, 2
                                 9          2000   1, 2, 3
                                 10         2001   1, 2, 3
                                 11         2001   1, 2
                                 12         2002   1, 2
                                 13         2002   1, 2
                                 14         2003   1, 2, 3
1 (NL) Coy UNFICYP in Cyprus                2000   1, 2
41 Medical coy                              2001   1
210 Fuel distribution coy                   1997   1
400 Medical battalion                       1998   1

The central question when developing a morale instrument was how that instrument might
contribute to raising morale. To answer this question, it is necessary to look at the factors
that influence morale. The concept of morale is composed of a number of factors, the
importance of each depending on the specific circumstances. Four kinds of aspects can be
distinguished: individual, unit, leadership, and organizational.
Figure 1 illustrates a simplified model for predicting morale within military units: morale
can be predicted by measuring the related factors and is influenced by the net result of the
combination of the factors shown, which interact.


Figure 1. Simplified version of the model. [Diagram omitted: six interacting clusters of
aspects (leadership, personal, unit, individual, organizational, and environmental) jointly
predict morale.]

A separate scale for each factor was developed in this instrument using a number of
statements. The instrument includes the following factors:

Individual aspects
1. Trust
2. Home front support
3. Job satisfaction
4. Organizational Citizenship Behavior

Unit aspects
5. Deployability
6. Unit cohesion
7. Identification with the unit
8. Respect

Leadership
9. Group
10. Platoon
11. Team

Organizational aspects
12. Appreciation of the military environment
13. Involvement in the objectives of the army
14. Familiarity with the assignment and the terrain
15. Perceived organizational support


The factors applied in the model are explained below. At the end of each description, several
example items from the morale questionnaire are given. Finally, the scale reliability
(Cronbach's α) and the factor analysis are presented.
Respondents answer the items on a five-point scale (from totally disagree (1) to totally
agree (5)).
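
To give an impression of how such an item analysis can be run, the sketch below applies an
exploratory factor analysis to a response matrix. It is illustrative only: it assumes the
third-party Python packages pandas and factor_analyzer, a hypothetical data file, and an
illustrative choice of rotation and number of factors; it is not the authors' documented
procedure.

import pandas as pd
from factor_analyzer import FactorAnalyzer  # third-party package (assumed installed)

# Respondents x items, each item scored 1 (totally disagree) .. 5 (totally agree);
# the file name is hypothetical.
df = pd.read_csv("dmq_responses.csv")

# Exploratory factor analysis with varimax rotation; n_factors matches the
# instrument's 15 scales, but this is an illustrative choice.
fa = FactorAnalyzer(n_factors=15, rotation="varimax")
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns)
print(loadings.round(2))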

Individual aspects

Trust
A great deal of attention in the morale questionnaire is devoted to trust. It involves trust in
oneself, colleagues, arms and equipment, leaders, and the presence of adjacent and support
units. Trusting and being trusted promote dedication and discipline and help military
personnel to function well under difficult conditions. Trust in one's own talents within a
specific situation or under operational conditions requires that one knows the objective and
the role one plays within that situation, and that one has confidence in one's capacity to
carry out the tasks. Furthermore, it concerns trust in the skills and willingness of the
group members to protect each other under operational conditions.

It is also important for high morale that military personnel have confidence in the logistic
and combat-support units. In general, military personnel will be prepared to expose
themselves to danger when they are convinced that medical care will be effective if they are
wounded; the same applies to the counseling and treatment of combat stress. Furthermore,
they must be confident that defects are repaired as quickly as possible and that the supply of
food, fuel, and ammunition is guaranteed.

Examples of items:
- I make an important contribution to the success of my group.
- I think my platoon will perform well in combat situations.
- The equipment used by my group is up to its task.
- I think we will receive sufficient support in our tasks from logistic units.

Home front support


Morale is strongly influenced by the extent to which military personnel are concerned about
the situation at home. Concern about the relationship between the home front and work is
closely linked to the degree of self-confidence, the reduction of uncertainty, the level of fear,
and motivation. This factor includes the assessment of survival chances by military personnel,
worry about the family members left behind, and vice versa.
Limited opportunities for communication with the home front in particular may cause
isolation and frustration. Separation from family, and not being able to help if things go
wrong at home, is generally perceived as stressful. Military personnel must be able to start
missions with their minds at ease, free from cares and problems that would negatively
influence morale. Concern about the home front comprises the care that military personnel
wish to give to the home front and the fear that something might happen at home. This factor
also implies that soldiers have to feel supported by their home front in what they are doing;
if they do not feel supported, they feel they have to choose between the army and the home
front.

Examples of items:


- My home front is proud of me.


- My home front respects my decision to work for the Royal Netherlands Army.
- I think there are sufficient opportunities to let the home front know how I am.

Job satisfaction
Characteristics of the job go hand in hand with satisfaction and (inner) motivation, and thus
also with morale. According to Hackman and Oldham (1976), five objective task
characteristics can be distinguished which influence employee motivation: skill variety, task
identity, task significance, autonomy, and feedback. The more a job meets these
characteristics, the more motivated employees will be.

In order to achieve high morale, it is important that individuals have a clear idea of their role
and view that role as useful and significant. On the one hand, this concerns their share in the
unit's objectives in a specific situation; on the other, it is the role that relevant others (the
group and the leader) expect of the individual. Roles ensure predictable behavior and regulate
the coordination of tasks among the group members.
Examples:
- In my job I can show my capabilities.
- I have a useful function/task within my section.

Organizational Citizenship Behavior (OCB)


Members of the unit show organizational citizenship behavior when they are willing to do
more than their job description asks of them, without being ordered to. They show
'extra-role' behavior without expecting a reward. This behavior is linked to unit cohesion
and raises the efficiency and effectiveness of the unit (Organ, 1988). Where unit members
show more OCB, unit cohesion will grow.

Examples:
- If necessary I will die for the interest of my unit.
- Even under the worst circumstances I will try hard to fulfill my tasks.

Personal aspects
The definition showed that morale is an individual attitude. Every individual soldier
contributes to morale within the unit in his own way, for example because of his age, military
experience, and experience in humanitarian missions (Labuc, 1991). To inform the
commander about the personal aspects of his unit members and their relationship with the
development of morale, the questionnaire starts with a few personal questions.

Unit aspects

Cooperation within the unit, identification and respect are all related to unit cohesion.
Unit cohesion is a major factor of influence on morale. In a cohesive unit, members feel
secure and protected. The higher the unit cohesion, the more influence the unit will have on
its members: unit members accept objectives, decisions and norms more quickly. There is
stronger pressure to conform to the unit and there is less tolerance of unit members who do
not agree.
Where there is high cohesion, membership of a unit is maintained for longer, unit objectives
are achieved sooner, there is greater participation and loyalty from unit members, good
cooperation, better communication, and less absenteeism, and the members feel more secure
(Van Gelooven, Tibboel, Slagmaat & Flach, 1997). Members identify themselves with the unit and
respect each other.
Strong unit cohesion does not, however, necessarily have to be positive; it may cause
aggressive behavior. A sense of 'us' which is too strong may lead to deindividuation,
groupthink, and aggressive behavior towards the outgroup. The positive effects of unit
cohesion may, however, be stimulated and the negative ones minimized (Avenarius, 1994).
Shared experiences, such as joint exercises and recreational activities, offer the opportunity to
gain cohesion. These shared experiences may confirm whether the section members are
willing to help each other and may distinguish the group from others. The greater the
awareness of group members that they depend on each other for success, the more this
contributes to group cohesion (Van Gelooven, Tibboel, Slagmaat & Flach, 1997).

Examples of items:
- I think my group is satisfied with my performance.
- In my group, people feel responsible for each other.
- In our group, we work together well.
- In our group we respect each other.

Leadership

Trust in the leader is based on the personal characteristics of the leader, such as
professionalism, the example set, integrity, patience, and the ability to realistically assess the
performance capability of the subordinate units and military personnel. If leaders want the
unconditional support of their unit, the unit not only needs to know that the leaders are
competent, but also that they care about their people. It is the trust of the unit members in the
skills and willingness of their direct superiors to protect them under operational conditions.
Often of great importance is the emotional bond that stimulates unit members to follow their
commander, even in life-threatening conditions.
In the Netherlands Armed Forces a high degree of independence and initiative is expected of
military personnel. This enables mission command and control down to the lowest level (Van
Gelooven, Tibboel, Slagmaat & Flach, 1997).
Military personnel identify with the unit leaders and adopt (to a large extent) the intentions
and objectives of those leaders. As the leaders are also members of 'higher' groups, they link
the group to the objectives of the 'higher' unit. Insofar as the leaders of the small groups are
seen as representatives of the larger organization, the trust they gain may be passed on to the
organization. Leaders therefore play a key role in transferring the organization's objectives to
the group. That is why leadership is measured at three different levels: the group, platoon,
and team commander.

Examples of items:
- My group commander always clearly tells us what needs to be done.
- My platoon commander has sufficient skills to do what has to be done.
- In general I am satisfied with the way in which my commander leads our team.


Organizational aspects

Appreciation of the military environment


The military environment cannot always be as comfortable as a ‘regular’ working environment. During military operations vital needs such as food and drink, relaxation and rest, clean clothes, adequate accommodation and postal deliveries become very important. A shortage of these necessities of life has a negative effect on morale. Whether these necessities are provided properly is largely a subjective and relative judgement: among other things, a comparison is made with what personnel had expected or were used to before the mission. If the organization does not take sufficient care of these aspects, morale suffers (Van Gelooven, Tibboel, Slagmaat & Flach, 1997).
The appreciation of the military environment is thus based on an individual perception of the situation and is related to expectations. The worse the situation in the barracks, the better the appreciation of the operational working environment.

Examples of items:
- I am satisfied with the relaxation facilities on the base.
- I am satisfied with the quality of the meals on the base.
- The organization tries hard to deliver the mail to the right place at the right time.

Involvement in the objectives of the army


Involvement with the objectives concerns both the objectives of the unit and the objectives of
the army and country. Esprit de corps concerns the relationship with and trust in a higher unit
(the brigade, the corps), or even more abstractly the organization (the RNLA or - in
peacekeeping operations - the EU or UN). This involvement with the reputation of an
organization exceeds the borders of the primary unit.
The objective and the legitimacy of a deployment in operational conditions must be clear and
convincing to the individual. This objective does not necessarily have to be of international
importance (such as protection of democracy). But nothing is worse for morale than the sense
that activities are pointless and serve no purpose whatsoever. The lower in the hierarchy, the
more specific the objectives will be. Objectives must be challenging and realistic in order to
fully motivate those carrying them out.

Examples of items:
- I support the objectives of the Dutch army.
- I contribute positively to international security by serving in the Dutch army.

Familiarity with assignment and terrain


Familiarity with the assignment and the terrain reduces uncertainty, raises self-confidence and consequently contributes to higher morale. It is important that leaders devote attention to the environmental factors that influence morale, and thus the deployability of personnel. Data about the terrain also need to be gathered for peacekeeping operations.
According to Labuc (1991), morale is determined by the background of the military personnel and their unit and the environmental factors which apply at that time. This involves a considerable number of factors, such as operational conditions, the level of the spectrum of force, whether the operation is offensive or defensive, the seriousness of the situation, the length of time involved, logistic support, losses, the terrain and the climate.
Examples of items:
- I was sufficiently informed of the assignments my platoon could expect during the
operation.
- In my platoon, we are well informed on our future tasks.

Perceived organizational support (POS)


Military personnel have a need for support from the organization. Especially during military
operations they need the perception that the army is interested in the individual soldier and
cares for him/her. POS contributes to organizational citizenship behavior (OCB): the more organizational support is perceived, the more OCB may be expected.

Examples of items:
- The Dutch army is interested in my well being.
- The Dutch army will help me when I am in trouble.
- The Dutch army appreciates my work.

In Table 1 the scale reliabilities are presented. Most of the scales reached a high level of reliability.

Table 1. Scale reliability (Cronbach's α)

Dimension      Variable                                      Number of questions  Cronbach's α
Individual     Trust                                         5                    .8091
               Home front support                            4                    .7660
               Job satisfaction                              6                    .8322
               Organizational citizenship behavior           6                    .7943
Group          Unit cohesion                                 4                    .8429
               Identification                                6                    .8468
               Respect                                       4                    .9294
Leadership     Group commander                               8                    .9279
               Platoon commander                             8                    .9485
Organization   Appreciation of military environment          3                    .6915
               Involvement in the objectives of the army     4                    .8126
               Familiarity with the assignment and terrain   6                    .7928
               Perceived organizational support              6                    .9009
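The α values in Table 1 can be reproduced directly from item-level responses. As a purely illustrative sketch (Python, with synthetic data standing in for the DMQ responses, which are not available here), the standard formula is:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix."""
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 200 respondents answering the 4 unit-cohesion
# items on the 1-5 scale used by the DMQ.
rng = np.random.default_rng(0)
latent = rng.normal(3.5, 0.8, size=(200, 1))    # shared 'cohesion' level
responses = np.clip(np.rint(latent + rng.normal(0, 0.6, (200, 4))), 1, 5)
print(f"alpha = {cronbach_alpha(responses):.3f}")
```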

A factor analysis (Varimax rotation with Kaiser Normalization) confirmed the assumed data structure. Through confirmatory factor analysis the covariance structure was tested. ‘Job satisfaction’ appeared not to fit in the dimension of ‘individual aspects’. After moving these items to the ‘organizational aspects’ dimension, the analysis confirmed the data structure completely.

PRACTICAL APPLICATION OF THE DUTCH MORALE QUESTIONNAIRE (DMQ)

The Behavioral Sciences Division of the Royal Netherlands Army has used this morale monitor since 1990 to determine the morale of units, mostly during international military operations. Experience with the DMQ therefore chiefly involves peacekeeping operations. The recently modified instrument can make valid statements about the factors influencing the morale of units.
With the help of the DMQ, the feedback and the advice of the investigators, unit leaders can improve the morale of the team, platoon or group. Measures taken before and during the deployment can be of particular help.

Method
As high morale cannot be achieved in a short period of time, it is important that the unit to be
deployed already possesses a reasonably high level of morale. That is why it is important to
assess the unit morale just before operational deployment. Bottlenecks can be mapped in time
and the commander is able to emphasize relevant aspects and to tackle and solve any
problems. The instrument therefore consists of three measurements:
a. the first measurement is just before operational deployment (approx. 1 month before);
b. the second is during deployment (approx. 3 months after the start of the operation);
c. the third is after the deployment (approx. 1 month after return).

Before the survey begins, the Behavioral Sciences Division makes contact with the operational commander of the battalion to be deployed. The battalion commander is informed about the objectives of the project, the characteristics of the process, the current questionnaire and the aspects of anonymity. If this commander agrees to the survey, the same information is given to the team commanders. The team usually consists of an extended company, complemented with logistic and engineer units. During the introduction, emphasis is placed on anonymity and on the fact that the information is to be used as an analysis of the strengths and weaknesses of the team and therefore as advice for improvement. A contract is drawn up with the team commander which clearly states how the survey and the corresponding report will be handled. Because of the defined objective and guaranteed anonymity, no information on the results is passed on to the battalion commander by the researchers without the team commander's permission (only the approval of the battalion commander is needed for the survey). Commitment from the personnel in the unit is essential for the data collection and for broad acceptance and recognition of the results.
Each measurement consists of a description of the morale of the team. Each factor which
influences morale is given a score using a number of questions and a scale of 1 (poor) to 5
(good). A distinction is made here between soldiers and officers/NCOs.
The results of each measurement, combined at team level, are fed back to the direct
commander of the team. This is done in an assessment in which the survey results are linked
to the specific background, recent (combat) experience, characteristics, culture etc. of the
team, which are not entirely known to the researcher. Thus the results can be further refined
and discussed with the team commander and noteworthy scores can be highlighted. Next, on
the advice of the researcher the team commander determines what, if any, specific action
should be undertaken to improve one or more aspects of morale in (part of) the team. Specific
measures may be the structural introduction of more breaks, the issue of more information
about progress and the objective of the combat (or the exercise) or more attention to
communication among individuals or between individuals and those in positions of authority.
Preconditions
The application of the instrument is linked to the following preconditions.

1. A reliable response to the questions in the morale questionnaire is promoted by the anonymity and confidentiality guaranteed by the researcher. If that anonymity is not guaranteed, this may lead to biased responses.
2. Commitment from the respondents.
3. The results are geared to (adjusting) the steering of the unit along the line of command, i.e. the team commander.
4. The timing of the first measurement is determined by the point at which the unit is complete, trained for deployment and aware of the assignment.
5. The instrument aims to take a ‘photo’ of a unit with respect to morale aspects; it is not meant to be an instrument of judgement, but one of improvement; it concerns a mutual agreement between the researchers and the team leadership to achieve that improvement.

RESULTS/ EXPERIENCE

Our experience derives from the several surveys that have been conducted. The DMQ has been used to measure morale within combat, support and logistic units. A measurement takes approximately seven working days, from issuing the questionnaires to the respondents up to and including discussion of the results with the team commander. So far the survey has been conducted in various units in different environments. For example, in 1990 a survey was started about the situation at the barracks, and subsequently among the Special Forces, IFOR and SFOR (6,7,8). Only the latter surveys encompass measurements prior to, during and after the operation, in the barracks situation and under operational conditions. In 2003 a tailor-made morale questionnaire was designed for the Airmobile Brigade because of its high readiness state. In this way the brigadier is informed about the morale of his companies and is able to send units on missions only when their morale is sufficient. In 2004 the morale questionnaire will be used with the Dutch stabilization forces in Iraq (SFIR).

Advantages and disadvantages of the DMQ


The following advantages of the instrument can be seen from the surveys.
a) The instrument provides insight into the quality of personnel within the team at
operational level in specific and measurable terms with respect to morale and indicates
(predicts) an increased or decreased risk of dropouts using the morale indicator.
b) The team commander receives a clear overview of the morale aspects. The survey offers
insight into the relevant personnel variables in relation to deployment and combat
readiness. The information is used for a strengths and weaknesses analysis of the team and
as advice for improving aspects. The use of a morale instrument at unit level enables team
commanders to gain insight into the state of affairs concerning the influencing factors of
morale. On the basis of this survey and the corresponding advice, specific measures to
improve morale can be implemented.
c) The analysis is not complicated; it provides a structural and systematic overview of morale aspects and contains a great deal of information on the situation before, during and after the mission.
d) Conducting the survey and the (brief) report take little time.
e) The response is very high (80-90%).
The following disadvantages or points of attention were noted.
a) The results of the measurements can be viewed as threatening or painful by the
(leadership of the) unit. This is certainly true of information on leadership, cohesion and
trust. Even though the report is not meant to be a judging instrument, team commanders might use it that way. The researchers must therefore address this potential misunderstanding
before reporting the results.


b) Although the information on the morale of a unit may be interesting for the next level up (battalion), passing on information about morale is a sensitive issue. Care should be taken that information gained by the morale instrument is not seen as an inspection by the next level up; otherwise the information value of the instrument is lost, which is not desirable. This is why the team commander is asked for approval before the team results are sent to the battalion commander.
c) Change requires time. High morale is not something that can be achieved in a short time, whereas a specific (negative) event can cause a fairly quick decrease in morale. For instance, low scores on group cohesion and trust in leaders in a platoon will not be easy to improve quickly.

The results described below were partly derived from interviews with team commanders and
are partly based on their experience. Anonymity means that specific results cannot be dealt
with, but in general the following conclusions can be drawn so far:

Application of the DMQ


During SFOR, the DMQ was used in units in Bosnia as a direct advisory instrument for team
commanders in the field. The results are the most reliable and widely available indicators so far for the morale of a team, and team commanders are satisfied with the applied analysis and advice. They usually recognize and confirm the results during the presentation and, if this is not the case, they return to them at a later measurement and generally confirm the earlier findings.
The respondents are open, positive and willing to complete the questionnaires. Almost no one
misses the sessions and the discussions are serious and open. The team commanders process
the (positive and negative) results and plan the solutions together with the behavioral
scientists.
The DMQ is currently used where a commander so requires. Experience has shown that
almost all commanders decide, following consultation with the Behavioral Sciences Division,
to use the instrument in the context of the mission abroad. After the reports are finished,
commanders are asked to assess the use of the instrument by means of a report mark. These
report marks show how much the DMQ is appreciated.

Training prior to the mission


Within the armed forces morale is often used as an indicator for deployment readiness and
combat readiness. In view of the fact that aspects of morale only demonstrate their
effectiveness under operational conditions, the important thing is to act preventively and to
gain insight into aspects of morale before deployment. However, during preparation for the mission, frequently as a result of a lack of time, insufficient attention is often paid to teambuilding, and the conditions for creating cohesion and trust cannot always be achieved. In particular, since logistic units are often formed just before deployment, their morale scores are in general lower than those of combat units.
The same applies to military personnel sent on missions abroad individually.

HOW TO PROCEED

Behavioral scientists are increasingly being deployed in the operational environment. This has
proved its worth in the application of the morale instrument. The Dutch Morale Questionnaire is currently being expanded with variables concerning dealing with stress and the influence of
personal characteristics; coping style, for instance, is also being looked at. Whether and when the DMQ is expanded with items concerning coping styles depends on the number of items that need to be added. There is a risk of the list becoming too long and less acceptable if too many ‘soft’ psychological elements are included which seem to have no direct relationship with the operational task.

The DMQ has been revised frequently. The Behavioral Sciences Division keeps improving the questionnaire by collecting as much data as possible, analyzing the data set and reformulating items. The eventual aim is to apply this morale instrument to each unit to be sent abroad, prior to, during and after the mission.

LITERATURE

Avenarius, M.J.H. (1994). De positieve en negatieve kanten van groepscohesie. Den Haag:
DPKL/Afdeling Gedragswetenschappen.

Bos, C.J. van de, Tibboel, L.J., & Willigenburg, T.G.E. (1994). Een onderzoek naar de
bruikbaarheid van de Nederlandse moreelvragenlijst. Den Haag: Rapport DPKL/GW 94-
05.

Centre des Relations Humaines, 1998

Gal, R. (1986). Unit morale: From a theoretical puzzle to an empirical illustration - An Israeli
Example. Journal of Applied Social Psychology, 549-564.

Hackman, J.R. & Oldham, G.R. (1976). Motivation through the design of work: test of a
theory. Organisational Behavior and Human Performance, 16, 250-279.

Gelooven, R. van, Tibboel, L.J., Slagmaat, G.P. & Flach, A. (1997). Studie Masterplan KL-
2000. Moreel: Vakmanschap – Kameraadschap - Incasseringsvermogen. Den Haag:
CDPO/Afdeling Gedragswetenschappen.

Labuc, S. (1991). Cultural and societal factors in military organisations. In R. Gal & A.D.
Mangelsdorf (Eds.). Handbook of Military Psychology (p.471-489). New York: Wiley.

Mangelsdorff, A.D.; King, J.M.; O’Brien, D.E. (1985). Battle stress survey. Fort Sam
Houston. Consultation report.

Manning, F.J. (1991). Morale, cohesion, and esprit de corps. In R. Gal & A.D. Mangelsdorff
(Eds.), Handbook of Military Psychology (p. 453-470). New York: Wiley.

Organ, D.W. (1988). Organizational Citizenship Behavior: The good soldier syndrome.
Lexington, MA: Lexington.

Tibboel, L.J., Tintelen, G.J.A. van, Swanenberg, A.B.G.J. & Ven, C.P.H.W. van de (2001).
The Human in Command: Peace Support Operations (p. 363-380). Breda: Mets & Schilt.

Measures of Wellbeing in the Australian Defence Force

Colonel A.J. Cotton,


Director of Mental Health
Australian Defence Force, Canberra, Australia
Anthony.Cotton@defence.gov.au

Ms Emma Gorney
Directorate of Strategic Personnel Planning and Research
Department of Defence, Canberra, Australia
Emma.Gorney@defence.gov.au

Abstract
Cotton (2002) reported on the establishment of a Wellbeing program in the Australian
Defence Force, concluding that a key issue was to identify appropriate measures of
wellbeing that would allow comparison with measures of objective personnel capability in
order that wellbeing could be incorporated into ADF capability planning. This paper
reports the results of the ADF’s first attempts to identify such measures and how they might
relate to personnel capability and other key strategic personnel indicators.

INTRODUCTION
The Australian Defence Force Mental Health Strategy (ADF MHS) was established
in 2002 to provide an overarching strategy for the provision of mental health services to the
ADF (Cotton, 2002). The strategy is built around eight initiatives:
• Improving mental health literacy in the ADF.
• Integrating the provision of mental health services by ADF providers.
• Improving treatment options available to ADF members.
• The development of a comprehensive training and accreditation framework for ADF
providers.
• The implementation of a comprehensive mental health research and surveillance
program in the ADF.
• The implementation of the ADF Drug and Alcohol Program (DAP).
• The implementation of the ADF Suicide Prevention Program (SPP).
• The enhancement of wellbeing and resilience in ADF members.

The last of these initiatives has been operationalised through the establishment of the Australian Defence Organisation (ADO19) Wellbeing Forum (Cotton, 2002). This is a voluntary organisation of those agencies within the ADO that feel they make some contribution to the wellbeing of ADF members or Defence civilians. One of the early decisions made by the Wellbeing Forum was that the ability to measure wellbeing as objectively as possible was a key requirement for the success of the program in the ADF.

19
ADO refers to the collective group of ADF (i.e., uniformed) members and Defence civilian employees. Note that the focus of this paper is wellbeing in ADF members.

In particular, it was identified that demonstrating the ability of wellbeing to contribute to the “bottom line” of the ADF was needed to provide sufficient evidence to win support for programs that might draw scarce resources away from other personnel initiatives. The measurement of wellbeing has been shown to be possible in the civilian sector. Research from organisations like the Corporate Leadership Council20 showed returns on investment in different civilian companies ranging from $220-$300 per employee, and return on investment (ROI) rates of between three and seven dollars per dollar spent on wellbeing programs.
Measuring ROI for wellbeing programs in the military is more difficult: the potential costs to the organisation are hard to quantify, and many of the givens of military tradition (e.g., medical care) are seen as the primary elements of wellbeing programs in the civilian sector. A number of strategies were identified that could be applied to provide some indication of the contribution of wellbeing to the effectiveness of the ADF. The first was to compare the ADF with the general Australian population, providing an external benchmark of wellbeing in the ADF. A second strategy was to identify personnel markers within the ADF population as an internal benchmark of wellbeing, and then to identify possible outcome variables that might contribute to an understanding of how wellbeing contributes to the effectiveness of the ADF.
One of the difficulties with measuring wellbeing is that there are two types of
measures of wellbeing: objective and subjective. The types of indicators that might be
described as objective measures of wellbeing include absenteeism for health or other
reasons, staff turnover rates, rates of industrial accidents, and so on; in other words the
behavioural manifestations of wellbeing. Subjective indicators of wellbeing incorporate
measures of job satisfaction and commitment, morale and cohesion (to use more military
terms).
The use of objective indicators can present a range of problems, most particularly
the ability to link them directly to wellbeing, which is essentially a subjective term. The use
of subjective indicators of organisational health has been common practice for decades, and is usually effected through staff surveys, which are commonly used in both the
civilian and particularly the military sectors. In the ADF the primary staff survey tool used
is the Defence Attitude Survey and this instrument provides a possible means for measuring
wellbeing in the ADF.

AIM
The aim of this paper is to report initial attempts to measure wellbeing in the ADF
through the collection and analysis of subjective measures collected through the Defence
Attitude Survey.

THE DEFENCE ATTITUDE SURVEY

The Directorate of Strategic Personnel Planning and Research has responsibility for
the administration of the Defence Attitude Survey (DAS). The DAS was first administered
in 1999. It replaced the existing single Service attitude surveys, drawing on content from

20
Corporate Leadership Council Literature Search, ROI of Wellness Programs, March 2002,
www.corporateleadershipcouncil.com

each of the RAN Employee Attitude Survey (RANEAS), the Army’s Soldier Attitude and
Opinion Survey (SAOS) and the Officer Attitude and Opinion Survey (OAOS), and the
RAAF General Attitude Survey (RGAS). The amalgamation of these surveys has facilitated
comparison and benchmarking of attitudes across the three Services whilst maintaining a
measure of single Service attitudes.

The survey was re-administered to 30% of Defence personnel in April 2001. The results were widely used throughout the organisation. To maintain more frequent provision of key information, the Your Say Survey was developed; it takes a number of key items from the DAS and administers them more regularly to gather trend data on the organisation. The Your Say Survey is administered to a 10% sample of Defence members twice a year, and while it provides useful and current information, the sample size is not extensive enough to allow detailed breakdowns of the data.

It was determined by the Defence Committee21 in May 2002 that the DAS should be
administered annually to a 30% sample of Defence personnel, allowing for more
comprehensive data analysis. The Committee also directed that an Attitude Survey Review
Panel (ASRP) be established, with representatives from all Defence Groups, to review and
refine the content of the DAS. The final survey was a result of thorough consultation
through the ASRP. The item selection both maintained questions from previous surveys to
gather trend data, and incorporated new questions to feed into Balanced Scorecard and
other Group requirements. The purpose of the Defence Attitude Survey is threefold:

• To inform personnel policy and planning, both centrally and for the single
Services/APS;
• to provide Defence Groups with a picture of organisational climate; and
• to provide ongoing measurement in relation to the Defence Matters scorecard.

METHODOLOGY OF THE DAS

Questionnaire

The DAS consists of four parallel questionnaires, one for each Service and one for
Civilians. The Civilian form excludes ADF specific items and includes a number of items
relevant to APS personnel only. Terminology in each form was Service-specific. Each
survey contained a range of personal details/demographic items including gender, age, rank,
information on deployments, specialisation, branch, Group, years of Service, education
level, postings/promotion, and family status (44 for Navy, 40 for Army and Air Force, 35
for Civilians). Navy personnel received additional questions regarding sea service. The
survey forms contained 133 attitudinal items (some broken into parts) for Service personnel
and 122 for Civilians. As in previous iterations, respondents were given the opportunity to
provide written comments at the end of the survey.

As directed by the Defence Committee, a number of changes were made to the survey items through discussion in the ASRP. This refinement process attempted to balance the maintenance of sufficient items for gathering trend data against reducing the

21
The Defence Committee is the ADF’s senior management committee. It is responsible for making all high-level decisions affecting the ADF.

number of items to decrease the length of the survey. While a number of items were
excluded due to no longer being relevant or appearing to duplicate other questions, further
items were added to address issues previously excluded. The new additions included items
on Wellbeing, Internal Communication, Security, Occupational Health and Safety and
Equity and Diversity (which had been included in the 1999 iteration of the survey). Further
demographic items were also added regarding work hours (predictability of, and requirement to be on-call) as well as awareness of Organisational Renewal and the Defence Strategy
Map. The additions resulted in more items being included in the 2002 survey than the 2001
version, however total numbers were still lower than the original 1999 questionnaire.

Attitudinal items were rated on a five-point scale where one equalled ‘Strongly Disagree’ and five equalled ‘Strongly Agree’ (a number of the items were rated on satisfaction or importance scales, rather than the more common agreement scale).

Sample

The sample for the Defence Attitude Survey is typically stratified by rank; however, concerns had been raised by Group Heads that Groups were not being representatively sampled. Thus, for the 2002 sample, the thirty percent sample of the Organisation was stratified by both rank and Group. Recruits and Officer-Cadets were not included in the sample, as per the 2001 administration. Upon request, the whole of the Inspector General’s Department was surveyed to provide sufficient numbers for reporting on this small Group.

Administration

The survey was administered as a ‘paper and pencil’ scannable form and employed
a ‘mail-out, mail-back’ methodology. For a selection of personnel in the Canberra region,
where correct e-mail addresses could be identified, the survey was sent out electronically.
This methodology allowed the survey to be completed and submitted on-line or printed out
and mailed back in the ‘paper and pencil’ format.

Due to declining response rates for surveys and inaccuracies encountered in address
information, additional attempts were made to ensure that survey respondents received their
surveys and were encouraged to complete them. Surveys were grouped into batches to be
delivered to individual units. In coordination with representatives from each of the Service
personnel areas, units were identified and surveys were sent to unit COs/OCs for distribution to sampled personnel, accompanied by a covering letter from the Service Chiefs. A number of issues were encountered in this process, including the fact that some COs are responsible for vast numbers of personnel (for example, HMAS Cerberus), and that the process entailed double-handling of the survey forms.

Surveys were sent out directly from DSPPR in Canberra, with Civilian forms
delivered via regional shopfronts as specified by pay locations. Completed questionnaires
were returned via pre-addressed return envelopes directly to DSPPR.

Table 1 below outlines the response rate22 by Service/APS. The response rate from 2001 is also included, and the decline indicates that delivery via unit COs/OCs was not an

22
The response rate is calculated as the number of useable returns divided by (the number of surveys mailed out minus the number of surveys returned to sender).

improved methodology, and also highlights the operational commitments of personnel, particularly those in Navy.

Table 1
                       Navy    Army    Air Force  APS     Total
Sent                   4640    6841    3461       5625    20567
Return to Sender        489     265     179        312     1245
Useable Returns        1532    2669    1808       3504     9513
Response Rate          36.9%   40.6%   55.1%      66.0%   49.2%
2001 Response Rate     52.0%   50.7%   60.9%      56.2%   54.5%
2001-2002 Difference  -15.1%  -10.1%   -5.8%      +9.8%   -5.3%
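As a worked check of the formula in footnote 22, take the Navy column: 1532 useable returns divided by (4640 surveys sent minus 489 returned to sender) gives 1532 / 4151 = 36.9%, matching the tabled value.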

The DAS is a comprehensive organisational measurement tool that has provided very good advice to senior ADF management. Given the stability of the item set in the DAS, the inclusion of a set of wellbeing items in the DAS should provide an appropriate tool for measuring wellbeing in the ADF and its possible links to broader organisational outcomes.

Wellbeing Items
The Australian Unity Well Being Index is a national measure of Australians' views on a range of economic and social indicators in Australia at both a personal and national level23. The Index asks respondents about their satisfaction with their:

• health,
• standard of living,
• achievements in life,
• personal relationships,
• sense of personal safety,
• community connectedness,
• future security, and
• overall satisfaction with life.

These items were modified (after consultation with the ASRP) by the inclusion of an item on connectedness to the military community and the removal of the items on future security and personal safety, and were then incorporated into the ADO General Attitude Survey
that was administered early in 2003. The inclusion of these items should allow the
measurement of wellbeing in the ADF compared to the general population. It will also
provide a standard set of wellbeing items that can be linked to other organisational
measures.

Demographics

23
Taken from the Introduction section of the Australian Unity Well Being Index web site,
www.australianunity.com.au

A randomly selected sample of 1,500 ADF cases (i.e., uniformed members) was taken from this data set to provide the analysis for this paper. Demographic data for the sample are:

• Gender – 87% males, 13% females.
• Age – mean 32.52 years, median 32 years.
• Service – RAN 26.5%, Army 43.9%, RAAF 29.6%.
• Length of Service – mean 11.73 years, median 11 years.
• Proportion having served on operations – 47.7%.
• Proportion in a recognised relationship – 61.6%.

RESULTS

Psychometric Properties of Wellbeing Items

Analysis of the distributions of the wellbeing items showed that all were non-normal, primarily due to negative skew; some also deviated from normality in terms of kurtosis, although this varied across items. All item distributions were dome shaped and showed some balance in their distribution.
A principal components analysis was conducted on the individual wellbeing items (i.e., excluding the overall item). As expected, this yielded a single component with an eigenvalue greater than one (2.557), which accounted for 42% of the variance in the data set.

The overall item was then regressed onto the individual wellbeing items and this
yielded a solution that accounted for 45.3% of the variance in the overall item, a significant
result, where all items contributed significantly (all at alpha = 0.01, except one item which
had a p value of 0.012). As a result, the individual wellbeing items were summed to
produce a Total Wellbeing scale for use in further analysis.

Overall, the wellbeing items proved to be a consistent set of items, close to normally distributed, that together adequately predicted the overall item.
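A minimal sketch of this two-step check (component eigenvalues for unidimensionality, then regressing the overall item on the individual items), with synthetic data standing in for the DAS responses and illustrative variable names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Synthetic stand-in for the DAS data: six wellbeing items driven by one
# shared satisfaction factor, plus an overall-satisfaction item.
latent = rng.normal(size=(1500, 1))
items = latent + rng.normal(0, 1.0, (1500, 6))
overall = latent[:, 0] + rng.normal(0, 1.0, 1500)

# Step 1: eigenvalues of the item correlation matrix; a single eigenvalue
# greater than one supports treating the items as one component.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(items, rowvar=False)))[::-1]
print("eigenvalues:", np.round(eigvals, 3))

# Step 2: regress the overall item on the individual items; R^2 shows how
# well the item set predicts overall satisfaction.
r2 = LinearRegression().fit(items, overall).score(items, overall)
print(f"R^2 = {r2:.3f}")

# If both checks pass, the items can be summed into a Total Wellbeing scale.
total_wellbeing = items.sum(axis=1)
```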

Comparison with the Australian Population

Having determined the psychometric adequacy of the wellbeing item set, the results from the ADF population were then compared with those of the general population. The results (percent satisfied) are contained in Table 2 below.

Table 2
Item                         ADF   07/02  09/02  02/03  04/03  07/03
Satisfaction overall         69.5   78.1   77.2   77.7   78.2   78.2
Standard of living           78.0   77.7   76.5   77.3   77.7   77.8
Health                       73.3   75.4   74.9   75.8   76.0   75.2
Achievements                 71.7   74.8   74.0   74.9   75.0   74.8
Personal Relationships       71.3   79.2   79.0   80.6   80.6   81.3
Links – general community    49.6   70.7   69.5   70.0   71.0   71.2
Links – military community   46.9     -      -      -      -      -

Examination of the data in the table shows that the ADF sample is very close to the general community, and rates higher in terms of satisfaction with standard of living.
The ADF sample rates are, however, noticeably below the general community in terms of
overall satisfaction, achievements, personal relationships, and links to the general
community.

Comparison within ADF

The wellbeing items were then compared across rank categories (PTE - LCPL, CPL - WO1, LT - CAPT, MAJ - BRIG) and Service (RAN, Army, RAAF) within the ADF by cross-tabulation, calculation of chi-square values and examination of standardised residuals in each cell (a sketch of this procedure follows the results below). This yielded the following results:

• Significant differences across service in terms of links to the general community with
RAN members more likely to indicate that they were unsatisfied with this.
• A significant difference across rank categories for overall life satisfaction with junior
ranks more likely to be dissatisfied with this and senior ranks more likely to be satisfied.
• A significant difference in satisfaction with standard of living again with junior ranks
more likely to be dissatisfied.
• A similar result for satisfaction with achievements.
• A similar result for satisfaction with personal relationships.
• A similar result for satisfaction with links to the military community.
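A minimal sketch of the cross-tabulation procedure referred to above; the contingency table is invented for illustration and is not DAS data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: rank category (rows) x response category
# (dissatisfied / neutral / satisfied) for one wellbeing item.
table = np.array([[120,  90, 190],    # PTE - LCPL
                  [ 80,  85, 235],    # CPL - WO1
                  [ 30,  40, 130],    # LT - CAPT
                  [ 15,  25, 110]])   # MAJ - BRIG

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.4f}, dof = {dof}")

# Standardised residuals: cells with |residual| > 2 indicate which
# rank/response combinations drive a significant overall result.
std_resid = (table - expected) / np.sqrt(expected)
print(np.round(std_resid, 2))
```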

Comparison with Organisational Markers

Total wellbeing scores were correlated with a number of organisational markers to establish any relationships; this yielded the following correlations:

• Confidence in immediate superior – 0.17
• Confidence in Senior officers/staff – 0.23
• Confidence in Senior Defence leadership – 0.25
• I like the work in my present position – 0.26
• There are insufficient personnel in units - -0.03 (NS)
• Adequate opportunities to clear leave – 0.17
• Impact of work on family responsibilities - -0.31
• Actively looking to leave the service - -0.24
• Satisfaction with salary – 0.235
• Personal morale – 0.43
• Unit morale – 0.32

Total Wellbeing scores were then compared across categories of number of deployments, time since last deployment, marital status and intention to leave; this yielded the following results:

• There was no significant difference in total wellbeing over the number of deployments
the member had.
• There was a significant effect for time since deployment on wellbeing, with a general
increase in wellbeing as time since last deployment increases.
• There was a significant effect for marital status on wellbeing. Post-hoc tests (Scheffé's)
indicated that the major differences were between those who were in a recognised
relationship and those who were not in a relationship.
• When total wellbeing scores were compared across intention-to-leave categories, a significant effect was identified; post hoc comparisons showed very clearly that those intending to remain until retirement, and those who have not considered leaving, had a higher level of wellbeing than those who have considered leaving.

DISCUSSION

The results presented above indicate that the wellbeing items from the Australian Unity Wellbeing Index are a useable measure for the ADF. The psychometric properties of the scale constructed from the items are sound, and the scale correlates well with the single wellbeing item (satisfaction with life overall).

Overall wellbeing in the ADF is lower than in the general population, with the main differences in the areas of connection with the community, personal relationships, and satisfaction with achievements. Within the ADF the main differences occur for junior members in these areas and also in satisfaction with their standard of living. This is an important result as it indicates where the priority of effort should be put to improve wellbeing overall.

Comparisons across a number of important organisational indicators (intention to leave, time since last deployment) suggest that wellbeing may have a causal effect on these. Correlation analysis with other organisational markers shows that wellbeing has a good relationship with a number of markers. The bi-directional nature of these analyses means that there is significant scope to examine these variables further in a more complex model of wellbeing and organisational attitudes and behaviour, particularly through model-testing procedures such as structural equation modelling.

These results indicate that wellbeing is a concept that can be measured with some
utility in the ADF and with immediate applicability, particularly in terms of comparison
with the general community and in terms of targeting service delivery and policies to better
meet wellbeing needs.

CONCLUSION

Wellbeing is a concept of great promise as an organisational tool. It has a clear relationship with a key organisational behaviour (intention to leave), and its relationship with a number of important organisational markers suggests that there is significant scope for further analysis and model testing in particular.

Why One Should Wait Before Allocating Applicants

LtCol Psych Francois J. LESCREVE24


Belgian Defense Staff
Human Resources Directorate General
Accession Policy Research & Technology Section

Abstract
An important policy issue in the design of an accession system for military personnel is the question of how and when allocation decisions are made. Two major options are available: immediate allocation or batch classification. With immediate allocation, an applicant knows what occupation he or she is accepted for on the selection day itself. With batch classification, a number of applicants are assessed during a certain period of time and their allocation is decided later. This allows comparing the applicants and assigning them in a smarter way.
It can be demonstrated that overall recruit quality is significantly better when batch classification is chosen. Yet immediate allocation is the preferred method in many countries. A major reason for this is that it seems more appropriate to give the candidates certainty about their application without delay.
In this paper we take a closer look at the relationship between the quality of the enlisted group of recruits and the time between assessment and allocation decision. Empirical data are used to simulate different conditions. It is found that waiting before making allocation decisions yields better recruit quality. In addition, the setting used shows a relationship between time before classification and recruit quality that approximates a logarithmic function. This indicates that even minor departures from immediate classification in favor of batch classification can yield significant improvements in recruit quality.

Introduction
Military organizations need to enlist recruits for different trades in order to compensate
for departures. To reach this goal, recruiting and selection & classification (S&C) systems
are set up. Recruiting systems are primarily aimed at attracting high numbers of quality
people. S&C systems deal with a number of applicants characterized by varying
aptitudes, interests and other pertinent attributes on the one hand as well as a number of
vacancies on the other hand. Usually, there are a certain number of positions available for
different trades. The trades or entries often require different levels of achievement for a
number of aptitudes. During a selection phase, the different attributes of an applicant are
assessed. Together with the applicant’s preferences or interest for the different entries,
these selection measures make it possible to quantify the appropriateness of assigning a candidate to a particular trade. How the actual assignment decision is made varies from
one S&C system to another. We can roughly distinguish two main systems: the
immediate ones and the batch classification systems. In immediate systems, decisions are
typically made one at a time immediately after the assessment phase. That is, while the
applicant is still present in the selection facility. In batch classification systems a
relatively large number of applicants is processed simultaneously. By comparing the
applicants to each other before assigning them, these systems typically result in

24
Contact author at Lescreve@skynet.be

significantly better recruits for a given set of applicants and vacancies25. Yet military organizations often prefer immediate assignment systems to smarter batch
classification. This paradox is probably due to the major drawback of batch classification
systems, namely the fact that applicants need to wait a certain time before knowing the
outcome of their application. This is not very client-friendly and bears the risk that
applicants continue their quest for a job elsewhere.
This paper focuses on the relationship between the quality of the enlisted recruits and the timing of classification. If it is confirmed that, for a given applicant pool and a given set of vacancies, better recruit quality can be reached through batch classification, the question arises how long one needs to wait in order to benefit from the increased quality. This research question is primarily of interest for continuous recruiting systems. A recruiting system can be considered continuous when a person can apply at any time and recruits are enlisted frequently, usually on a monthly basis. Other systems, such as those used for the recruitment of officers, are not continuous but annual or semi-annual. Typically, the candidates have to apply before a certain date and batch classification is used once, after all applicants have been examined.

Method
In order to assess the influence of waiting time before classification on recruit quality, a
method was used that will be described next.
Dataset.
A sufficiently large dataset is needed to conduct this research. Since we also wanted to
include the applicants’ preferences, this limited our choice among available datasets. It
was therefore decided to aggregate data originating from different recruiting sessions.
As will become clear below, this does not introduce any bias into the study. The
dataset is composed of the Belgian NCO applicants who applied from 1999 to 2003
and the vacancies available to them26. The measurements for each applicant are
described in Enclosure 1. The vacancies encompass 27 different trades listed in
Table 1.
For each trade, the aptitude of the applicants was computed based upon a weighted sum of scores. The weights vary from one trade to another. To qualify for a trade, an applicant needs to meet certain requirements. These pertain to categorical measurements, such as a medical profile, and/or to metric measurements for which minima are set. A person not meeting the requirements is given an aptitude of zero for the entry.
To illustrate the differences in weights used for different entries, the following graph shows the weights used for the entries ‘Infantry’ and ‘Air Traffic Controller’. In both cases the weights add up to one. The meaning of the variables can be found in Enclosure 1.

25
Lescreve, F. Improving Military Recruit Quality Through Smart Classification Technology. Report of an International Collaborative Research Sponsored by the US Office of Naval Research, October 2002.
26
In total, 1529 vacancies encompassing 27 trades were available to 3366 applicants who were eligible for at least one trade.

[Figure 1: bar chart of the weights used to compute aptitudes for the entries ‘Infantry’ and ‘Air Traffic Control’ (variables ATC, ELEC, ENG_G, ENG_T, KAHO, MECH, PHYS, PINP); weights range from about -0,05 to 0,45.]

Figure 1

As mentioned earlier, the dataset also includes the preferences of the applicants for the entries. In Belgium, NCO applicants are asked to express their preferences on a scale from 99 (first choice) down to 1. They cannot give the same preference twice. They can also give a zero preference for entries they reject, which prevents them from being assigned to that trade.
Given the applicant's aptitude and preference for an entry, a payoff value is then computed27. This value combines aptitude and preference in order to express the overall appropriateness of assigning the applicant to the particular entry. Applicants who did not have a non-zero payoff for at least one entry (i.e., those who did not qualify for any entry) were removed from the dataset; 3366 applicants remained.
Only the payoff, aptitude and preference scores are used in this research. The first two scores are standardized per trade with a mean of 500 and a standard deviation of 200. Standardized payoffs are limited to the range of 0 to 999. The preferences are untransformed and range from 99 to 0.
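The payoff formula in footnote 27 lets aptitude always contribute while the preference scales it between 40% and 100% of its value. A small illustrative sketch (the function name is ours, not part of the Psychometric Model):

```python
def payoff(aptitude: float, preference: int) -> float:
    """Payoff = Aptitude * [((Preference/99) * 0.6) + 0.4] (footnote 27).

    A preference of 1 retains just over 40% of the aptitude; the separate
    rule that zero-preference entries are excluded from assignment
    altogether is not modelled here.
    """
    return aptitude * ((preference / 99) * 0.6 + 0.4)

# A first-choice entry keeps the full aptitude; a barely-accepted entry
# retains just over 40% of it.
print(payoff(650, 99))   # 650.0
print(payoff(650, 1))    # ~263.9
```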
Table 1 summarizes the complete dataset. The columns represent:
• Job ID: An identification number for each entry available to the
applicants;
• MOS: A ‘Military Occupation Specialty’ code;
• Job Title: A description of the entries;
• Positions: The total number of individual positions for the entry;
• Qualified: The number of applicants in the dataset meeting the requirements for the entry and for whom an aptitude score was computed;
• Preference: The number of applicants in the dataset who are willing to be
assigned to the entry (Preference score > 0);

27
The formula used is: Payoff = Aptitude * [((Preference / 99) * 0.6) + 0.4]

• Non-zero Payoff: The number of applicants in the dataset who can be assigned to
the entry. In order to be included in that number, an applicant
must meet the requirements to qualify and express a preference
> 0 for the entry;
• Ratio: The number of applicants that can be assigned to the entry per
available position.

Job ID  MOS  Job Title                              Positions  Qualified  Preference  Non-zero Payoff  Ratio
1       100  Aircraft mechanics Army                       14       2746         559              509  36,36
2       102  Electro mechanics Army                        27       2385         384              335  12,41
3       114  Signal electronics Army                       45       2425         442              374   8,31
4       116  Armaments electronics Army                     9       2425         316              274  30,44
5       140  Infantry                                     252       1730        2005             1156   4,59
6       142  Armor                                         84       1703        1914             1056  12,57
7       144  Artillery                                     99       2837        1857             1711  17,28
8       146  Engineer                                      46       1703        1415              788  17,13
9       150  Signal                                       101       2837        1347             1056  10,46
10      152  Supply (Services)                            120       3072        1411             1235  10,29
11      212  Electricity Electronics Air Force            157       2591         478              443   2,82
12      240  Air Traffic Controller                        98        101         668              100   1,02
13      244  Computer operator Air Force                   97       2591         759              660   6,80
14      250  Airfield defense Air Force                    60       2125        1385              898  14,97
15      320  Computer operator Navy                        16       1065         100               61   3,81
16      322  Sonar operator Navy                           10       1065          87               54   5,40
17      154  Transport and movement control Army           18       3072         849              759  42,17
18      326  Radio operator Navy                           12       1065          84               53   4,42
19      364  Signal Navy                                   15       1318         172              131   8,73
20      156  Cook Army                                     34       3092         558              480  14,12
21      318  Electrician Navy                              15       1190         108               86   5,73
22      358  Detector Navy                                 16        618         147               73   4,56
23      200  Aircraft mechanics Air Force                  76       2746         421              384   5,05
24      202  Electro mechanics Air Force                   34       2385         341              299   8,79
25      246  Administration Air Force                      26       3360         749              749  28,81
26      420  Medical support personnel                     39       2837         521              370   9,49
27      312  Radar maintenance Navy                         9       1065          75               51   5,67
Sum                                                      1529
Table 1
Subsets
From the described dataset, 60 subsets were drawn. The persons were randomly sorted
and then sequentially assigned to subsets of 56 persons each (adding up to 3360
persons). On the vacancy side, the available vacancies were distributed proportionally
over the 60 subsets. Since the number of vacancies per entry must be an integer, it was
ensured that rounding effects did not affect some subsets more than others in a
systematic way. The original number of vacancies per trade for each subset is given in
Enclosure 2.
Procedure
Why did we divide the original dataset into subsets? The reason is that this will help us
simulate what is of interest: how does time between classifications influence recruit
quality? Of course, time in itself is not really the point. In continuous recruiting
systems, time correlates with the number of assessed applicants. It is the increased
number of applicants classified simultaneously that is of importance to recruit quality.
From the applicants' or organization's point of view, however, it is the time between classifications that matters; hence our interest in it. To understand the approach, consider a hypothetical S&C setting in which, by chance, 56 persons are found to be eligible for at least one entry each day. This system would

wait till the end of the day and then use a batch classification system to assign these
applicants to the vacancies that need to be filled that day. This corresponds to running
the classification for the persons and vacancies in one subset. This procedure can be
repeated each day or, expressed more generically, subset-by-subset. A variant to this
approach would consist of classifying the applicants not each day but every two days
(or the persons and vacancies of two subsets added up). Or every week (persons and
vacancies of 5 subsets together assuming a 5 working day week). And so on. A
different S&C setting may of course ‘produce’ more or less eligible applicants per
time unit. The principle remains however: for continuous recruiting settings, batch
classification can be done more or less frequently.
It was chosen to divide the dataset in 60 parts for 60 is divisible by 1, 2, 3, 4, 5, 6, 10,
12, 15, 20, 30 and 60 with no remainder. This proved to be very convenient for this
research.
It was also decided to add the unfilled vacancies from one classification to the
vacancies for the next one since that seemed the most logical way in which continuous
S&C systems are functioning.
In summary: the independent variable for this research is the number of subsets that
are processed simultaneously and the dependent variables are the mean payoff,
aptitude and preference yielded when all 60 subsets have been processed. The payoff
value of a person for the job s/he is assigned to is the operationalized definition of
her/his quality for the organization.
Classification method
Probably the most crucial aspect of this research setting has not been touched yet: the classification method itself. We used a smart classification system developed by the author, called the ‘Psychometric Model’. This method has been in use with the Belgian Defense since 1995. Its algorithm maximizes the sum of payoffs for the applicants assigned to the different jobs. The Psychometric Model allows a great deal of fine-tuning, such as setting Defense priorities, giving coefficients to the different classes of categorical data, setting minimum preferences to assign persons to jobs, and including study background in the computation of payoffs. For reasons of simplicity, none of these possibilities was used in the current research.
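Beyond maximizing the sum of payoffs, the Psychometric Model's internals are not described here. Under that objective alone, the core step of each simulated batch can be sketched as a linear assignment problem; the sketch below uses scipy's linear_sum_assignment as a stand-in for the actual algorithm, random payoffs in place of the real dataset, and carries unfilled positions over to the next batch as described above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def classify_batch(payoffs: np.ndarray) -> list[tuple[int, int]]:
    """Assign applicants (rows) to positions (columns), maximizing total payoff.

    Zero-payoff pairings (ineligible or rejected entries) are never used.
    """
    rows, cols = linear_sum_assignment(payoffs, maximize=True)
    return [(r, c) for r, c in zip(rows, cols) if payoffs[r, c] > 0]

# Hypothetical run: three batches of 56 applicants (one subset each) against
# a rolling set of about 25 positions (roughly 1529/60); positions left
# unfilled in one batch stay open for the next.
rng = np.random.default_rng(3)
open_positions = list(range(25))            # position indices still open
for batch in range(3):
    payoffs = rng.integers(0, 1000, size=(56, len(open_positions))).astype(float)
    assigned = classify_batch(payoffs)
    filled = {c for _, c in assigned}
    open_positions = [p for i, p in enumerate(open_positions) if i not in filled]
    print(f"batch {batch}: {len(assigned)} assigned, {len(open_positions)} positions left")
```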

Results
The following graphs show the main results of this research. In the next three graphs, the abscissa represents the condition of the independent variable: the number of subsets processed simultaneously. A value of one means that the subsets were processed one by one, so that 60 classifications were performed to process the whole dataset. Table 2 shows for each condition the number of subsets processed simultaneously and the number of classifications needed to process the whole dataset.

Condition 1 2 3 4 5 6 7 8 9 10 11 12
# Subsets processed simultaneously 1 2 3 4 5 6 10 12 15 20 30 60
# Performed classifications 60 30 20 15 12 10 6 5 4 3 2 1
Table 2

The ordinates indicate the average value (payoff, aptitude or preference) for all applicants
that have been assigned in the different conditions. The values are given in Table 3. Note
that in some conditions not all 1529 available jobs were filled.

Subsets processed  Persons   Average  Average   Average
simultaneously     Assigned  Payoff   Aptitude  Preference
 1                 1525      674,92   630,61    89,75
 2                 1528      695,71   646,77    90,88
 3                 1526      700,67   651,00    91,18
 4                 1528      704,56   653,16    91,49
 5                 1529      705,15   652,80    91,81
 6                 1528      707,70   654,91    91,97
10                 1529      710,12   656,15    92,30
12                 1529      710,49   655,95    92,33
15                 1529      712,56   657,72    92,30
20                 1529      713,87   658,57    92,61
30                 1529      716,41   660,33    92,80
60                 1529      719,21   662,78    93,00
Table 3

[Figure 2: average payoff of enlisted persons (approx. 670-730) as a function of classification frequency (number of subsets processed simultaneously, 1 to 60).]

Figure 2


[Figure 3: Line graph of the average aptitude of enlisted persons (ordinate, 625 to 665) as a
function of the number of subsets processed simultaneously (abscissa, 1 to 60).]

Figure 3

[Figure 4: Line graph of the average preference of enlisted persons (ordinate, 89.5 to 93.5) as a
function of the number of subsets processed simultaneously (abscissa, 1 to 60).]

Figure 4


The next, rather busy, graph gives the average payoff for the different entries. What is
important to note for the moment is that the curvilinear relationship shown in Figure 2
does not seem to apply to all entries.

[Figure 5: Line graph of the average payoff of assigned applicants (ordinate, 300 to 1000) as a
function of the number of subsets processed simultaneously (abscissa, 1 to 60), with one line
per trade, J1 through J27.]

Figure 5

Discussion
To begin with, it is clear that the conditions processing larger numbers of subsets
simultaneously yield better average payoffs than those processing smaller numbers.
The magnitude of the difference may seem rather small at first sight: the difference
between the first and last condition is about 45 points on a scale with a mean of 500
and a standard deviation of 200, i.e., roughly 0.22 standard deviations. Yet, given that
all conditions used the same applicant pool, the same vacancies, the same eligibility
rules and the same classification tool, and considering that the difference pertains to
the average of more than 1500 persons, the effect is quite important. Put differently, if
one considers the measures that would be needed to yield a similar increase in average
recruit quality, such as increasing the selection ratio or improving applicant pool
quality through recruiting actions, one would most probably conclude that waiting
some time before classifying the applicants is a very cheap and effective option.
Secondly, it is interesting to note the curvilinear relationship between average payoff
and the number of subsets processed simultaneously. The relationship approximates a
logarithmic function: the steepest increase in payoff occurs at the left side of the
abscissa, after which the curve flattens out. The second and third graphs show very
similar relationships, meaning that both aptitude and preference also benefit from
classification in larger groups.
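As a quick check of this reading, one can fit a logarithmic curve to the average payoffs of Table 3. The sketch below (Python; the fitted form payoff = a + b * ln(k) is our assumption, not a model proposed by the paper) regresses the Table 3 payoffs on the logarithm of the number of subsets processed simultaneously.

    import numpy as np

    # Number of subsets processed simultaneously (k) and the corresponding
    # average payoffs, both taken from Table 3.
    k = np.array([1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60])
    payoff = np.array([674.92, 695.71, 700.67, 704.56, 705.15, 707.70,
                       710.12, 710.49, 712.56, 713.87, 716.41, 719.21])

    # Least-squares fit of payoff = a + b * ln(k); polyfit returns [b, a].
    b, a = np.polyfit(np.log(k), payoff, 1)
    print(f"payoff = {a:.1f} + {b:.1f} * ln(k)")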


What can we infer from these results? As indicated earlier, we used subsets to simulate
the time between successive classifications. As time goes by, the number of eligible
applicants increases as does the number of vacancies that need to be filled (assuming a
proportional distribution of vacancies over a large time period). Aggregating different
subsets simulates this. From the obtained results it is therefore possible to conclude that
increasing the time between classifications yields better recruit quality. This is due to the
fact that the number of degrees of freedom in assigning persons to jobs increases.
Provided that a classification tool is used that capitalizes on those degrees of freedom,
the outcome can be expected to improve as their number grows. Of course, the average
payoff obtained by classification is constrained by the payoff distribution in the
processed subsets. It is therefore normal that the effect of increased degrees of freedom
flattens out.
If we accept that recruit quality does improve when we wait before performing batch
classification, it is very interesting to note, at least in this particular case, that an
adequate batch classification system does not need to wait very long before yielding
significantly better results. This finding is of great practical consequence. Since one of
the major reasons not to perform batch classification is that organizations are reluctant
to let applicants wait before knowing the outcome of their application, it is good to
know that the wait need not be long!
It must be said that the observed curvilinear relationship does not hold for every
individual entry. Figure 5 shows some entries for which there is no improvement at all,
or for which conditions processing more subsets simultaneously yield poorer results
than conditions with fewer subsets. Analysis of the data shows that these entries have
either very low selection ratios (for instance Job 12: 100 persons with non-zero payoff
for 98 vacancies) or very low numbers of vacancies (for instance Job 16, with only 10
vacancies). Further research should take a closer look at why some trades do not
benefit from classification in larger groups.
A major difficulty in S&C research is controlling all parameters. A vast number of
elements condition a particular S&C situation, which practically prevents us from
designing useful theoretical models. To name just a few of these parameters (gathered
in the configuration sketch after this list):
• The number of entries;
• The number of eligible applicants per entry;
• The number of positions per entry;
• The interactions between eligibility for different entries;
• The way in which new vacancies are added to the S&C system;
• The way in which new candidates are added to the S&C system;
• The way in which enlistment dates are managed;
• The assignment decision process used (the classification method).
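One way to keep a simulation honest about these parameters is to gather them in an explicit configuration object. The field names below are illustrative only, one per parameter in the list above; they are not taken from the paper or from the Psychometric Model.

    from dataclasses import dataclass
    from typing import Callable, Dict

    # Hypothetical configuration for an S&C simulation; each field mirrors one
    # of the parameters listed above.
    @dataclass
    class SCSetting:
        n_entries: int                        # number of entries (trades)
        eligible_per_entry: Dict[str, int]    # eligible applicants per entry
        positions_per_entry: Dict[str, int]   # positions (vacancies) per entry
        eligibility_rules: Callable           # interactions between eligibility for entries
        vacancy_inflow: Callable              # how new vacancies enter the system
        candidate_inflow: Callable            # how new candidates enter the system
        enlistment_schedule: Callable         # how enlistment dates are managed
        classification_method: Callable       # the assignment decision process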
The present S&C research is no exception. In order to conduct it, a number of
methodological options had to be taken. These options may have influenced the results
and may compromise the conclusions to a certain extent.
One of the options we took was to add the unfilled vacancies of one classification to the
vacancies of the following one. This seemed to reflect what most S&C system managers
would do. Had this not been done, a number of vacancies would have remained
unfilled, which would mainly have had an adverse impact on the conditions where one
or a few subsets are processed simultaneously. As Table 3 shows, even with the
unfilled vacancies added to the next classification, not all conditions are able to fill all
1529

vacancies. In fact, the problem of unfilled vacancies only occurs in the first six
conditions, that is, when only a few subsets are processed simultaneously.
Another option we took was not to transfer the applicants who were not assigned to a job
in one classification to the next one. That decision might be somewhat more controversial.
Transferring all non-assigned applicants to the next classification increases the number
of degrees of freedom for that classification and should, in theory, result in a better
outcome. The effect of the added degrees of freedom may, however, be tempered by the
fact that the added applicants can be expected to be of lesser quality, since the better
applicants were assigned to a job in the previous classification. For our research this
would probably mean that the observed differences in recruit quality between the
conditions would have been smaller had we transferred the non-assigned applicants. The
main reason we decided not to transfer them is that doing so would also increase the
time between their assessment and the moment they learn the result of their
application. In other words, it would have confounded the research design by decreasing
the differences between the conditions!
An important option relates to the classification system used. As mentioned earlier, the
classification system we used, the Belgian ‘Psychometric Model’, capitalizes on the
number of degrees of freedom available in the S&C setting to produce a high-quality
classification. Less powerful classification methods will obviously yield less marked
results. The right question here is why S&C managers would prefer to stick to less
powerful methods.
In our setting, we used subsets of equal length. In the first condition each classification
was done for 56 persons at a time, in the second 112 persons were processed
simultaneously, and so on. It is, however, quite unlikely that in practice each selection
day or period would produce exactly the same number of persons eligible for at least one
trade. It is not clear what the influence of subsets of unequal length would be, but it
seems likely that conditions processing larger numbers are less subject to
non-representative variance.
The present research effort only looks at one side of the problem: the influence of waiting
time on recruit quality. There is of course another one: the influence of waiting time on
applicant behavior. Daily practice indicates that when applicants have to wait after their
assessment to learn the outcome of their application, some of them lose interest or
continue their search for a job elsewhere. This is bad news for the organization. It is
therefore important to study the interaction between both sides: the increased recruit
quality obtained by postponing batch classification and the loss of applicants when
doing so. As mentioned earlier, the complexity of most S&C systems makes it quite
impossible to give general advice. However, given the importance of recruit quality for
the Military, this should be studied! Another aspect that needs closer examination is the
fact that smart batch classification systems are better able to respect the applicants’
preferences when processing larger groups. This means that by waiting somewhat
longer, the applicants who are assigned to jobs are more likely to get a trade they
like28. This might have a positive influence on early turnover and other relevant
aspects of training.
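To make the suggested trade-off study concrete, here is a toy sketch. Purely for illustration, and with neither function taken from the paper, assume that average payoff grows logarithmically with waiting time (as Figure 2 suggests) while the fraction of applicants still available decays exponentially with waiting time; the waiting time at which the net effect peaks can then be located numerically.

    import numpy as np

    t = np.arange(1, 61)                 # waiting time, expressed in subsets
    payoff = 675 + 10.0 * np.log(t)      # illustrative logarithmic quality gain
    retention = np.exp(-0.01 * t)        # illustrative exponential applicant attrition

    # Net expected quality contribution per initial applicant.
    net = payoff * retention
    print("illustrative optimal waiting time:", t[np.argmax(net)], "subsets")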
Further research should also have a closer look at the left end of the graphs we presented.
In the first condition of our research design, we classified subsets one at a time. Each
subset represented 56 persons. Given the quasi-logarithmic shape of the curve it would be

28 On the obvious condition that the classification system includes applicant preferences.


interesting to look at even smaller subsets. In the most extreme condition a subset
would contain only one person; the batch classification method would then in fact
amount to immediate classification. Such a research design would most probably yield
further evidence as to why immediate classification should be banned from S&C practice.

Conclusion
We found improved recruit quality in conditions where more subsets were processed
simultaneously. This indicates that performing batch classification on larger groups is
beneficial to the quality of recruits. In practice this would mean that given a set of
applicants, vacancies and eligibility rules, waiting some time before assigning applicants
to jobs is beneficial to the Military organization.
The observed curvilinear relationship indicates that a significant improvement in recruit
quality can be obtained without having to wait long before classification.
Further research is needed to:
– Confirm the results in different settings;
– Understand the mechanisms causing the differential impact on different entries;
– Model the impact of waiting time on applicant behavior and relate this to the obtained
improvement in quality.

References
• Alley, W. E. (1994). Recent advances in classification theory and practice. In M. G. Rumsey,
C. B. Walker & J. Harris (Eds.), Personnel selection and classification. Hillsdale, NJ:
Lawrence Erlbaum Associates.
• Burke, E., Kokorian, A., Lescrève, F., Martin, C., Van Raay, P., & Weber, W. (1995).
Computer based assessment: A NATO survey. International Journal of Selection and
Assessment.
• Darby, M., Grobman, J., Skinner, J., & Looper, L. (1996). The Generic Assignment Test and
Evaluation Simulator. Brooks AFB: Human Resources Directorate, Manpower and Personnel
Research Division.
• Green, B., & Mavor, A. (Eds.) (1994). Modeling Cost and Performance for Military
Enlistment. Washington, DC: National Academy Press.
• Hardinge, N. M. (1997). Selection of military staff. In N. Anderson & P. Herriot (Eds.),
International Handbook of Selection and Assessment (pp. 177-178). Wiley.
• Keenan, T. (1997). Selection for potential: The case of graduate recruitment. In N. Anderson
& P. Herriot (Eds.), International Handbook of Selection and Assessment (p. 510). Wiley.
• Kroeker, L., & Rafacz, B. (1983). Classification and assignment within PRIDE (CLASP): A
recruit assignment model. San Diego, CA: US Navy Personnel Research and Development
Center.
• Lawton, D. (1994). A review of the British Army potential officer selection system. In
Proceedings of the 36th Annual Conference of the International Military Testing Association.
• Lescrève, F. (1993). A psychometric model for selection and assignment of Belgian NCO’s.
In Proceedings of the 35th Annual Conference of the Military Testing Association
(pp. 527-533). US Coast Guard.
• Lescrève, F. (1995). The selection of Belgian NCO’s: The Psychometric Model goes
operational. In Proceedings of the 37th Annual Conference of the International Military
Testing Association (pp. 497-502). Canadian Forces Personnel Applied Research Unit.
• Lescrève, F. (1995). The use of neural networks as an alternative to multiple regressions and
subject matter experts in the prediction of training outcomes. Paper presented at the
International Applied Military Psychology Symposium, Lisboa.
• Lescrève, F. (1996). The Psychometric Model for the selection of NCO’s: A statistical review.
International Study Program in Statistics, Catholic University of Leuven.
• Lescrève, F. (1997). The determination of a cut-off score for the intellectual potential. Center
for Recruitment and Selection: Technical Report 1997-3.
• Lescrève, F. (1997). Data modeling and processing for batch classification systems. In
Proceedings of the 39th Annual Conference of the International Military Testing Association,
Sydney.
• Lescrève, F. (1998). Immediate assessment of batch classification quality. In Proceedings of
the 37th Annual Conference of the International Military Testing Association. Internet:
www.internationalmta.org
• Lescrève, F. (2001). Equating distributions of aptitude estimates for classification purposes.
In Proceedings of the 40th Annual Conference of the International Military Testing
Association.
• Lescrève, F. (2002). Why smart classification does matter. In Proceedings of the 41st Annual
Conference of the International Military Testing Association.29
• Lescrève, F. (2002, October). Improving Military Recruit Quality Through Smart
Classification Technology. Report of an international collaborative research sponsored by the
US Office of Naval Research.
• Robertson, I., Callinan, M., & Bartram, D. (2002). Organizational Effectiveness: The Role of
Psychology. New York: John Wiley & Sons.
• Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In S. S. Stevens (Ed.),
Handbook of Experimental Psychology (pp. 1-49). New York: Wiley.

29 The paper was accidentally omitted in the proceedings. An electronic copy can be obtained at
Lescreve@skynet.be


Enclosure 1

Metric Variables Included in the Dataset

Variable Name   Description
ATC             Air Traffic Controller suitability score
ELEC            Test score for Electricity
ENG-G           General English comprehension30
ENG-T           Technical English comprehension
KAHO            Personality score
MECH            Composite score for Mechanics
PHYS            Physical fitness score
PINP            General intelligence score

Categorical Variables Included in the Dataset

Variable Name   Description
FAC_A           Medical profile: audio
FAC_C           Medical profile: color perception
FAC_E           Medical profile: emotional stability
FAC_G           Medical profile: navy
FAC_I           Medical profile: lower limbs
FAC_K           Medical profile: navy
FAC_M           Medical profile: mental capacity
FAC_O           Medical profile: navy
FAC_P           Medical profile: general
FAC_S           Medical profile: upper limbs
FAC_V           Medical profile: visual acuity
FAC_Y           Medical profile: navy
IN_COMB         Interest in combat activities
IN_GROUP        Interest in group activities
IN_OUTD         Interest in outdoor activities
IN_SPORT        Interest in sport activities
IN_TECH         Interest in technical activities

30 English is not a national language in Belgium but is considered to be of great importance for certain trades.


Enclosure 2

Vacancies per trade (Columns) for each Subset (Rows)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 SUM
S1 0 1 1 0 4 2 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S2 0 0 1 0 4 1 1 1 1 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 23
S3 0 1 1 0 4 2 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S4 0 0 1 0 4 1 1 1 1 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 23
S5 1 1 1 0 4 1 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S6 0 0 1 0 4 1 1 1 1 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 23
S7 1 1 1 0 4 1 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S8 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 0 0 0 0 1 0 1 1 1 1 1 0 23
S9 1 1 1 0 4 1 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S10 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 0 0 0 0 1 0 1 1 1 1 1 0 23
S11 1 1 1 0 4 1 2 1 2 2 2 2 1 1 1 0 0 0 1 1 0 0 1 0 0 0 1 26
S12 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 0 0 1 0 1 0 1 1 1 1 1 0 24
S13 1 1 1 0 4 1 2 1 2 2 2 2 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 25
S14 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 0 0 1 0 1 0 1 1 1 1 1 0 24
S15 1 1 1 0 4 1 2 1 2 2 2 2 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 25
S16 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 0 0 1 0 1 0 1 1 1 1 1 0 24
S17 1 1 1 0 4 2 2 1 2 2 2 2 2 1 1 0 0 0 1 1 0 0 1 0 0 0 0 27
S18 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S19 1 1 0 0 4 2 2 1 2 2 2 2 2 1 1 0 0 0 1 0 0 0 1 0 0 0 0 25
S20 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S21 1 1 0 0 4 2 2 1 2 2 2 2 2 1 1 0 0 0 1 0 0 0 1 0 0 0 0 25
S22 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S23 1 1 0 0 4 2 2 1 2 2 2 2 2 1 1 0 0 0 1 0 1 0 1 0 0 0 0 26
S24 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S25 1 1 0 1 4 2 2 1 2 2 2 2 2 1 1 0 0 0 0 0 1 0 2 0 0 0 0 27
S26 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S27 1 1 0 1 5 2 2 1 2 2 2 2 2 1 1 0 0 0 0 0 1 0 2 0 0 1 0 29
S28 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S29 1 0 0 1 5 2 2 1 2 2 2 2 2 1 1 0 0 0 0 0 1 0 2 0 0 1 0 28
S30 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25
S31 1 0 0 1 5 2 2 1 2 2 2 2 1 1 1 0 0 0 0 0 1 0 2 0 0 1 0 27
S32 0 0 1 0 4 1 1 0 1 2 3 1 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 25


S33 0 0 0 1 5 2 2 1 2 2 2 2 1 1 1 0 0 0 0 0 1 0 2 1 0 1 0 27
S34 0 0 1 0 4 1 1 0 1 2 3 2 2 1 0 1 0 1 0 1 0 1 1 1 1 1 0 26
S35 0 1 0 1 5 2 2 1 2 2 2 2 1 1 1 0 1 0 0 0 1 0 2 1 0 1 0 29
S36 0 0 1 0 4 1 1 1 2 2 3 2 2 1 0 1 0 0 0 1 0 1 1 1 1 1 0 27
S37 0 1 0 1 5 2 2 1 2 2 2 2 1 1 1 0 1 0 0 0 1 0 2 1 0 1 0 29
S38 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 1 1 1 1 1 0 27
S39 0 1 0 1 5 2 2 1 2 2 2 2 1 1 1 0 1 0 0 0 1 0 2 1 0 1 0 29
S40 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 26
S41 0 1 0 1 5 2 2 1 2 2 2 2 1 1 1 0 1 0 0 0 1 0 2 0 0 1 0 28
S42 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 26
S43 0 1 0 0 5 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 1 0 2 0 0 1 0 27
S44 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 26
S45 0 1 0 0 5 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 1 0 2 0 0 0 0 26
S46 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 26
S47 0 1 0 0 5 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 1 0 2 0 0 0 0 26
S48 0 0 1 0 4 1 2 1 2 2 3 2 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 26
S49 0 1 1 0 5 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 1 0 2 0 0 0 0 27
S50 0 0 1 0 4 1 2 1 2 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 25
S51 0 1 1 0 4 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 1 0 2 0 0 0 0 26
S52 0 0 1 0 4 1 2 1 2 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 1 1 0 25
S53 0 1 1 0 4 2 2 1 2 2 3 2 1 1 0 0 1 0 0 0 0 0 2 0 0 0 0 25
S54 0 0 1 0 4 1 2 1 2 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 0 1 0 24
S55 0 1 1 0 4 2 2 1 2 2 3 2 1 1 0 0 1 0 1 0 0 0 2 0 0 0 1 27
S56 0 0 1 0 4 1 1 1 2 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 0 1 0 23
S57 0 1 1 0 4 2 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S58 0 0 1 0 4 1 1 1 1 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 0 1 0 22
S59 0 1 1 0 4 2 2 1 2 2 2 2 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 25
S60 0 0 1 0 4 1 1 1 1 2 3 1 2 1 0 0 0 0 0 1 0 0 1 1 0 1 0 22
Sum of vacancies 14 27 45 9 252 84 99 46 101 120 157 98 97 60 16 10 18 12 15 34 15 16 76 34 26 39 9 1529


From Attraction to Rejection: A Qualitative Research on Applicant Withdrawal

Bert Schreurs31

ABSTRACT

This study examined why prospective and real applicants for the Belgian military decide to
withdraw from the hiring process. We inventoried reasons for applicant withdrawal by gathering
qualitative data through focus groups and in-depth interviews (face-to-face and telephonic) with
prospective applicants, applicants withdrawing from the hiring process, applicants completing the
selection procedure, trainees, and employees involved in the hiring process. Because it is
generally accepted that applicant reactions to recruitment and selection procedures may influence
whether applicants pursue job offers, this factor was examined in more detail. Results indicated
that one of the main reasons for applicant withdrawal was that the military had become less
attractive relative to other options; that withdrawals were frequently influenced by the opinions of
significant others (parents, partner, and peers) on the military; and that based on the information
they had received in the recruiting station, withdrawals had serious doubts about whether they
would match with the organization. Inconsistent with previous research, we found that
withdrawals were generally unaffected by early recruitment practices. Suggestions for
strengthening organizational recruitment programs and for directing further research are
discussed.

31 Belgian Ministry of Defence, Human Resources Directorate General, Accession Policy – Research &
Technology, Bruynstraat 1, B-1120 Brussels (Neder-Over-Heembeek), bert.schreurs@mil.be or
schreurs.bert@skynet.be

From Attraction to Rejection: A Qualitative Research on Applicant Withdrawal

It is recognized that the recruitment process consists of multiple stages or phases. The first stage
involves the identification and generation of applicants (from the organization’s perspective) or
job opportunities (from the individual’s perspective). During the next stage, applicants become
selectees if they pass the selection tests. At the last step, jobs are offered to the persons with the
highest ranking. At every moment applicants can decide to self-select out of the recruitment
process. As the term selection has been reserved for the processes used by organizations in
hiring, self-selection has been the term used to refer to the individual’s selection decision (Ryan,
Sacco, McFarland, & Kriska, 2000). Until now, research has overlooked applicant withdrawal
that takes place in an early stage of the hiring process (Barber, 1998). In this research, we
focused on applicants who decide to withdraw from the hiring process after the first hurdle of the
selection process. These applicants passed the initial screening test at the career office, but did
not show up at the selection center for their physical, medical and psychological screening.
The study of self-selection is important for a number of reasons. Firstly, applicants’ decisions to
withdraw from the selection process may affect the size and quality of the applicant pool (Barber
& Roehling, 1993). If the organization’s top choices withdraw, this will lead to a reduced utility
of the hiring system (Murphy, 1986). Conversely, self-selection can have positive results for the
organization in terms of reduced turnover, and higher employee satisfaction, commitment, and
performance (see Wanous, 1992). Secondly, while there are plenty of studies examining why job
offers are accepted, until today it is still unclear why job offers are rejected (Turban, Eyring, &
Campion, 1993). Thirdly, if the number of qualified women and minorities advancing through
the selection process decreases, this may affect adverse impact statistics and the ability to meet
diversity goals (Schmit & Ryan, 1997). Thus, understanding the causes of applicant withdrawal
is important.
Research on Applicant Withdrawal
Although applicant withdrawal is often discussed in the literature on recruitment and job choice
(e.g., Rynes, 1991, 1993), the empirical research has largely been directed at rejection of job
offers and not at withdrawal behavior earlier in the process (Barber, 1998; Schmit & Ryan, 1997).
As a result, relatively little is known about applicants’ decisions to withdraw from an
organization’s selection process prior to the point of a job offer (Schmit & Ryan, 1997). In one
of the earlier studies in this area, it was found that the time delay between application and the
next step in the selection process was related to applicant withdrawal from civil service position
selection processes (Arvey, Gordon, Massengill, & Mussio, 1975). The strongly negative effects
of recruitment delays were also observed by Rynes, Bretz, and Gerhart (1991), particularly
among male students with higher grade point averages and greater job search success. More
recently, Schmit and Ryan (1997) examined the role of test-taking attitudes and racial differences
in the decisions of applicants to withdraw from the selection process. They found small effects of
comparative anxiety, motivation, and literacy scales on withdrawal behavior, and small race
differences on test attitude scales. Applicant withdrawal or self-selection has also been examined
as a theoretical rationale for the effects of realistic job previews (RJPs): “Applicants who are not
likely to be satisfied with the job will not accept job offers, and those who do accept will
therefore be more likely to remain” (Barber, 1998, p. 85). Several studies found support for this
theory in that exposition to RJPs was associated with higher job rejection rates (Meglino, DeNisi,
Youngblood, & Williams, 1988; Premack & Wanous, 1985; Suszko & Breaugh, 1986; Wiesner,


Saks, & Summers, 1991). Bretz and Judge (1998) examined whether self-selection based on job
expectation information may be adverse from the organization’s perspective. That is, whether the
best qualified applicants are most likely to self-select out when presented with negative
information about the organization. The results of this study yielded mixed support for the
adverse self-selection hypothesis.
Correlates of Applicant Withdrawal
Process perceptions. One possible correlate of applicant withdrawal that has elicited a substantial
body of research relates to the way applicants perceive and react to hiring processes. The bulk of this
research deals with applicant reactions to initial screening interviews (e.g., Goltz &
Giannantonio, 1995; Harris & Fink, 1987; Liden & Parsons, 1986; Maurer, Howe, & Lee, 1992;
Powell, 1984, 1991; Rynes, 1991; Taylor & Bergmann, 1987; Turban, 2001; Turban &
Dougherty, 1992). Only a few studies have examined applicant reactions to later recruitment
events, such as site visits (Rynes et al., 1991; Taylor & Bergmann, 1987; Turban, Campion, &
Eyring, 1995), or to elements of administrative procedures, such as time lags and delays (Arvey
et al., 1975; Rynes et al., 1991; Taylor & Bergmann, 1987). The primary mechanism through
which hiring practices are expected to influence applicants’ reactions is signaling. Based on
propositions from signaling theory (Spence, 1973, 1974), it is suggested that because applicants
at early stages of the recruitment process have incomplete information about organizations they
make inferences about the attractiveness of the job or their probability of receiving a job offer
based on their recruitment experiences (Breaugh, 1992; Rynes, 1991), and that these inferences
are directly related to applicants’ decisions to pursue employment opportunities (Barber, 1998).
Unfortunately, applicant perceptions of recruitment activities are only rarely connected to
behavioral responses, such as rejecting a job offer or dropping out of the hiring process (for
exceptions see Ryan et al., 2000; Schreurs et al., 2003). Recently, an increasing number of
studies in this area have concentrated on applicant reactions to numerous selection devices such as
drug testing, honesty testing, computerized secretarial tests, bio-data, cognitive ability tests,
work-sample tests, and Assessment Centers (e.g., Crant & Bateman, 1990; Iles & Robertson, 1997;
Macan, Avedon, Paese, & Smith, 1994; Schmitt, Gilliland, Landis, & Devine, 1993; Smither,
Reilly, Millsap, Pearlman, & Stoffey, 1993; Steiner & Gilliland, 1996). Several models have
been proposed to account for applicants’ reactions to selection procedures (for an overview see
Anderson, Born, & Cunningham-Snell, 2001). For instance, Schuler, Farr, and Smith (1993)
postulate that five components influence the perceived acceptability of selection: (1) the presence
of job and organizational relevant information, (2) participation by the applicant in the
development and execution of the selection process, (3) transparency of the assessment so that
applicants understand the objectives of the evaluation process and its relevance to organizational
requirements, (4) the provision of feedback with appropriate content and form, and (5) a dynamic
personal relationship between the applicant and assessor. Derous and De Witte (2001) put
forward six components: (1) provision of general information on the job opening, (2) active
participation of applicants in the selection programme, (3) creation of transparency of testing, (4)
provision of feedback, (5) guarantee of objectivity in selection through both a professional
approach and equal treatment of candidates, and (6) assurance of human treatment and respect for
privacy. Anderson and Ostroff (1997) proposed a model of ‘Socialization Impact’: an
empirically testable, five-domain framework covering information provision, preference impact,
expectation impact, attitudinal impact, and behavioral impact. This model closely fits the
above-mentioned signaling theory.


Social influence. People are unlikely to make organizational-relevant choices in a social vacuum.
Yet, social influences were ignored in research on organizational choice for a long time. Kilduff
(1990) noticed that “decision-making research has been generally silent concerning social
influences on choices” (p. 270-271) and that “a good example of scholarly neglect of social
influences on behavior occurs in the area of organizational choice” (p. 271). In his research on
the interpersonal structure of decision making, Kilduff found that the social network influenced
individuals’ choices of organizations to interview with, such that pairs of students who were either
friends or who perceived each other as similar tended to make similar organizational choices,
even if they had different academic concentrations and different job preferences. Similarly, there
is a lot of evidence indicating that prospective applicants are more likely to acquire information
about job vacancies through informal networks of friends, family and acquaintances than through
official sources such as advertisements or employment offices (Granovetter, 1974; Reynolds,
1951; Rynes et al., 1991; Schwab, Rynes, & Aldag, 1987). Liden and Parsons (1986) were
among the first to suggest that parents and friends may have an important influence on job
acceptance among job applicants. They found that these reference groups had an even larger
impact on job acceptance intentions than general job affect. More recently, Turban (2001) found
that the social context was related to potential applicants’ organizational attractiveness. More
specifically, he found that perceptions of university personnel of a firm’s presence on campus and
image as an employer were positively related to college students’ attraction to that firm. Legree
et al. (2000) surveyed 2,731 young men and their parents about their attitudes and intentions
toward the military to understand factors associated with military enlistment. The results from
this study indicated that youth perceptions of parental attitudes toward the military significantly
correlated with stated enlistment propensity, which predicted actual enlistment. Surprisingly,
youth perceptions of parental attitudes were often inaccurate. With regard to applicant
withdrawal, Ryan and McFarland (1997), and Ryan et al. (2000) found that family and friend
support for pursuing a particular job had a significant relation to self-selection decisions.
Applicants who self-selected out felt their families were less supportive of their careers. In their
study, Schmit and Ryan (1997) observed that more than 10% of applicants withdrew because
they felt that the selection process or the job interfered with family obligations or with how the
job was viewed by family members. There exists a clear parallel between these findings and
recent developments in research on career choice. The social cognitive career theory (Lent,
Brown, & Hackett, 1994), for instance, emphasizes that besides person and behavioral variables,
contextual variables such as social supports and social barriers play a key role in the career choice
process.
Employment alternatives. Applicants may withdraw from the hiring process because the job
opportunity has become less attractive to them relative to other options (Barber, 1998). Schmit
and Ryan (1997) found that a large portion of those withdrawing from the hiring process did so
because of perceived employment alternatives. In some cases, withdrawals believed they could
get a better job or already had taken another offer. In other cases, one’s current job was seen as
the better alternative. Ryan et al. (2000) also found that other alternatives were a major reason
for withdrawing. These results are consistent with findings on the role of perceived employment
alternatives in turnover behavior. Turnover experts have long argued that job opportunities may
induce even satisfied employees into surrendering their current job (e.g., March & Simon, 1958;
Mobley, Griffeth, Hand, & Meglino, 1979; Steel, 1996; Steel & Griffeth, 1989).
Need to relocate. According to Noe and Barber (1993), relocation may have negative influences
on one’s non-work life, requiring adjustments in housing, education, friendships, and activities by
the relocated individuals and their families. Research on geographic boundaries in recruitment


suggests that applicants ‘rule out’ jobs located outside their preferred geographic area, and that the
importance of location is not limited to low-level employees (e.g., Barber & Roehling, 1993;
Osborn, 1993; Rynes & Lawler, 1983). In a related vein, research on job pursuit has indicated
that the need to relocate plays an important role in decisions about accepting job offers (Brett,
1982; Brett & Reilly, 1988; Gould & Penley, 1985; Noe & Barber, 1993; Noe, Steffy, & Barber,
1988). Ryan et al. (2000) found that applicants withdrawing from the hiring process expressed a
greater need to relocate than those who continued in the process.
Objective factors: The role of job attributes. There is ample evidence suggesting that objective
job attributes, such as pay, working conditions, and the nature of the work, influence applicant job
pursuit and job acceptance (for a review see Turban et al., 1993). Evidence from field studies on RJPs
suggests that applicants are more likely to reject jobs when presented with negative information
on the job (e.g., Premack & Wanous, 1985; Suszko & Breaugh, 1986; Wiesner et al., 1991).
Ryan et al. (2000) examined the relationship between withdrawing from a selection process and
job attribute perceptions. Contrary to expectations, perceptions of job attributes were unrelated to
withdrawal, and job attributes were generally rated positively. From these findings, the authors
concluded that screening of jobs on attributes occurs prior to application. This conclusion has
important implications for the military, which has a tradition of informing prospective applicants
about organizational characteristics and career possibilities within the military prior to application.
Subjective factors: ‘Fit’. Schmit and Ryan (1997) found that a number of applicants withdrew
because of perceptions of a lack of job and organization fit. Some were of the opinion that the
job was not right for them; others argued – rightfully or wrongfully – that they did not have the
required qualifications for the job. Previous research (see Kristof, 1996 for a review) has
repeatedly demonstrated that applicants are more attracted to organizations that best fit their
personal characteristics. That is, applicants are more attracted to organizations that best fit their
individual values (e.g., Cable & Judge, 1996; Chatman, 1989, 1991; Judge & Bretz, 1992; Judge
& Cable, 1997; O’Reilly, Chatman, & Caldwell, 1991; Posner, 1992), goals (e.g., Pervin, 1989;
Vancouver, Millsap, & Peters, 1994; Vancouver & Schmitt, 1991; Witt & Nye, 1992), needs
(Bretz, Ash, & Dreher, 1989; Bretz & Judge, 1994; Cable & Judge, 1994; Turban & Keon, 1993),
and personality (Burke & Deszca, 1982; Slaughter et al., 2001; Tom, 1971). Although
organizational attraction and self-selection are not synonymous (Wanous & Colella, 1989), it is
reasonable to assume that fit perceptions will also influence self-selection.
Commitment to obtaining the job. Results from previous research suggest that career
commitment, or motivation to work in a particular profession, has a strong negative relation with
intentions to withdraw from a career (Blau, 1985; Carson & Bedeian, 1994). Utilizing social
identity theory, Mael and Ashforth (1995) demonstrated that identification with the military may
occur even prior to enlistment, and that this sense of professional identity relates negatively to
turnover soon after hire. Therefore, it can be expected that individuals who are more committed
to obtaining the job are more likely to remain in the process (Ryan et al., 2000).
Impression of the organization. According to Barber (1998) “real-world applicants do not start
out as “blank slates” from a recruitment standpoint; rather, they often have some impression of
employing organizations even before they are exposed to recruitment materials. These general
impressions have been referred to as organizational images and are expected to be related to the
organization’s ability to attract applicants (e.g., Fombrun & Shanley, 1990; Stigler, 1962)” (p.
32). Gatewood, Gowan, and Lautenschlager (1993) found that an applicant’s decision to pursue
contact with the organization was influenced by its corporate and recruitment image. Turban and
Greening (1997) demonstrated that corporate social performance (i.e., the organization’s
tendency to act responsible in dealing with employees, customers, and the community) is related


to organizational attraction. Perceptions of the organization therefore not only refer to applicants’
perceptions but also take into account how applicants think the community perceives the
organization (Ryan et al., 2000).
Present Study
This study examined why applicants for the Belgian military decide to withdraw from the hiring
process. We inventoried reasons for applicant withdrawal by gathering qualitative data through
focus groups and in-depth interviews (face-to-face and telephonic) with prospective applicants,
applicants withdrawing from the hiring process, applicants completing the selection procedure,
trainees, and employees involved in the hiring process. Because it is generally accepted that
applicant reactions to recruitment and selection procedures may influence whether applicants
pursue job offers, this factor was examined in more detail. The following research questions were
the focus of this research:
Research Question 1. How do prospective and real applicants evaluate and react to the
recruitment and selection process of the Belgian military?
Research Question 2. How can the organizational entry process be changed in order to
raise the number of applicants and to cut back the voluntary
withdrawal rate before, during, and after the selection process?

Method
Procedure and Sample
To join the Belgian military, prospects are required to visit a military career office (one in every
province) for a preliminary information session on military life and career possibilities prior to
application. If prospects still want to enter the organization after this initial preview, they are
invited to fill out the application form and to take a cognitive screening test at the same career
office. Within a week after their application, applicants who succeeded are invited to take the
remaining tests (medical, physical and psychological) at the central selection center in Brussels.
According to current selection rules, selectees should be incorporated within one month after
application. In practice, however, this objective is not always reached.
First, we conducted face-to-face interviews with prospects (N = 35) after their information
session with a career counselor at the career office. Second, we conducted telephonic interviews
with applicants who never showed up at the selection center despite their appointment (N = 200).
Approximately 10% of all applicants voluntarily withdraw from the selection process after
application. Attempts were made to contact each individual who self-selected out. Third, we
contacted a small sample of applicants who completed the selection process, both applicants who
succeeded (N = 25) and applicants who failed (N = 25). In addition, we organized three focus
groups with newcomers who had just begun their initial military training, and one with
employees involved in the hiring process.
Measures
Prospects. The primary purpose of the face-to-face interview at the career office was to find out
how potential applicants had experienced these early recruitment activities. The interview
included eight questions. The first question, a warm-up question, asked about how the visitor
learned of the military career offices. The next two questions were open-ended questions asking
prospects respectively for positive and negative elements of the visit. The last five questions
asked for specific recruitment experiences: (a) whether the prospect was satisfied with the content
of the information he/she had received, (b) whether the prospect was satisfied with the amount of


information he/she had received, (c) whether the career counselor had an interest in the prospect’s
questions and problems, (d) whether the prospect had been able to actively participate in the
information session, and (e) whether the career counselor had tried to sell the prospect a job. All
five questions were followed by open-ended probes to assess reasons for the response. Interviews
were conducted by the author and two undergraduate assistants; they typically lasted 10-15
minutes.
Withdrawals. Applicants who withdrew from the selection process were contacted by telephone.
The primary purpose of the telephonic interview was to determine why applicants withdrew from
the selection process before taking the tests at the selection center. Four attempts were made to
contact each withdrawal at a different time of day. The first question was an open-ended question
asking for the main reason why the respondent chose not to continue in the selection process.
The second question was also an open-ended question asking for additional reasons. The
remaining questions were designed to assess certain specific reasons for withdrawal: (a) treatment
by career office personnel, (b) perceptions of test fairness, (c) handling of application, (d) test
anxiety, (e) social influence, (f) fit perceptions, and (g) employment alternatives. Positive
responses were followed by open-ended probes to assess reasons for the response. Interviews
were conducted by the author and four undergraduate assistants; they typically lasted 5-10
minutes.
Successful and unsuccessful applicants. Both successful and unsuccessful applicants who
completed the selection process were contacted by telephone. The primary purpose of the
telephonic interview was to find out how applicants had experienced the selection procedure.
The interview included eight questions. The first two questions were open-ended questions
asking applicants respectively for positive and negative elements of the selection encounter. The
third question asked whether the applicant was satisfied with the selection procedure in general.
This question was always followed by an open-ended probe to assess the reason for the response.
The next four questions asked for specific selection experiences: (a) respect for privacy, (b)
practical organization of selection procedure, (c) transparency of testing, and (d) whether they
had been in the position to demonstrate their potential. The final question asked respondents what
they would change to the selection procedure if they had the chance to do so. Interviews were
conducted by the author and two undergraduate assistants; they typically lasted 5-10 minutes.
Trainees. Three focus groups were held with newcomers who had just begun their training. They
had taken their selection tests at the selection center three weeks before the focus groups were
organized. Because of this short interval, the trainees could still readily recall what they had
experienced at the selection center. In addition, the short training period they had already gone
through permitted them to compare the information they had received with
reality. The semi-structured manual that was used to guide the focus group contained questions
referring to advertisement and organizational image, amount and content of the provided
information, information realism, their visit to the career office and the selection center, selection
methods (medical, physical, psychological), recruitment, selection and retention policy, reasons
for withdrawal during the hiring process, and reasons for withdrawal during initial training. It
took about four hours to discuss all these topics.
Employees. One focus group was held with employees familiar with the accession policy of the
military (advertisement, recruitment, and selection specialists). The semi-structured manual that
was used to guide the focus group contained mainly questions referring to current and possible
future recruitment, selection and retention policies. The focus group lasted about four hours.


Results
At the career office
A first finding is that prospects on the whole looked back positively on their career office visit,
referring to it in generally positive terms. Spontaneously, the majority could not think
of any negative experience. When asked about specific experiences, most were
enthusiastic about the career counselors’ attitude. The career counselor was described as a warm,
empathic person who showed interest in the prospect’s case.
experiences were recurrently mentioned by prospects, applicants and trainees. Firstly, there was
a general agreement that the amount and content of feedback on the cognitive screening test at the
career office were unsatisfactory. Most respondents could not appreciate the fact that they did
not get any further explanation on their test score and the message that they had passed or failed
the test. Secondly, there was some disagreement about the appropriate amount of information
that career counselors should provide. Some respondents argued that they needed more
information in order to choose a specific military career. Others complained that they were not
able to process all the information they had received at the career office. Thirdly, looking back
on their visits, trainees criticized the unrealistic preview presented at the career office.
The expectations that trainees had formed based upon what they were told and how they were
treated at the career office did not correspond with real military life. Fourthly, the duration of the
enrollment process was mentioned. Respondents did not understand why the enrollment process
had to be so cumbersome and extensive, although most of them were incorporated within one
month after application. Next, several comments were made on the test conditions at the career
office. Especially, the noisiness of the test environment was criticized. In some career offices the
test computers are located in the same room as where the information sessions take place and
where telephone calls are answered. Other frequently mentioned remarks referred to the lack of
supervision during test administration and the occurrence of computer breakdowns. Finally,
some respondents complained about the accessibility of the career offices.
At the selection center
It was striking to find that respondents (applicants, withdrawals, and trainees) spontaneously put
forward several criticisms about their visit to the selection center, and that the employees
involved in the hiring process openly acknowledged some of these criticisms. To begin with,
respondents recurrently labeled the personnel of the selection center as unprofessional and
unmotivated. In a related vein, most respondents found that the personnel had treated them in an
impersonal manner, “as if they were just a number” or “a piece on an assembly line”. Also
criticized was the interviewer’s presumptuous attitude towards the applicants. The trainees
mentioned that this perception of arrogance was strengthened by the impression the uniform
made on them at that time. Yet, few concluded that this lack of professionalism and human
treatment was symptomatic of the whole organization. The majority made a clear distinction
between hiring practices and the “real military”. According to most respondents, recruitment,
selection, and training were not really part of the military, and were “just things that one has to
plough through”. Most trainees were still enthusiastic about their new employer: On the one
hand, they appreciated that their visit to the selection center provided a realistic preview of life at
the training center, but at the same time they believed that “everything will be different once
training is completed”. Again, several remarks were made on the amount and content of
feedback on the selection outcome. Those who passed wanted to know why they were assigned
to a particular occupation; those who failed wanted to have more detailed information on what
went wrong. Several comments were made on the duration of the personality inventory (CPI). It


is not uncommon that applicants need more than one hour to complete this questionnaire. Due to
time-consuming tests and a tight schedule, applicants sometimes do not have the opportunity to
have lunch. Therefore, it is not surprising that there were several complaints about the
experienced time pressure in selection. Finally, some respondents were disappointed in the
physical fitness test. Especially trainees were of the opinion that the ergometrical bicycling test
was too easy as physical selection for the military. This judgment was agreed upon by the group
of employees. The sole positive experience respondents could spontaneously recall was that the
visit to the selection center gave them the opportunity to get acquainted with other applicants.
Reasons for applicant withdrawal
As can be concluded from Table 1, the most important reason for self-selecting out was the
perception of available employment alternatives (21%). Most withdrawals had applied at several
organizations and often gave preference to another employer. More specifically, a large
proportion of withdrawals favored the police force. About 13% of all respondents said that the
main reason for withdrawal was that they had lost interest after their visit to the career office.
Based upon the information they had received, they had serious doubts about whether they would fit the
organization, and vice versa. More than 10% withdrew because they were influenced by the
opinions of significant others (parents, partner, peers). Most parents did not want their child to go
to war. The partner was usually not in favor of the lengthy missions abroad. And often the peer
group, due to its negative perception of the military, pressured the potential candidate to choose
another career path. Contrary to what we had expected, perceptions of the hiring process were
rarely mentioned as reasons for applicant withdrawal. Yet, more than 9% of all withdrawals
mentioned administrative problems as the main reason for self-selection. These problems had to
do with the loss of documents which are required to participate in the selection (e.g., diploma,
birth certificate), and with misunderstandings about the selection date (e.g., the selection center
forgot to send a confirmation on the date). Most of these withdrawals were willing to reapply as
soon as the problem had been solved. Surprisingly, more than 9% cited physical/medical
problems as their main reason for self-selection. These problems varied from broken toes to the
perception of being too heavy to pass the physical and medical tests. Several withdrawals were
motivated to continue in the selection procedure, but were prevented by transportation problems
(8%) because their car broke down on the way to the selection center, or because they did not
have enough money to pay for the train ticket to Brussels. Other reasons referred to persons who had
to work on the day of selection (6%), who were too sick to attend (5%), who had
simply forgotten about their appointment (5%), or who had family/personal problems at that time (5%).
Table 1 provides an overview of what respondents referred to as their primary reason for
withdrawal.

General discussion
This study tried to examine applicant withdrawal that takes place early in the hiring process. In
many nations the military uses career offices to inform prospective applicants about military job
opportunities and military life. Despite the fact that prospects often put in a great effort to make
the trip to a career office, approximately 10% of all visitors who apply at that time never show up
at the selection center. In this study we used a combination of qualitative research methods to
identify the main reasons for applicant withdrawal.
Although respondents had many criticisms of the hiring process, this was not the main reason
for withdrawing from the selection procedure. Only a small percentage of all withdrawals
referred to the recruitment or selection practices as their primary motive (time delay, test


anxiety). This finding contrasts with previous research on this issue (e.g., Rynes et al., 1991).
In addition, trainees made a clear distinction between hiring practices and the “real” organization,
which is the opposite of what signaling theory would predict (Spence, 1973, 1974). As a
result, it is tempting to conclude that process perceptions are of minor importance to the study of
applicant withdrawal. Then again, the high percentage of respondents mentioning administrative
problems suggests that the application process is too complex and cumbersome. The high
proportion of withdrawals citing physical and medical problems might suggest that applicants
believe they have to be in peak condition to pass the physical and medical selection hurdles;
this belief is contradicted by the group of trainees who were of the opinion that the bicycle
ergometer test was too easy as a physical selection test for the military. Although we did not find
evidence that perceptions of the hiring process had a direct effect on applicant withdrawal, it is
possible that recruitment activities indirectly influenced applicants’ decisions to withdraw.
Recently, Turban (2001) found that recruitment activities influenced firm attractiveness through
influencing perceptions of organizational attributes. In a related vein, previous research has
found that recruiter behaviors influence attraction to the organization by providing information
about working conditions in the firm (Goltz & Giannantonio, 1995; Turban, Forret, &
Hendrickson, 1998).
Consistent with previous research (Ryan et al., 2000; Schmit & Ryan, 1997), we found that an
important reason for withdrawing was the availability of employment alternatives. It is naïve to
believe that applicants would consider only a military occupation. Although there are exceptions,
the majority of jobseekers typically generate a large number of potential employers for future
consideration. We found that for many applicants the military is less attractive than other
potential employers. Apparently, for many youngsters the military is an option that is kept in
reserve in case other applications end in rejection. More specifically, it turned out that the
military’s strongest competitor in the struggle for manpower is the police force, which is not
unexpected in view of its more attractive salary scale. In other cases, one’s current job was
seen as the better alternative. Some withdrawals reported that they had to work at their current
job at the time of testing and therefore could not attend. Schmit and Ryan (1997) are of the
opinion that “this is probably a combination of individuals who decided that risking their current
job for a slim chance at another job was not worth it, and individuals who decided that making an
extra effort (e.g., taking a vacation day, rearranging one’s schedule) was not worth it” (p. 871).
In both cases, staying with the current job is seen as the better alternative.
Another set of frequently mentioned reasons for withdrawal related to perceptions of a lack of job
and organization fit. Usually, this perception was based upon the information provided
at the career office. This finding startled us at first, because one would not expect a prospect to
apply to the military when s/he has doubts about whether s/he would fit the organization. But then
again, as long as the application process does not require a significant commitment of time and
energy on the part of the applicant, it is unlikely that s/he will hold back from applying. At this
stage of the process, applicants are still generating a pool of opportunities from which one
(eventually) will be chosen (Barber, 1998). In most cases, this kind of withdrawal is beneficial to
the organization because it spares the person a likely disillusionment, and it saves the organization
a considerable amount of money and time. However, organizations should be concerned if perceptions
of a lack of fit are based on erroneous information (Schmit & Ryan, 1997). This is particularly true
for cases in which the reason for withdrawing had to do with hearsay stories about the military,
told by relatives or friends after the person had visited the career office.
This brings us to another set of frequently mentioned reasons for withdrawal: The influence of
significant others on the applicant’s decision to withdraw. For 10% of the respondents, this was
the primary motive for withdrawal. Furthermore, for many respondents the opinion of significant
others about the military played a role in their decision but was not the main reason for
withdrawal. For example, some individuals preferred their current job to a military occupation
because their parents had convinced them that this was the right thing to do. It should be noted
that at the time of the data collection the Iraq war had just begun. For most applicants this was
not an issue, but withdrawals often mentioned that their parents were strongly opposed to the
possibility that their child would go to war. We believe that military recruitment programs could
be strengthened by involving parents, partners and friends in the application process; for
instance, by organizing an information evening or site visit for relatives.
Finally, several interviewees told us that they were not able to attend the selection session
because they did not have sufficient resources to pay for their train or bus ticket. Contrary to some
private firms, the Belgian military does not refund travel expenses, as this is deemed too
expensive for the treasury.
Limitations, future research opportunities and contributions
Admittedly, there are several limitations to our study. First, the interview data represent
retrospective self-reports. Thus, the primary reason for withdrawal may be distorted in some
cases (Schmit & Ryan, 1997). For example, some individuals may have reported that they
withdrew because they had a physical problem or because their car broke down, when in reality
less socially desirable motives were involved.
Second, withdrawal behavior was studied at an early stage in the total selection process. It is not
unlikely that reasons for withdrawal vary across time in the selection process even if the focus is
on withdrawal prior to offer extension (Schmit & Ryan, 1997). Taylor and Bergmann (1987)
found that recruitment activities affected applicant reactions only in the initial stage of the hiring
process; after that point, only job attributes were found to significantly affect applicant reactions.
According to Schmit and Ryan (1997), “future studies should include an explicit role for timing in
the explanation of the withdrawal decision.” For instance, an interesting group of “withdrawals”
for the military are the individuals who paid a visit to the career office but decided not to apply.
In view of the high proportion of withdrawals who gave preference to another employer, we
argue that future research on military recruitment should also consider why other organizations
are favored and what can be done to make the military more attractive in relation to its
competitors.
A third limitation of this study is that we analyzed only the primary motive for withdrawal,
although in reality it is often a cluster of reasons that leads to the decision to withdraw. Future
research should analyze the combination of reasons instead of isolated motives.
The limitations of this study are offset by several strengths. First, we departed from the
traditional approach by taking into account the perspectives of different stakeholders. We
interviewed not only the persons who withdrew from the hiring process, but also prospects,
applicants who completed the process, trainees, and even employees familiar with the accession
policy of the military. This approach was very helpful in conducting the telephone interviews and
in interpreting the results.
Second, contrary to previous research in this area, we examined applicant withdrawal at a very
early stage of the hiring process. Until now, relatively little has been known about applicants’
decisions to withdraw from an organization’s selection process prior to the point of a job offer
(Schmit & Ryan, 1997). This is unfortunate, because these decisions are surely important to
organizations (Barber, 1998). The inclusion of non-applicants, in particular, should be informative
for both research and practice.

Conclusions
Inconsistent with previous research, we found that withdrawals were generally unaffected by
early recruitment practices. The primary motive for withdrawal was the availability of
preferred employment alternatives. Apparently, for many youngsters the military is an option
that is kept in reserve in case other applications end in rejection. Several applicants
changed their mind about joining the military because of perceptions of a lack of fit between
person and organization. This is beneficial to the organization insofar as these perceptions are
valid. In this study, however, some applicants withdrew because of controversial stories about
the military told by parents, friends or partner. The influence of significant others is probably
underestimated. Most interviewees acknowledged that this influence played a role in their decision
to withdraw, but usually other reasons were said to be more important. Future research should take
into account that an applicant’s withdrawal is seldom caused by one isolated reason. Finally, we
believe that, despite the fact that few applicants mentioned hiring practices as a direct reason
for withdrawal, process perceptions might have influenced applicant reactions indirectly through
influencing perceptions of organizational attributes.

References
Anderson, N., Born, M., & Cunningham-Snell, N. (2001). Recruitment and Selection: Applicant
Perspectives and Outcomes. In N. Anderson, D. Ones, H.K. Sinangil, & C. Viswesvaran
(Eds.), Handbook of Industrial, Work, and Organizational Psychology: Volume 1
Personnel Psychology (pp. 200-218). London: Sage.
Anderson, N., & Ostroff, C. (1997). Selection as socialization. In N. Anderson & P. Herriot
(Eds.), International handbook of selection and assessment. London: Wiley.
Arvey, R., Gordon, M., Massengill, D., & Mussio, S. (1975). Differential dropout rates of
minority and majority job candidates due to time lags between selection procedures.
Personnel Psychology, 28, 175-180.
Barber, A.E. (1998). Recruiting employees: Individual and organization perspectives. Thousand
Oaks, CA: Sage.
Barber, A.E., & Roehling, M.V. (1993). Job postings and the decision to interview: A verbal
protocol analysis. Journal of Applied Psychology, 78, 845-856.
Blau, G.J. (1985). The measurement and prediction of career commitment. Journal of
Occupational Psychology, 58, 277-288.
Breaugh, J.A. (1992). Recruitment: Science and practice. Boston: PWS-Kent Publishing.
Brett, J.M. (1982). Job transfer and well-being. Journal of Applied Psychology, 67, 450-463.
Brett, J.M., & Reilly, A.H. (1988). On the road: Predicting the job transfer decisions. Journal of
Applied Psychology, 73, 614-620.
Bretz, R.D., Ash, R.A., & Dreher, G.F. (1989). Do people make the place? An examination of the
attraction-selection-attrition hypothesis. Personnel Psychology, 42, 561-581.
Bretz, R.D., & Judge, T.A. (1994). Person-organization fit and the theory of work adjustment:
Implications for satisfaction, tenure, and career success. Journal of Vocational Behavior,
44, 32-54.
Burke, R.J., & Deszca, E. (1982). Preferred organizational climates of Type A individuals.
Journal of Vocational Behavior, 21, 50-59.
Cable, D.M., & Judge, T.A. (1996). Person-organization fit, job choice decisions, and
organizational entry. Organizational Behavior and Human Decision Processes, 67(3), 294-
311.
Carson, K.D., & Bedeian, A.G. (1994). Career commitment: Construction of a measure and
examination of its psychometric properties. Journal of Vocational Behavior, 44, 237-262.
Chatman, J.A. (1989). Improving interactional organizational research: A model of person-
organization fit. Academy of Management Review, 14, 333-349.
Chatman, J.A. (1991). Matching people and organizations: Selection and socialization in public
accounting firms. Administrative Science Quarterly, 36, 459-484.
Crant, J.M., & Bateman, T.S. (1990). An experimental test of the impact of drug-testing programs
on potential job applicants’ attitudes and intentions. Journal of Applied Psychology, 75,
127-131.
Derous, E., & De Witte, K. (2001). Sociale procesfactoren, testmotivatie en testprestatie. Een
procesperspectief op selectie geëxploreerd via een experimentele benadering [Social process
factors, test motivation and test performance: A process perspective on selection explored via
an experimental approach]. Gedrag & Organisatie, 14(3), 152-170.
Fombrun, C., & Shanley, M. (1990). What’s in a name? Reputation building and corporate
strategy. Academy of Management Journal, 33, 233-258.
Gatewood, R. D., Gowan, M. A., & Lautenschlager, G. J. (1993). Corporate image, recruitment
image, and initial job choice decisions. Academy of Management Journal, 36, 414-427.
Goltz, S.M., & Giannantonio, C.M. (1995). Recruiter friendliness and attraction to the job: The
mediating role of inferences about the organization. Journal of Vocational Behavior, 46,
109-118.
Gould, S., & Penley, L. (1985). A study of the correlates of the willingness to relocate. Academy
of Management Journal, 28, 472-478.
Granovetter, M.S. (1974). Getting a job: A study of contacts and careers. Cambridge, MA:
Harvard University Press.
Harris, M.M., & Fink, L.S. (1987). A field study of applicant reactions to employment
opportunities: Does the recruiter make a difference? Personnel Psychology, 40, 765-784.
Iles, P.A., & Robertson, I.T. (1997). The impact of personnel selection procedures on candidates.
In N. Anderson & P. Herriot (Eds.), International handbook of selection and assessment.
Chichester: Wiley.
Judge, T.A., & Bretz, R.D., Jr. (1992). Effects of work values on job choice decisions. Journal of
Applied Psychology, 77, 261-271.
Judge, T.A., & Cable, D.M. (1997). Applicant personality, organizational culture, and
organization attraction. Personnel Psychology, 50, 359-394.
Kilduff, M. (1990). The interpersonal structure of decision making: A social comparison
approach to organizational choice. Organizational Behavior and Human Decision
Processes, 47, 270-288.
Kristof, A.L. (1996). Person-organization fit: An integrative review of its conceptualizations,
measurement, and implications. Personnel Psychology, 49, 1-50.
Legree, P.J., Gade, P.A., Martin, D.E., Fischl, M.A., Wilson, M.J., Nieva, V.F., McCloy, R., &
Laurence, J. (2000). Military enlistment and family dynamics: Youth and parental
perspectives. Military Psychology, 12(1), 31-49.
Lent, R. W., Brown, S. D., & Hackett, G. (1994). Toward a unified social cognitive theory of
career and academic interest, choice, and performance. Journal of Vocational Behavior,
45, 79-122.
Liden, R.C., & Parsons, C.K. (1986). A field study of job applicant interview perceptions,
alternative opportunities, and demographic characteristics. Personnel Psychology, 39,
109-122.
Macan, T.H., Avedon, M.J., Paese, M., & Smith, D.E. (1994). The effects of applicants’ reactions
to cognitive ability tests and an assessment center. Personnel Psychology, 47, 715-738.
Mael, F.A., & Ashforth, B.E. (1995). Loyal from day one: Biodata, organizational identification,
and turnover among newcomers. Personnel Psychology, 48, 309-333.
March, J., & Simon, H. (1958). Organizations. New York: Wiley & Sons.
Maurer, S.D., Howe, V., & Lee, T.W. (1992). Organizational recruiting as marketing
management: An interdisciplinary study of engineering graduates. Personnel Psychology,
45, 807-833.
Meglino, B.M., DeNisi, A.S., Youngblood, S.A., & Williams, K.J. (1988). Effects of realistic job
previews: A comparison using an enhancement and a reduction preview. Journal of
Applied Psychology, 73, 259-266.
Mobley, W.H., Griffeth, R.W., Hand, H.H., & Meglino, B.M. (1979). Review and conceptual
analysis of the employee turnover process. Psychological Bulletin, 86(3), 493-522.
Murphy, K.R. (1986). When your top choice turns you down: Effect of rejected job offers on the
utility of selection tests. Psychological Bulletin, 99, 128-133.
Noe, R.A., & Barber, A.E. (1993). Willingness to accept mobility opportunities: Destination
makes a difference. Journal of Organizational Behavior, 14, 159-175.
Noe, R.A., Steffy, B.D., & Barber, A.E. (1988). An investigation of the factors influencing
employee’s willingness to accept mobility opportunities. Personnel Psychology, 41, 559-
580.
O’Reilly, C.A., & Caldwell, D.F. (1980). Job choice: The impact of intrinsic and extrinsic
factors on subsequent satisfaction and commitment. Journal of Applied Psychology, 65,
559-565.
Osborn, D.P. (1990). A reexamination of the organizational choice process. Journal of Vocational
Behavior, 36, 45-60.
Pervin, L.A. (1989). Persons, situations, interactions: The history of a controversy and a
discussion of theoretical models. Academy of Management Review, 14, 350-360.
Posner, B.Z. (1992). Person-organization values congruence: No support for individual
differences as a moderating influence. Human Relations, 45, 351-361.
Powell, G.N. (1984). Effects of job attributes and recruiting practices on applicant decisions: A
comparison. Personnel Psychology, 37, 721-732.
Powell, G.N. (1991). Applicant reactions to the initial employment interview: Exploring
theoretical and methodological issues. Personnel Psychology, 44, 67-83.
Premack, S.L., & Wanous, J.P. (1985). A meta-analysis of realistic job preview experiments.
Journal of Applied Psychology, 70, 706-719.
Reynolds, L.G. (1951). The Structure of Labor Markets. New York: Harper.
Ryan, A.M., & McFarland, L.A. (1997, April). Organizational influences on applicant withdrawal
from selection processes. Paper presented at the Twelfth Annual Conference of the
Society for Industrial and Organizational Psychology, St. Louis, MO.
Ryan, A.M., Sacco, J.M., McFarland, L.A., & Kriska, S.D. (2000). Applicant self-selection:
correlates of withdrawal from a multiple hurdle process. Journal of Applied Psychology,
85(2), 163-179.
Rynes, S.L. (1991). Recruitment, job choice, and post-hire consequences: A call for a new
research direction. In M.D. Dunnette & L.M. Hough (Eds.), Handbook of industrial and
organizational psychology (2nd ed., Vol. 2, pp. 399-444). Palo Alto, CA: Consulting
Psychologists Press.
Rynes, S.L. (1993). Who’s selecting Whom? In N. Schmitt & W.C. Borman (Eds.), Personnel
Selection in Organizations. San Francisco, CA: Jossey-Bass.
Rynes, S.L., Bretz, R., & Gerhart, B. (1991). The importance of recruitment in job choice: A
different way of looking. Personnel Psychology, 44, 487-521.
Rynes, S.L., & Lawler, J. (1983). A policy-capturing investigation of the role of expectancies in
decisions to pursue job alternatives. Journal of Applied Psychology, 68, 620-631.
Schmit, M.J., & Ryan, A.M. (1997). Applicant withdrawal: The role of test-taking attitudes and
racial differences. Personnel Psychology, 50, 855-876.
Schmitt, N., Gilliland, S.W., Landis, R.S., & Devine, D. (1993). Computer-based testing applied
to selection of secretarial applicants. Personnel Psychology, 46, 149-165.
Schreurs, B., Derous, E., De Witte, K., Proost, K., Andriessen, M., & Glabeke, K. (2003).
Attracting Potential Applicants to the Military: The Effects of Initial Face-to-Face
Contacts. Manuscript submitted for publication.
Schuler, H., Farr, J., & Smith, M. (1993). The individual and organizational sides of personnel
selection and assessment. In H. Schuler, C.J.L. Farr & M. Smith (Eds.), Personnel
selection and assessment: Individual and organizational perspectives. New Jersey:
Lawrence Erlbaum.
Schwab, D.P., Rynes, S.L., & Aldag, R.A. (1987). Theories and research on job search and
choice. In K. Rowland and G. Ferris (Eds.), Research in Personnel and Human Resources
Management (Vol. 5, pp. 129-166). Greenwich, CT: JAI Press.
Slaughter, J. E., Zickar, M., Highhouse, S., Mohr, D. C., Steinbrenner, D., & O'Connor, J. (2001,
April). Personality trait inferences about organizations: Development of a measure and
tests of the congruence hypothesis. Paper presented at the Annual Conference of the
Society for Industrial and Organizational Psychology, San Diego, CA.
Smither, J.W., Reilly, R.R., Millsap, R.E., Pearlman, K., & Stoffey, R.W. (1993). Applicant
reactions to selection procedures. Personnel Psychology, 46, 49-76.
Spence, A.M. (1973). Job market signalling. Quarterly Journal of Economics, 87, 355-374.
Spence, A.M. (1974). Market signalling. Cambridge, MA: Harvard University Press.
Steel, R. (1996). Labor market dimensions as predictors of the reenlistment decisions of military
personnel. Journal of Applied Psychology, 69, 846-854.
Steel, R., & Griffeth, R. (1989). The elusive relationship between perceived employment
opportunity and turnover behavior: A methodological or conceptual artifact? Journal of
Applied Psychology, 69, 846-854.
Steiner, D.D., & Gilliland, S.W. (1996). Fairness reactions to personnel selection techniques in
France and the United States. Journal of Applied Psychology, 81, 134-141.
Suszko, M.J., & Breaugh, J.A., (1986). The effects of realistic job previews on applicant self-
selection and employee turnover, satisfaction, and coping ability. Journal of Management,
12, 513-523.
Taylor, M.S., & Bergmann, T.J. (1987). Organizational recruitment activities and applicants’
reactions at different stages of the recruitment process. Personnel Psychology, 40, 261-
285.
Tom, V.R. (1971). The role of personality and organizational images in the recruiting process.
Organizational Behavior and Human Performance, 6, 573-592.
Turban, D.B. (2001). Organizational attractiveness as an employer on college campuses: An
examination of the applicant population. Journal of Vocational Behavior, 58, 293-312.
Turban, D.B., Campion, J.E., & Eyring, A.R. (1995). Factors related to job acceptance decisions
of college recruits. Journal of Vocational Behavior, 47, 193-213.
Turban, D.B., & Dougherty, T.W. (1992). Influences of campus recruiting on applicant attraction
to firms. Academy of Management Journal, 35, 739-765.
Turban, D.B., Eyring, A.R., & Campion, J.E. (1993). Job attributes: Preferences compared with
reasons given for accepting and rejecting job offers. Journal of Occupational and
Organizational Psychology, 66, 71-81.
Turban, D.B., Forret, M.L., & Hendrickson, C.L. (1998). Applicant attraction to firms: Influences
of organization reputation, job and organizational attributes, and recruiter behaviors.
Journal of Vocational Behavior, 52, 24-44.
Turban, D. B., & Greening, D. W. (1997). Corporate social performance and organizational
attractiveness to prospective employees. Academy of Management Journal, 40, 658-672.
Turban, D. B., & Keon, T. L. (1993). Organizational attractiveness: An interactionist perspective.
Journal of Applied Psychology, 78, 184-193.
Vancouver, J.B., Millsap, R.E., & Peters, P.A. (1994). Multilevel analysis of organizational goal
congruence. Journal of Applied Psychology, 79, 666-679.
Vancouver, J.B., & Schmitt, N.W. (1991). An exploratory examination of person-organization fit:
Organizational goal congruence. Personnel Psychology, 44, 333-352.
Wanous, J.P. (1992). Organizational entry: Recruitment, selection, and socialization of
newcomers (2nd ed.). Reading, MA: Addison-Wesley.
Wanous, J.P., & Colella, A. (1989). Organizational entry research: Current status and future
directions. In K. Rowland & G. Ferris (Eds.), Research In Personnel and Human
Resources Management, pp. 59-120, Greenwich, CT: JAI Press.
Wiesner, W.H., Saks, A.M., & Summers, R.J. (1991). Job alternatives and job choice. Journal of
Vocational Behavior, 38, 198-207.
Witt, L.A., & Nye, L.G. (1992, April). Goal congruence and job attitudes revisited. Paper
presented at the Seventh Annual Conference of the Society for Industrial and
Organizational Psychology, Montreal, Canada.

Table 1
Reasons for Withdrawal
1. Available employment alternatives 21.18%
2. Lost interest/doubts 12.94%
3. Significant others 10.59%
4. Administrative problems 9.41%
5. Medical/physical problems 9.41%
6. Transportation problems 8.24%
7. Had to work 5.88%
8. Too sick to attend 4.71%
9. Forgot 4.71%
10. Personal/family problems 4.71%
11. Other things to do 2.35%
12. Further education 2.35%
13. Test anxiety 1.18%
14. Time delay 1.18%
15. Other military career 1.18%
Note. N = 200.

U.S. ARMY RECRUITER SELECTION RESEARCH: AN UPDATE32


Walter C. Borman33
Personnel Decisions Research Institutes, Inc. and
University of South Florida
100 South Ashley Drive, Suite 375
Tampa, FL 33602
wally.borman@pdri.com

Leonard A. White
U.S. Army Research Institute
5001 Eisenhower Avenue
Alexandria, VA 22333
WhiteL@ARI.army.mil
Stephen Bowles
Command Psychologist, U.S. Army Recruiting Command
10,000 Hampton Pkwy, Room 1119, Center One
Fort Jackson, SC 29207
bowless@jackson.army.mil

Kristen E. Horgen, U. Christean Kubisiak, and Lisa M. Penney


Personnel Decisions Research Institutes, Inc.
100 South Ashley Drive, Suite 375
Tampa, FL 33602
kristen.horgen@pdri.com, chris.kubisiak@pdri.com, lisa.penney@pdri.com

INTRODUCTION

The objective of this research program is to develop and validate a new screening test
battery for selecting U.S. Army recruiters. The approach has been to first conduct a concurrent
validation study by administering a trial test battery to production recruiters currently on the job
and also obtain performance measures on these same recruiters. The concurrent validation
research has been completed and this paper describes results of that research.

32 Paper presented at the Annual Conference of the International Military Testing Association, Pensacola, FL. The
views expressed in this paper are those of the authors and do not necessarily reflect those of the Army Research
Institute, or any other department of the U.S. Government.
33 Also contributing technical support to the research program are Valentina Bruk, Patrick Connell, Elizabeth Lentz,
and Vicky Pace, Personnel Decisions Research Institutes, Inc., and Mark C. Young, U.S. Army Research Institute.

We also have under way a predictive validation study to evaluate the validity of the test
battery for predicting the subsequent performance of recruiters, in a testing setting that more
closely resembles the operational environment in which the test will actually be administered.
Non-commissioned officers (NCOs) entering the Recruiting and Retention School (RRS) are being
administered the test battery, and we have now followed up on a sample of 1,466 of these
recruiters. Attrition data from the RRS and production data (i.e., number of recruits brought into
the Army per month) were available for these recruiters, and we also present initial predictive
validation results using these indicators as criteria.

Long-term goals for the project are to establish a standard screening process for NCO
candidates for a recruiting assignment prior to their being accepted into the RRS. NCOs scoring
well on the battery might be encouraged to volunteer for recruiting. An even broader goal is to
develop a classification test battery to target placement into other possible second-stage jobs
(e.g., drill instructor). In one scenario, this classification battery would be administered routinely
to NCOs at the beginning of their second tour, and predicted performance scores would be
generated for each target job. Then, NCOs could be counseled about which second-stage job(s)
suit them best.

THE CONCURRENT VALIDATION STUDY

We first conducted a job analysis of the Army recruiter military occupational specialty
(MOS). This analysis had two purposes: to identify the recruiter performance requirements and
thus to suggest performance measures that might be used as criteria in the validation study; and
to identify candidate predictor tests for the validation research.

Criteria

The job analysis suggested that production rates (as mentioned, the number of prospects
brought into the Army per unit time) and peer and supervisory ratings of job performance on the
main dimensions of recruiter performance would provide relevant criteria. We also decided to
develop a situational judgment test to measure the problem-solving, judgment, and decision-making
skills important in recruiting. More details on the criterion measures are provided below.

Predictors

Predictor measures for the test battery included: (1) The Army Research Institute’s
Background Information Questionnaire (BIQ), with eight scales including “natural” leader,
social perceptiveness, and interpersonal skills; (2) The Army Research Institute’s Assessment of
Individual Motivation (AIM), with six scales including work orientation, leadership, and
adjustment; (3) The Sales Achievement Profile (SAP), with 21 scales including validity scales,
sales success, motivation and achievement, work strengths, interpersonal strengths, and inner
resources; (4) The Emotional Quotient Inventory (EQI), with 15 scales including intrapersonal,
adaptability, general mood, interpersonal, and stress management components; and (5) The
NEO, with five scales measuring the Big 5 personality factors.

Sample

A total of 744 Army recruiters from 10 recruiting battalions comprised our concurrent
sample.

Results

First, regarding the criteria, we obtained production data for a 12-month period for many
of the recruiters in the sample. Not all members of the sample had 12 months of data; in fact, on
average, they had about eight months. We computed reliability coefficients for two through 12
months of data and found that four months provided reasonable levels of reliability
(intraclass correlation = .59). Fewer months of production data reduced reliability substantially.
Thus, only recruiters who had at least four months of production data were included in the
validation analyses.
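
For readers who want to reproduce this kind of month-count reliability check, the sketch below
estimates the reliability of a k-month production average with a one-way random-effects
intraclass correlation, ICC(1,k). It is a minimal illustration on simulated data, not the authors'
code; the formula choice and the simulated effect sizes are our assumptions.

import numpy as np

# Hedged sketch (not the study's code): reliability of an average over k
# months of production, via a one-way random-effects ICC. `production` is a
# hypothetical (recruiters x months) array with no missing values.
def icc_k(production: np.ndarray) -> float:
    n, k = production.shape
    grand = production.mean()
    row_means = production.mean(axis=1)
    # Between-recruiter and within-recruiter mean squares (one-way ANOVA).
    ms_between = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((production - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / ms_between  # ICC(1, k)

rng = np.random.default_rng(0)
true_skill = rng.normal(1.0, 0.3, size=300)                   # stable recruiter effect
monthly = true_skill[:, None] + rng.normal(0, 0.6, (300, 4))  # four months of noisy data
print(f"ICC(1,4) = {icc_k(monthly):.2f}")

Repeating the computation for k = 2 through 12 would reproduce the kind of month-count comparison
described above.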

It is well documented that some territories are inherently easier or more difficult to recruit
in than others. Thus, we experimented with correcting production data using the mean values for
territories of various sizes, including at the brigade (N = 5), battalion (N = 41), and company
(N = 243) levels. Corrections using brigade mean production levels proved to yield the most
reliable production data, and this was therefore the correction employed.
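
A minimal sketch of one plausible form of such a correction follows: expressing each recruiter's
production as a deviation from the mean of his or her brigade. The paper does not give the exact
correction formula, so the subtraction and the column names below are assumptions made for
illustration.

import pandas as pd

# Hypothetical sketch of a territory correction: subtract the brigade mean
# from each recruiter's production. The study's exact formula is not stated.
df = pd.DataFrame({
    "recruiter_id": [1, 2, 3, 4],
    "brigade": ["A", "A", "B", "B"],
    "monthly_production": [1.2, 0.8, 1.5, 0.9],
})
df["brigade_mean"] = df.groupby("brigade")["monthly_production"].transform("mean")
df["corrected_production"] = df["monthly_production"] - df["brigade_mean"]
print(df)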

Peer and supervisor ratings of job performance were gathered on eight behavior-based
rating scales. A total of 1,542 raters provided ratings on 619 recruiters, an average of 2.41 sets of
ratings per ratee. The performance ratings showed reasonable distributions (i.e., means around
6.5 on a 1-10 scale). The interrater agreement results were also quite good, with peers and
supervisors showing comparatively high agreement in their ratings of recruiters (rs = .55 to .67).
Finally, to summarize the eight dimensions into a simpler system, three factors were identified:
Selling, Human Relations, and Organizing Skills.

The situational judgment test (SJT) had 25 items in multiple-choice format that presented
difficult but realistic recruiting-related situations and four to five response options that
represented different possible ways to handle each situation. Effectiveness ratings for each
response were provided by subject matter experts (SMEs), and these effectiveness ratings formed
the basis of the scoring key. The best scores on the SJT were obtained when the recruiters’
responses most closely corresponded to the responses the SMEs regarded as most effective.
Relationships between the SJT and the other criteria (e.g., ratings, sales volume) were somewhat
lower than expected. As a result of these unexpected findings, we did not include the SJT as a
component of the criterion in our validation analyses. Future work on the SJT will examine an
empirical keying approach as an alternative to SME judgments of effectiveness for scoring this
measure.
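
As a sketch of this SME-based keying, the snippet below scores an examinee by averaging the SME
effectiveness ratings of the options he or she chose. The items, options, and seven-point rating
scale are invented for illustration and are not the operational key.

# Hypothetical SME key: item -> option -> mean SME effectiveness rating.
sme_key = {
    "item_01": {"a": 6.2, "b": 3.1, "c": 1.8, "d": 4.4},
    "item_02": {"a": 2.5, "b": 5.9, "c": 4.0, "d": 3.3},
}

def score_sjt(responses: dict) -> float:
    """Average the SME effectiveness ratings of the chosen options."""
    ratings = [sme_key[item][choice] for item, choice in responses.items()]
    return sum(ratings) / len(ratings)

print(score_sjt({"item_01": "a", "item_02": "c"}))  # (6.2 + 4.0) / 2 = 5.1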

For the validation analyses, a final sales performance composite was derived. Recruiting
Command policy makers provided the following weights for the criterion measures:
(1) production data (corrected) = 50%; (2) Selling Skills ratings = 30%; (3) Human Relations and
Organizing Skills ratings = 10% apiece. Weighted standard scores for each component of the
composite were summed and became the criterion against which the validity of the predictors
was determined.
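
The sketch below illustrates the composite computation: standardize each component and sum it with
the 50/30/10/10 policy weights reported above. The data are simulated; the study's own scoring
code is not shown in the paper.

import numpy as np

rng = np.random.default_rng(1)
n = 450
# (weight, simulated raw scores) for each criterion component
components = {
    "production_corrected": (0.50, rng.normal(size=n)),
    "selling_skills":       (0.30, rng.normal(size=n)),
    "human_relations":      (0.10, rng.normal(size=n)),
    "organizing_skills":    (0.10, rng.normal(size=n)),
}

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# Weighted sum of standard scores = the composite criterion.
composite = sum(w * zscore(x) for w, x in components.values())
print(composite[:5])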

Table 1 presents the significant validities for the refined AIM and BIQ scales. In the
scale refinement process, items with higher correlations against the criterion were given a higher
weight in the scale score. Various cross-validation analyses suggested that the validities in Table
1 are reasonable estimates of these two instruments’ validity, without capitalizing on chance.
The SAP, EQI, and NEO results were not quite as positive as the AIM and BIQ results.

Table 1
Correlations of AIM and BIQ Scales With the Sales Performance Criterion
Scale                          Performance Composite (N = 446-453)
AIM Work Orientation           .28**
AIM Leadership                 .26**
AIM Agreeableness              .10*
BIQ Hostility to Authority     -.14**
BIQ Social Perceptiveness      .18**
BIQ “Natural” Leader           .32**
BIQ Self-Esteem                .25**
BIQ Interpersonal Skill        .27**
Note. **p < .01. *p < .05.

Table 2 presents the total estimated cross-validity levels for each test alone and
in combination with the other test. As can be seen, comparable validities (all p < .01) are
observed for the AIM and the BIQ alone. Both the AIM and BIQ were considered for further
evaluation in the predictive validation study.

Table 2
Summary Validation Results
Test Correlation
BIQ alone .33**
AIM alone .31**
BIQ + AIM .36**
Note. **p < .01.

THE PREDICTIVE VALIDITY RESEARCH

The recruiter screening battery has considerable potential for identifying NCOs likely to
be successful recruiters. However, we believe it is highly important to evaluate, as well, the
validity of the battery over time in a predictive validation design. As mentioned, this research is
under way at the RRS, where incoming students are being administered the test. In fact, more
than 1,700 recruiters who completed the test battery, recently named the NCO Leadership Skills
Inventory (NLSI), have now progressed through the RRS (although 10.9% dropped out during
training), and 1,466 have at least five months of production data available for the predictive
validation analyses. Below we describe that research.

Criteria

We used two criteria in the predictive work. Attrition from the RRS was one criterion;
production rates corrected for territorial differences, as in the concurrent study, constituted the
other. We included recruiters in the study only if they had five or more months of production
data. The reliability of the corrected production index was .64.

Predictors

As mentioned previously, the test battery now includes the BIQ and the AIM and is
referred to as the NLSI.

Sample

More than 1,700 NCOs entering the RRS were administered the NLSI. Approximately
160 of these NCOs were dismissed from or otherwise failed to complete training. A total of 1,466
recruiters graduated from the RRS and had five or more months of production data.

Results

The predictive validity of the NLSI also appears promising, although at this point the criteria
are somewhat limited. First, the attrition results showed that when NLSI scores are divided into
quartiles, attrition was, from highest to lowest scorers, 6.5%, 8%, 10.5%, and 18.8%,
respectively. Even more dramatically, the bottom 5% of NLSI scorers had an attrition rate of 36%,
whereas the rest of the sample attrited at a 9% rate.

Second, the production rate results indicated that recruiters who scored in the top 25% on
the NLSI brought in 1.10 recruits per month on average. For the second highest quartile the number
was 1.06, for the third highest 1.03, and for the lowest quartile .91. At the extreme bottom of
the NLSI distribution, the lowest 5% brought in .75 recruits, whereas the rest of the sample
averaged 1.05. For comparison purposes, the BIQ-AIM composite cross-validity against
production in the concurrent study was .22. The corresponding validity coefficient in the
predictive study was .15.
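
A quartile-banding summary of this kind can be sketched as follows. The simulated data merely
mimic the direction of the reported results; none of the printed numbers are the study's data.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1466
nlsi = rng.normal(size=n)
df = pd.DataFrame({
    "nlsi": nlsi,
    # assumed directions only: higher NLSI -> less attrition, more production
    "attrited": rng.random(n) < (0.12 - 0.04 * np.tanh(nlsi)),
    "monthly_production": 1.0 + 0.08 * nlsi + rng.normal(0, 0.4, n),
})
df["quartile"] = pd.qcut(df["nlsi"], 4, labels=["Q1 (low)", "Q2", "Q3", "Q4 (high)"])
print(df.groupby("quartile", observed=True)[["attrited", "monthly_production"]].mean())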

NLSI IMPLEMENTATION TO A WEB-BASED PLATFORM

In order to implement the NLSI most efficiently and effectively, we were asked to
develop a web-based version. We coordinated with ePredix, an organization specializing in web-
based testing, and the U.S. Army Research Institute to generate HTML-based AIM and BIQ
forms that were as similar as possible to the original paper version. The project staff generated
instructions and item formats that could be used in conjunction with ePredix’s established
computer-based testing engine. Additionally, this system allowed for immediate scoring of the
tests and reporting of results in real-time. Furthermore, this enables candidates for recruiting
duty, who are deployed throughout the world, to take the test at any military base equipped with
a Digital Training Facility (DTF). In order to maintain the security of the battery, the test can
only be accessed at appointed times and in a proctored setting at the DTF. The test was deployed
on a trial basis at a limited number of DTFs in early 2003 and will gradually be expanded to all
DTF locations. Data gathered so far indicate that the recruiter candidates tested using the on-line
system are obtaining scores comparable to those obtained by recruiter candidates who took the
paper version.
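
One conventional way to check such mode comparability is to compare the two score distributions,
for example with a two-sample t-test and a standardized mean difference, as sketched below on
simulated scores. The paper does not state which analysis was actually performed, so this is an
assumption.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
paper = rng.normal(100, 15, 800)     # simulated paper-version scores
online = rng.normal(100.5, 15, 300)  # simulated web-version scores

t, p = stats.ttest_ind(paper, online, equal_var=False)  # Welch's t-test
pooled_sd = np.sqrt((paper.var(ddof=1) + online.var(ddof=1)) / 2)
d = (online.mean() - paper.mean()) / pooled_sd           # Cohen's d
print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")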

CONCLUSION

The NLSI appears to be a reasonably valid predictor of Army recruiter job performance.
Concurrent and predictive validation results demonstrate substantial relationships with attrition
from recruiter training and several indicators of recruiter performance. A web-based,
computerized version of the NLSI has now been developed and is being administered to NCOs
in Army DTFs. We anticipate that this instrument will be used operationally to encourage NCOs
likely to succeed in a recruiting environment to apply for recruiting duty. We are also
investigating potential uses of the NLSI for assignment to other NCO specialties (e.g., Drill
Sergeant).

Modelling Communication in Negotiation in PSO Context


Prof Dr MSc Jacques Mylle34
Human Factors & Military Operations Research
Royal Military Academy
B1000 Brussels – Belgium
jacques.mylle@rma.ac.be
34 In strong collaboration with Nicholas Yates, researcher and specialist in computer linguistics, and Prof Dr
André Helbo, director of the Language Training Centre at the Royal Military Academy.

Scope
Although people are not conscious of it, negotiation is a very common behaviour. A major
problem that often arises is the lack of a “shared mental model” of the negotiation situation,
which constitutes the core of the negotiation. People therefore often leave a great deal of
information implicit, which in turn leads to misunderstandings.
In other words, clear communication depends on the quantity, the quality and the relevance of
the information provided.

Soldiers deployed in a peace support context are no exception to the above observations; on the
contrary. Different cultural backgrounds, not mastering each other’s language, interpreters who
do not translate reliably, problems with role perception, etc. are all factors that make clear
communication very difficult, and negotiation even more so.

Research aim
The aim of our project is to use mathematical and computational tools to unravel and represent
the logical and informational structure of a dialogue in a given setting. The ultimate goal is to
create a “machine” that can “dialogue” with a living person about a particular object.
On the one hand it will be used to enhance people’s communicative abilities in a military setting,
and on the other hand to train people in negotiation in a structured way.

It goes without saying that such a project requires contributions from several fields, among
others psychology, linguistics and computer science. In this paper we will look specifically
at the computational linguistic aspects.

Approach
Formally speaking, in linguistics three separate aspects have to be considered, each of which has
to be implemented in a specific module.
1. The syntactic aspect.
The purpose of this module is to analyze a sentence from a grammatical point of view, i.e.,
identifying the subject, the verb, etc. The output of this first module serves as structured
input for the second module.
2. The semantic aspect.
This module builds a logical representation of the meaning of the sentence, starting from
its syntactic structure. In most cases these logical representations are underspecified
with respect to certain relationships within and between sentences. Both of these modules
work at the sentence level. This ambiguity is inherent to human language when used out of
context.
3. The pragmatic aspect.
The module related to this aspect works on the underspecified relationships created by the
semantic module. It builds a general representation of a series of sentences, which may be
a continuous text or a dialogue. This module has to link a given sentence to the preceding
discourse by confronting it with the given context/setting, which can be reduced to a set
of “givens” and relationships. This set of information must allow the resolution of the
aforementioned ambiguities, which result from the initial linguistic underspecifications
in the semantic representation.

Each of these three modules requires a specific formal language. We have chosen the
following ones for our project:
Head-driven Phrase Structure Grammar (Pollard & Sag, 1994), in short HPSG, for the
syntactic module;
Minimal Recursion Semantics (Copestake et al., 1999), in short MRS, for the semantic
module;
Segmented Discourse Representation Theory (Asher & Lascarides, 2003), in short SDRT,
for the pragmatic module.

Both HPSG and MRS are already implemented computationally in a system called the
Linguistic Knowledge Builder (Copestake et al., 2000), in short LKB. Unfortunately, this is
not the case for SDRT, so this demanding task has to be undertaken by our research group.

An example of communication / negotiation in a PSO setting

The text below is part of the transcript of a dialogue recorded in Leposavic (Kosovo) in January
2003 during a working visit in the context of our project. The two parties involved are the
Damage Control Officer (DCO) of the Belgian Battle Group in Kosovo, whose mother tongue is
French, and a Kosovar civilian, assisted by a Kosovar interpreter who has a more or less working
command of English.

B01 Officer : Hello Dragan.
B02 Interpr : Sacha, Sacha! How are you? (Gives a document to the DCO)
B03 Officer : I am OK.
B04 Interpr : This is the person who had an accident with a Belgian bus six months ago.
B05 Officer : Yes…?
B06 Interpr : But he did not get his money yet.
B07 Officer : Mm. Mm. (Reads the document) Vu-ko-ma…
B08 Interpr : Vukomanovic.
B09 Officer : I don’t have a file on this person.
B10 Interpr : Hun? He told me that they told him to wait four months. Within this four months
he would be refunded. But he is waiting since more than six months and a halve.

The syntactic module: HPSG and LKB

Figure 1 shows how LKB analyses a simple sentence like B09, “I don’t have a file on this
person.”

The left side of the window shows the graphical representation of the syntactic analysis. One
can easily locate the subject, the verb, the negation and the object.
The right side of the window gives the computational representation of the syntactic elements
of this particular sentence in the HPSG language. Such a structure is, mathematically speaking,
a “typed feature structure”. It contains all elements that are relevant and necessary for the
following steps in the analysis.

Figure 1. Example of a sentence analysis by HPSG in LKB.

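For readers without access to LKB, the sketch below conveys the flavor of a typed feature
structure as nested attribute-value pairs for sentence B09. All feature and type names are
simplified stand-ins, not the actual HPSG/LKB encoding shown in Figure 1.

# Rough, hand-written illustration of a typed feature structure for B09,
# "I don't have a file on this person." Names are simplified stand-ins.
b09_tfs = {
    "TYPE": "head-subject-phrase",
    "HEAD": {"TYPE": "verb", "LEMMA": "have", "NEGATED": True},
    "SUBJ": [{"TYPE": "noun", "PERSON": 1, "NUMBER": "sg"}],  # "I"
    "COMPS": [{                                               # "a file on this person"
        "TYPE": "noun", "LEMMA": "file",
        "MODIFIERS": [{"TYPE": "prep", "LEMMA": "on",
                       "OBJ": {"TYPE": "noun", "LEMMA": "person", "DET": "this"}}],
    }],
}
print(b09_tfs["HEAD"]["LEMMA"])  # the structure can be traversed like any nested mapping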

The semantic module: MRS and LKB

The information contained in the HPSG representation allows for the computation of the
underspecified semantic representation of the same sentence (Figure 2). It can be visualized
in its computational form, as shown in the left part of Figure 2, and/or in its mathematical form,
as shown in the right part of Figure 2.

Figure 2. Example of output produced by the semantic module (left: computational version;
right: mathematical version).
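
As a rough, hand-written approximation (not actual LKB output), the underspecified MRS for B09
can be pictured as a flat bag of elementary predications plus “qeq” handle constraints that leave
the relative scope of the negation and the quantifiers unresolved. All predicate, handle, and
variable names below are illustrative.

mrs_b09 = {
    "top": "h0",
    "relations": [
        {"pred": "pron",      "label": "h1",  "arg0": "x1"},                # "I"
        {"pred": "neg",       "label": "h2",  "arg0": "e2", "arg1": "h3"},  # "don't"
        {"pred": "_have_v",   "label": "h4",  "arg0": "e5", "arg1": "x1", "arg2": "x6"},
        {"pred": "_a_q",      "label": "h7",  "arg0": "x6", "rstr": "h8", "body": "h9"},
        {"pred": "_file_n",   "label": "h10", "arg0": "x6"},
        {"pred": "_on_p",     "label": "h10", "arg0": "e11", "arg1": "x6", "arg2": "x12"},
        {"pred": "_this_q",   "label": "h13", "arg0": "x12", "rstr": "h14", "body": "h15"},
        {"pred": "_person_n", "label": "h16", "arg0": "x12"},
    ],
    # qeq ("equality modulo quantifiers") constraints defer scope resolution
    "hcons": [("h3", "qeq", "h4"), ("h8", "qeq", "h10"), ("h14", "qeq", "h16")],
}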

The pragmatic module: SDRT

The implementation of this module will constitute the largest share of the work.
As already mentioned, SDRT is a theory that predicts how successive sentences of a text/dialogue
are linked to each other. To be able to do so, a number of rules and axioms about the “state of
the world” for the problem at hand must be specified beforehand. This comes down to making
explicit in the program what is left implicit in natural language, because the interlocutors have
a mental picture of the state of the world. By means of this complementary set of information,
it is possible to determine what type of relationship links the sentence under consideration to
one or more of the preceding ones.

For example, take the sentences B04, B05, and B06.

B04 Interpr. : This is the person who had an accident with a Belgian bus six months ago.
B05 Officer : Yes…?
B06 Interpr. : But he did not get his money yet.

After B04, SDRT builds a “segmented discourse representation structure” (SDRS). Figure 3
shows these SDRSs in their standard graphical form. Each of the parts labelled by a π constitutes
a unit of information (an MRS produced by the semantic module) whose parts can be univocally
determined, in the sense that there are at least no contradictions with what is already known.
This is a notational gloss for a more precise, but less readable, representation.

Figure 3a. Example of rules about the “state of the world” for SDRT

How does SDRT work with these entities? First, an expression like “this is the person
who…” implies that the speaking party (here the interpreter) supposes that the listening party
(here the officer) knows who “this person” is (the Kosovar who is complaining). This
assumption is a general axiom about meaning. Moreover, in the given context, we can assume as
“normal” that, if the person is known to Damage Control, there is a file about that person.
This is a contextual common-sense rule.
If this type of information is available to SDRT, it can compute this complementary
information (π4c) and link it through a causal relation to the sentence π4, a relationship which
in this case is “consequence by default” (Def-Cons).

Next the program has to deal with B05. This reply is in fact a question, but one which can only
be understood as relevant if this “sentence” is interpreted as a query for further
specification. This is represented by the relation Request for Explanation, or Explanation*, in
SDRT terminology (Fig 3b). In this concrete case, the officer still does not know the reason
(formally speaking, he does not have enough information to “compute” the purpose of the visit)
and asks for further specification with his “yes…?”

Figure 3b. Example of lack of information about the “state of the world” for SDRT

The interpreter correctly understood the meaning of the “yes” (π5) and answers with B06.
Here another type of rule has to be introduced: it is known that, in general, a “but” implies a
contrast with an implicit default consequence belonging to an earlier part of the discourse. In
the example, the implicit consequence is that the Kosovar civilian should in fact already
have been paid. To be able to “compute” this implicit consequence, some other contextual rules
must be introduced, such as “If a person who was involved in an accident was in the
right, (s)he has to be reimbursed for the repair of the damage”.
In the example the interpreter leaves the above idea implicit by giving the motive of his
visit, “but he did not get the money yet” (π6), as an answer to π5, which also leaves the object
implicit.
As a result SDRT produces the following nested structure (Figure 3c).

What guarantees that we end up with the representation shown in Figure 3c is that
this solution, among all possible meaningful alternatives, maximises the coherence between
the already elaborated structure and the information to be added. In other words,
each sentence is linked by at least one relation to the discourse structure and simultaneously
resolves a maximum of ambiguities.
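
The nested structure can also be rendered as a small data structure: labelled constituents linked
by discourse relations. The encoding below is our own illustration, not SDRT’s formal language,
and the label of the final relation is a stand-in for the one actually drawn in Figure 3c.

# Illustrative encoding of the SDRS built from B04-B06 (labels "pi" stand for π).
sdrs = {
    "constituents": {
        "pi4":  "This is the person who had an accident with a Belgian bus six months ago.",
        "pi4c": "(inferred) There is a file on this person at Damage Control.",
        "pi5":  "Yes...?",
        "pi6":  "But he did not get his money yet.",
    },
    "relations": [
        ("Def-Cons",     "pi4", "pi4c"),   # consequence by default of pi4
        ("Explanation*", "pi4", "pi5"),    # pi5 requests an explanation of pi4
        ("Contrast",     "pi4c", "pi6"),   # "but": pi6 contrasts with the default consequence
    ],
}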

Under ideal circumstances such a structure represents:
- what the parties are saying;
- what their intentions are at the communicative level;
- what is achieved by saying what they said.

Figure 3c. Example of a contextual rule about the state of the world for SDRT.

Conclusion

Communication abilities are a critical operational issue at all levels of the organisation.
To optimise performance in operations, people must therefore be trained in them.

A tool implemented on a computer allows for training anytime, anywhere.

HPSG and MRS do not pose fundamental modelling problems.

Analysis with SDRT is only possible if an effective system of rules and axioms about the state
of the world for the problem at hand exists.
These rules and axioms have to be derived from the analysis of empirical data; in the context of
our project this means the analysis of a number of “real life” negotiations and/or good
simulations/role plays which have been recorded.
The “theory” about the state of the world is then built step by step and adjusted when needed
to fit new data.
This is precisely what we will start in the coming months: recording “input” through role-playing
games at the Royal Higher Defence Institute, because we experienced a number of unbridgeable
difficulties with the real-life data in Kosovo, especially due to the language problems and the
use of a third party, the interpreter.
The next steps are then deriving the rules and axioms, followed by implementing them in an
algorithm that builds an SDRS from a series of MRSs produced by LKB. Stated otherwise, we
have to create a “glue language” which allows a sentence to be hooked onto the already
elaborated structure.

References

Asher, N., & Lascarides, A. (2003). Logics of Conversation. Cambridge: Cambridge University
Press.
Copestake, A., Flickinger, D., Sag, I.A., & Pollard, C. (1999). Minimal Recursion Semantics:
An Introduction. CSLI, Stanford University. http://www-csli.stanford.edu/aac/papers.html
Copestake, A., Carroll, J., Malouf, R., & Oepen, S. (2000). The (new) LKB system. CSLI,
Stanford University. http://www-csli.stanford.edu/aac/doc5-2.pdf
Pollard, C., & Sag, I. (1994). Head-Driven Phrase Structure Grammar. Stanford, CA: CSLI
Publications.

ELECTRONIC ADVANCEMENT EXAMS –
TRANSITIONING FROM PAPER-BASED TO ELECTRONIC FORMAT
Kirk Schultz, Robert Sapp and Larry Willers
Naval Education and Training Professional Development and Technology Center (NETPDTC)
Pensacola, Florida, USA, 32509
lee.schultz@navy.mil; robert.d.sapp@navy.mil; larry.willers@navy.mil

ABSTRACT

The U. S. Navy’s increased emphasis on Human Performance is changing the way
Sailors are trained. It is important that the Advancement Exam process leverages continuing
technological progress to better assess an individual Sailor’s performance in an accurate and
meaningful manner. This paper reviews the current status of the Electronic Advancement Exam
Initiative, which capitalizes on the ability to use multimedia to better present questions with a
performance orientation. The initiative’s goals and objectives, design and procurement
decisions, content development methodology, and considerations for integration with the current
exam process are addressed.

INTRODUCTION

The transition process from a paper-based to an electronic format for U. S. Navy Enlisted
Advancement Exams was originally presented by Baisden and Rymsza (2002), who conducted
Phase I of the initiative and established the viability of replacing the existing paper-based exam
with an electronic, multimedia format. This paper deals with Phase II issues, which included:
• addressing organizational culture issues associated with Advancement Exams
• reviewing and addressing internal/external process issues related to Advancement Exams
• defining and procuring specific hardware solutions for electronic exam implementation
• integrating the electronic exam software solution with existing database resources
• establishing standards and processes for developing multimedia assessment items

ORGANIZATIONAL CULTURE

The first major hurdle faced was the issue of change management. Navy paper-and-pencil
testing has been around for over fifty years. Moving from a validated process to one
entailing a major shift in exam design (increased performance orientation using multimedia), as
well as a different presentation mode (electronic), left many skeptical. Three factors have had a
significant impact on the success of the initiative to date.
First, a motivated and capable implementation Advanced Exam Development Team
(hereafter referred to as the “Team”) was established. To conduct Phase II, three lead Team
members were selected to research and implement the change. In an effort to ensure an objective
look at implementation options, two of the three were brought in from outside the Advancement
Exam Center to work with the project full-time. The third lead member was a relative newcomer
to the Center and dedicated half-time to the project. Their backgrounds included engineering,
computer programming, instructional design, project management, procurement, and human
performance expertise. Project responsibilities fell into three general areas (hardware, software,
and content), with one lead member responsible for each. Process issues tended to cross all
three of these general areas and were addressed both by individuals and the Team as a whole.
The lead Team tapped expertise from other departments as required for short-term guidance and
assistance.
Second, every effort was made to achieve a series of rapid successes in small, but critical,
elements of the electronic exam process to prove the concept, gain acceptance, build momentum
and establish ultimate success as a realistic possibility. As Phase II was initiated, a number of
individuals expressed concern and voiced their reasons why this initiative would fail. The Team
moved quickly to review and select hardware and software that, out of the box, addressed key
questions people had about the viability of creating and delivering electronic exams. With Team
coordination and assistance from the Exam Development Software (EDS) programmer, required
modifications that were anticipated by some to take three to six months were achieved to an 80
percent level in two weeks, with a prototype ready for implementation in less than one month.
SME content developers moved quickly to generate sample multimedia exam questions that were
performance-oriented and illustrated content that would have been difficult or impossible to
convey through text alone. Key employees and persons of influence within and associated with
the Navy Advancement Center were kept informed of progress through regular meetings held
every two weeks. The short time frame within which these successes were achieved started to
move the organizational culture from a “This will never happen” position to “This initiative is
going to happen and we need to be a part of it.”
Finally, efforts were undertaken to brief higher-level echelons on the plan and solicit
support. Tying this initiative to the Navy’s current Revolution in Training with its emphasis on
human performance was key. The same infrastructure that will allow Navy members electronic
access to elements of their career progression can be used to provide Navy advancement testing
in an all-electronic format. While implementation of this access is not expected for some time, it
is important to work out the details of the electronic advancement testing process now, so that it
can be expanded as soon as the infrastructure is in place.

PROCESS ISSUES

The United States Navy prides itself on its track record of consistently creating and
administering examinations to rank order enlisted Sailors for advancement in the grades E4-E7.
To smooth the transition, Phase II processes adhered to the existing paper-based processes to the
greatest extent practicable. Review of the processes currently used for the paper and pencil
examination process indicated that only minor modification was needed to accommodate the
electronic testing process. For example, electronic testing will utilize existing Education
Services Officers (ESOs), who will administer the exams in a manner consistent with the current
paper and pencil process. Where variations between the processes exist (like charging computer
batteries the night before the exam), these additional responsibilities will be clearly
communicated. Following the exam, in non-networked environments the ESO will express-mail digitized answer files (saved to Secure Digital (SD) cards) back to the Center, in a manner
similar to the way completed machine-scoreable answer sheets are sent back now. These answer
files will be formatted so the results can be processed and item statistics evaluated without
requiring any modification to current procedures.
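The paper does not name the item statistics involved; purely as a hedged illustration, the following Python sketch shows a classical item analysis (difficulty and corrected point-biserial discrimination) of the kind such answer files could feed. All data and names are hypothetical.

import statistics

# Hypothetical sketch of classical item analysis from scored answer files.
# scored[i][j] = 1 if examinee i answered item j correctly, else 0.
def item_stats(scored):
    totals = [sum(row) for row in scored]              # total score per examinee
    results = []
    for j in range(len(scored[0])):
        item = [row[j] for row in scored]
        p = statistics.mean(item)                      # difficulty: proportion correct
        rest = [t - x for t, x in zip(totals, item)]   # rest-score (item excluded)
        r = statistics.correlation(item, rest)         # discrimination (Python 3.10+)
        results.append((round(p, 2), round(r, 2)))
    return results

scored = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 1], [1, 1, 0]]
print(item_stats(scored))  # one (difficulty, discrimination) pair per item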

HARDWARE ISSUES

Hardware Platform Selection

Any shift to an electronic advancement exam process would be predicated on the availability of a sufficient number of appropriately located and configured computing devices to
support delivery of the exams. A fundamental issue addressed by the Team was that an
insufficient quantity of suitable computers currently exists in the fleet. Since this meant that
equipment procurement was required, and because there have been advances in technology and
decreases in equipment costs in the year since Baisden and Rymsza (2002) performed their
original Phase I study, taking another look at the suitability of various devices for use in exam
delivery was justified.
At the start of Phase II, the latest models of Personal Digital Assistant (PDA) were
reviewed. A major concern remained that, in delivering multimedia questions, the small screen
size would require the test taker to toggle between two screens or scroll to see the content of each
question. Also, testing to determine maximum useful battery life indicated an operating duration
of just slightly over two hours, insufficient to deliver a three-hour exam. The PDA was
therefore eliminated as a delivery vehicle for the foreseeable future.
Another potential exam delivery platform evaluated in the Phase I effort was a tablet
computer. The recent release of Windows XP Tablet Edition, with its integrated support for
stylus operation and handwriting recognition, raised the possibility of delivering exams in a
manner that closely replicated the “pencil and paper” approach so familiar to Sailors.
Additionally, the extended screen height of tablet computers permitted questions with larger
content (like graphic alternative choices) to be viewed on-screen without requiring the user to
scroll. Evaluation of the Electrovaya Scribbler indicated that it has sufficient battery life to
deliver a three-hour exam with an adequate reserve capacity. However, at almost $2,800 per
unit, this unit was determined to be too costly to acquire in the quantity necessary to support
deployment of an electronic exam system.
While brief consideration was given to the idea of using desktop PCs due to the very
aggressive pricing available (typically half the cost of a similarly equipped notebook computer),
shipping costs and other logistical issues associated with deploying desktop systems and
returning them after the exam cycle left notebooks as the most promising option. Management
determined that, due to budgetary constraints, $1,000 was the target price for the initial Phase II
transition systems. After surveying the offerings of the major computer vendors with whom the
government has purchasing agreements, the Gateway M305E notebook computer was selected.
System testing identified that, when outfitted with a high-capacity (4400 mAh) lithium-ion
battery, the system delivered a 3 hour and 40 minute battery life when running BWS
BatteryMark (Version 1.0) set to simulate usage similar to that expected when delivering an exam. Gateway was able to provide these systems, configured to Phase II specifications, at close
to the target price.

Hardware Configuration Considerations

In evaluating and selecting a suitable hardware platform, decisions had to be made regarding the specific configuration that would best support the delivery and administration of
the examinations. At the same time, the processes developed must place the fewest possible
demands on the exam proctors, as they likely will have little familiarity with the exam software
and may not have extensive computer experience. After careful consideration of a host of
options, the Phase II exam deployment strategy selected provides the exam on separate media, to
be plugged into the exam station at the time of the test. While various digital media were
analyzed, Secure Digital (SD) cards were ultimately selected for their economy, wide availability
and broad industry support. The Gateway M305E has an internal digital media reader accepting
SD cards. Using the SD card addressed three primary processes. First, it provided a way to get
the exam to the Sailor initially, and also established a fallback process that can be used if a Sailor
misses an exam and needs a makeup or alternate test. A second SD card with a new test can be
sent, rather than a new notebook computer. Second, it provided a way to get the exam back from
the Sailor. The concept of having exam proctors download answer files and e-mail them back to
the Center is attractive, but represents a significant change from current practice and presents
special challenges for exams with classified content. When SD cards are used to deliver the
exam, the results can be written back to the card, providing the administering activities with a
removable component that can be express-mailed back to the Navy Advancement Center in a
manner similar to that used for paper answer sheets. Finally, providing the test on SD cards
reduces the classification security storage concerns by allowing the computers to be shipped as
unclassified devices. Only the SD cards have to be stored in a classified safe prior to the day of
the exam. The computer systems themselves would only become classified once the exam was
inserted into the machine at test time, reducing the time the computers were classified devices
from several weeks to a few days. Classified storage issues could possibly be avoided
completely if the administering activities repackaged and returned the computer systems to the
Center immediately upon conclusion of the exam. If that is not possible, one solution is to
remove the computer hard drives and store them separately. This returns the computers to an
unclassified status free from security controls and reduces physical storage requirements.

SOFTWARE ISSUES

In Phase I, Perception for Windows™ by Questionmark™ was selected as the authoring


tool and exam delivery vehicle, “primarily because of its widespread commercial use and its
success in Navy training for end-of-course testing” (Baisden and Rymsza, 2002). As in the case
of the hardware, Phase II began with a review of available assessment software to determine
whether significant advances had occurred since Phase I. Two software packages were
determined to be the leading candidates: OutStart Evolution® (a Learning Content Management
System (LCMS) that has been adopted by the Naval Education and Training Command (NETC)
as the primary content development tool for use with Navy E-Learning) and, once again, Questionmark™ Perception™. While both software packages have extensive capabilities and
features, only Perception™ provides: a question bar indicating which questions have been
answered, a method to flag questions for review, a user prompt if any questions are still
unanswered when the exam is submitted, a way to turn off the display of the final score, and
encryption for delivered questions. Additionally, Perception™ has the capability of importing
XML (eXtensible Markup Language) files conforming to the IMS QTI Specifications (see IMS
Global Learning Consortium, Inc., 2000). At the time of this review there do not appear to be
any benefits to using Evolution® instead of Perception™ for assessments. For these reasons,
Questionmark™ Perception™ was retained as the assessment development and delivery software.
Questionmark™ Perception™ permits questions to be delivered in either a scrolling format
or a question-by-question format. As in Phase I, Phase II adopted the question-by-question
format. However, the question presentation template was modified to display a security banner
at the top of the screen, reduce the height of the button bar at the bottom of the screen, reduce the
font size, and eliminate the timer to permit more content to appear on the screen. This reduces
the need for the user to scroll to see all of the question content.
Subject Matter Experts use an in-house developed and maintained Exam Development
Software (EDS) to generate exam questions and create paper-based advancement exams. To
date, paper-based exams have made only limited use of black-and-white graphics. These
graphics can be associated with the question stem, but not the question alternative choices.
Moving to an electronic exam introduces the capability of including audio, video and interactive
graphics in various parts of exam question content, so EDS has been modified to support these file types.
Taking advantage of the Perception™ feature that permits importing content conforming
to IMS QTI Specifications, EDS now has the capability of generating an XML file that can be
seamlessly imported into Perception™. All required media files are copied from the EDS master
resource folder into an exam-specific folder to expedite packaging for final delivery. This allows
EDS to remain the single source for all question generation and editing. EDS now also produces
a “preview” XML file that uses an XSL (eXtensible Stylesheet Language) file to format the
questions so the SMEs can use Microsoft® Internet Explorer to rapidly review all of the exam
questions in a scrolling format. This reduces the learning curve, cost, and time associated with
individual SMEs using Perception™ for this purpose.
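As a rough illustration of this export step, the sketch below emits a multiple-choice item in the general shape of IMS QTI 1.x XML. It omits required structure such as response processing and metadata, so it is not schema-valid Perception™ input; the item content and identifiers are invented.

import xml.etree.ElementTree as ET

# Illustrative only: a skeletal QTI-style item, not schema-valid output.
def qti_item(ident, stem, choices):
    item = ET.Element("item", ident=ident)
    presentation = ET.SubElement(item, "presentation")
    ET.SubElement(ET.SubElement(presentation, "material"), "mattext").text = stem
    response = ET.SubElement(presentation, "response_lid",
                             ident="RL1", rcardinality="Single")
    render = ET.SubElement(response, "render_choice")
    for letter, text in choices.items():
        label = ET.SubElement(render, "response_label", ident=letter)
        ET.SubElement(ET.SubElement(label, "material"), "mattext").text = text
    return item

root = ET.Element("questestinterop")
root.append(qti_item("ITEM-0001", "Which instrument measures wind speed?",
                     {"A": "Anemometer", "B": "Barometer",
                      "C": "Hygrometer", "D": "Thermometer"}))
print(ET.tostring(root, encoding="unicode"))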
Finally, as part of the advancement exam process, the exam taker is required to provide
additional organizational and personal information. In Phase II, the initial server-based version
used Perception™ to perform this function, similar to what was done in Phase I. However, using
a separate Active Server Page or Visual Basic® program permits more flexible screen formatting,
provides more control over how the information is manipulated, and is less time consuming to
set up and maintain as enterprise deployment is implemented. A separate program will be used
after the exam has been taken to extract the required data from stand-alone or server processes
and format it for analysis and statistical processing.

ASSESSMENT DEVELOPMENT CONSIDERATIONS

Navy Advancement Exams consist of 200 4-alternative (4-alt) multiple-choice questions. Question development is governed by a stringent set of guidelines and the quality of the questions is monitored statistically based on by-item exam results. In selecting the first ratings to
transition to electronic exams, consideration was given to several factors: size of rate/rating,
demographic distribution of exam candidates, classification of exam, appropriateness of
multimedia for use with the rate/rating content, status of existing multimedia materials, number
of exam bank control items, willingness of Subject Matter Expert (SME) exam writer to work on
an electronic exam, marketing impact a successful exam would have, and exam candidates’ anticipated familiarity with computers. The two rates selected, AG3 (Aerographer’s Mate,
Third Class) and STG3 (Sonar Technician - Surface, Third Class), successfully met the desired
criteria. Successfully distributing exams to these two communities will address key issues
common to the broad spectrum of Sailors in various ratings across the Navy. In addition to
occupationally-oriented content, Professional Military Knowledge (PMK) questions common to
all rating exams were developed that used multimedia to better illustrate job performance. For
the initial test development, between 35 and 45 percent of the questions used multimedia.
A major concern is the increased time it takes to create a multimedia-based exam item.
Approximately 75 percent of the needed multimedia did not already exist. Required animations were created
using vector graphics and photos/videos were obtained from photo shoots. For the remainder,
existing media needed to be modified. It took an average of twenty times longer to create a
multimedia-based exam item than it did to create a simple text-based one. Only government-created multimedia resources were used to avoid the possibility of any copyright violations.

CONCLUSION

The implementation of electronic exams is no longer a question of “if,” but “when.” Phase II is off to a good start, but there is still much work to be done. Ongoing technological
advances in hardware and software will need to be monitored so that they can be leveraged to
enhance this project. Required changes in organizational and fleet culture will need additional
consideration and careful implementation. But the solutions adopted to date have been designed
to be scalable with a minimum of alteration. The efforts invested through these initial stages of
Phase II provide numerous lessons learned of benefit to other organizations making a similar
transition.

REFERENCES

Baisden, A., & Rymsza, B. (2002). New directions in navy assessment: Developing a multimedia
prototype. In 44th Annual Conference of the International Military Testing Association
Proceedings (22-24 October 2002, Ottawa, Canada).
http://www.internationalmta.org/2002/02IMTAproceedings.pdf (2003, October 7).
IMS Global Learning Consortium, Inc. (2000). IMS question & test interoperability
specification: A review (IMSWP-1 Version A, 10th October, 2000).
http://www.imsglobal.org/question/whitepaper.pdf (2003, October 23).


Integrated Web Assessment Solutions

David T. Pfenninger, Ph.D., Reid E. Klion, Ph.D., and Marc U. Wenzel, Ph.D.
Pan, Inc.
111 Congressional Blvd., Suite 120
Carmel, IN USA 46032
david@pantesting.com

One of the most exciting and practical new technological developments for psychologists and
other test administration professionals is the incorporation of psychological and behavioral
assessment tools into efficient and secure Internet-based platforms.

Web-based assessment holds the promise of improved performance and efficiencies of scale for
practitioners and researchers alike, and by extension, a potentially superior methodology for their
clients. However, this potential is counterbalanced by special challenges attendant to the web
delivery medium itself, as well as by the dynamics of both a methodology (test administration
and processing) and market (testing, assessment, survey, and collateral measurement domains) in
transition.

This paper will review some of the recent developments and emerging trends and practices in the
world of integrated web assessment to help orient test users and consumers to this new tool, its
promise, and its challenges.

Why Web Assessment?

The rise of web assessment (aka “e-testing”) appears to be inevitable. Indeed, the director of the
Association of Test Publishers stated unequivocally that the “…emphasis on e-testing means that
current test and assessment tools will probably be replaced by electronic versions or by new
tests with exclusive online use….” (Harris, 1999).

Similarly, influential psychological researchers Patrick DeLeon, Leigh Jerome and their
colleagues (2000) have observed that “…behavioral telehealth is emerging as a clinically
important and financially viable option for psychologists…the Internet is particularly well-suited
for self-report questionnaires and surveys.”

Web assessment and testing is a variant of Computer Based Testing (CBT), leveraging Internet
content delivery and data transmission with the goals of creating greater client access and faster
processing of data. The key advantages of CBT are preserved, possibly accentuated, in the e-
testing variant:

1. Psychologists are comfortable using computer-based test administration, scoring and electronic report generation, with 85% having done so (McMinn, Ellens, & Soref, 1999).
2. Computerized administration and scoring of tests have become a generalized practice
(Silzer & Jeanneret, 1998).


3. Computerized testing has been shown to save significant clinician and patient time
(Yokley, Coleman, and Yates, 1990).
4. Researchers have found in controlled studies that hand scoring of personality tests by trained scorers results in 53% of profiles containing errors, with 19% containing errors significant enough to affect a diagnosis (Allard, Butler, Faust, & Shea, 1995). Similar scoring inaccuracies with
scanner technology are currently plaguing the standardized educational testing market via
tort suits filed by plaintiffs alleging harm from such errors.
5. Psychometric research of computerized tests yields conclusive support for test
characteristics of stability and validity (Alexander and Davidoff, 1990).
6. Computerized testing provides a highly standardized and uniform process not influenced
by examiner characteristics (Lewis, Pelosi, et al., 1988).
7. Computerized testing formats are judged acceptable by patients, rated by patients as easy
to use, and apparently are preferred to conventional paper-pencil and interactive testing in
perceived comfort (Campbell, Rohlman, et al., 1999; Navaline, Snider, et al., 1994;
Pisoneault, 1996; Telwall, 2000).
8. Test-takers tend to divulge more information to a computer test module than to human
examiners (Hart & Goldstein, 1985; Malcom, Sturgis, et al., 1989).

Test Quality and Method Variance Issues

Web assessment and e-testing sites and programs vary widely in terms of the information they
provide to help assess the quality of the available instruments. Many sites claim to have
“reliable and valid” tests; as is the usual practice, practitioners should verify these claims with
data. Some sites offer online manuals, psychometric white papers, and other information
allowing the same evaluations psychologists have done with paper-and-pencil assessments.

Web assessment per se does not appear to have significant impact upon validity or reliability
considerations for most kinds of testing, although commensurability and method variance studies
are in their infancy for more complex stimulus presentations. The following is a brief review of
extant sources of commensurability or method variance study involving computer or web
assessment.

The Mead & Drasgow (1993) meta-analysis of cognitive ability test equivalency studies found
average cross-mode correlations of .97 for power tests and .72 for speeded tests, a remarkable
level of method equivalence that sets a formidable bar to those who would challenge equivalence
of methods.

There are more recent examples relative to the form of CBT referred to here as “e-testing” or
“web-based”, “internet-based” or “online” testing or assessment. For example, the Canadian
Public Service Commission Personnel Psychology Centre (Chiocchio, et.al, 2003) has executed
method variance studies comparing paper/pencil vs. online versions of several timed cognitive
skills tasks, including performance measures of reading ability, numerical ability, and visual
pattern analysis ability. They found virtually no method variance and concluded that the forms
were functionally equivalent and require no correction factor.


Weiss (2001) reviewed literature and presented design considerations and outcome summaries
for several in-house method variance studies, and concluded that method variance effects even
for perceptual-response based tests tend to be trivial in most cases. His group’s own studies with
the Beck Scales (paper/pencil vs. online) yielded equivalence with no correction factor,
suggesting “there is ample evidence that computer administration of clinical personality tests,
such as the Beck Depression Inventory, are comparable with paper administration when certain
essential design elements are carefully considered.”

Other researchers have found similar results. Biggerstaff, Blower, and Portman (1996) found
equivalence of paper-pencil and computerized versions of the Aviation Selection Test Battery
without the need for score transformations, while Neuman & Baydoun (1998), Clark (2000), and
Donovan et al. (2000) all conducted comparisons of paper/pencil vs. CBT for personality
questionnaires, finding strong equivalency.

Potosky and Bobko (1997) provided support for equivalence of non-cognitive computerized
versions of paper/pencil-normed tests in a personnel selection program. Smith & Leigh (1997)
found online and paper versions of a clinical questionnaire to be comparable. Coffee and
colleagues (2000) reported on web-base civil service testing with equivalent forms readily
established.

Burnkrant & Taylor (2001) presented results of these three methods variance studies and suggest,
in line with previous work...that “data collected over the internet and in a traditional paper-and-
pencil setting are largely equivalent" (p.5). Weiner & Gibson (2002) presented several studies on
the PSI Employee Aptitude Survey cognitive ability tests, ultimately concluding “web-and paper-
based battery scores were found to be highly equivalent.”

Thus, a growing corpus of available research finds paper-pencil, computer-based, and online versions of various cognitive ability tests (both speeded and power), personality tests, and rating scales to be de facto equivalent.

If a method difference is established, one should identify and enter a correction factor into the scoring algorithm for the web version, and slowly retire the paper version, if the online test is to use the same norms that were the basis of the paper/pencil test. Of course, many tests are now in development without regard for paper/pencil administration at all, based upon computerized-web administration from the outset.
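The paper does not prescribe how such a correction factor would be computed or applied; one common possibility is a linear (mean/standard deviation) equating of web scores onto the paper/pencil scale, sketched below with made-up numbers.

# Hypothetical sketch: linear equating of web scores onto the paper scale.
def linear_equate(web_score, web_mean, web_sd, paper_mean, paper_sd):
    z = (web_score - web_mean) / web_sd    # standardize on the web-form scale
    return paper_mean + z * paper_sd       # re-express on the paper-form scale

# Example: suppose an equating study found the web form ran slightly easier.
adjusted = linear_equate(web_score=42, web_mean=40.0, web_sd=8.0,
                         paper_mean=38.5, paper_sd=8.5)
print(round(adjusted, 1))  # 40.6, reported against the paper/pencil norms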

Based upon the authors’ experiences in developing over 400 online testing products for more
than 40 commercial test publishers and a dozen major proprietary corporate or government
testing programs, the following tentative conclusions may be offered:

1. The consensus in the commercial test and measurement industry is to assume equivalence
of forms as a general baseline heuristic. This seems to be accepted by the firms as one of
the clearest results in the aggregated history of scientific method variance study, a result
that extends across test content (clinical, I-O) domains.
2. Publishers and users by and large consider CBT and internet or e-testing equivalent.


3. Most publishers have chosen not to execute method variance studies for their online tests,
especially from CBT-to-e-testing conversions.
4. For those publishers executing method variance reviews of online products, the findings
have been fairly uniformly one of demonstrable equivalence, even across power and
cognitive tests.
5. Some publishers are presenting a caveat emptor notification to the test administrator prior
to purchase if method variance studies have not been completed on a product formerly
available only in paper/pencil.

Factors Driving Web Assessment

These psychometric and ergonomic considerations aside, the main reason for the advent of web
assessment or e-testing is that test publishers and users perceive cost savings and revenue
enhancement potential with the new format. The new e-testing product offerings are beginning to
have an impact on the test industry similar to what has occurred in recent years in online retailing
within the book-selling industry, delivery of e-learning content in the training industry, and e-
cruiting in the recruitment industry.

Web assessment allows for continued test and item refinement by allowing for aggregation of
anonymous raw data that in the past had eluded the test developer, which should make the
development or refinement of test products much faster and less labor-intensive. Further, with
improved document control technologies, publishers can increasingly protect their test items against unlawful copying and copyright violations.

Because e-testing has the potential to attenuate many of the costs traditionally associated with
paper-pencil and desktop-diskette modes of testing, publishers may enjoy better margins without
appreciably raising prices. Some cost-comparison data has demonstrated substantial cost
reductions (Chiocchio et al., 2003) and no doubt this area will receive increased scrutiny in
coming years. The now well-established capabilities of instant ordering access, enhanced remote
data harvesting capability, bundled products (i.e., no separate purchases of questionnaires,
answer sheets, scoring templates, shipping, etc), reduced or eliminated scoring and clerical
requirements, and immediately produced results are often posited as valuable improvements for
testing consumers. Thus, web assessment appears to have the promise of an enhanced delivery
solution of value to publishers, test professionals, and ultimately, the test-taking client.

That the test-taking subject should derive benefit from the online paradigm is evidenced by the
rationale of the Canadian Public Service Commission in developing their online pre-employment
testing programs: namely, that the citizenry of Canada would enjoy enhanced access to
employment opportunities by making pre-screening more widely accessible via the web delivery model (Chiocchio et al., 2003). Readers are also encouraged to refer to the proposed Association
of Test Publishers guidelines for web-based e-testing (Harris, 2000) for more information on
evolving standards.


Integrated Web Assessment Solutions Model

The “web services” variant of the “business process management” model underlies most web
testing platforms. It is essentially the accessing of web-based software on a “plug-in/plug-out”
basis in lieu of buying and installing PC-based programs. The Internet becomes the software
dispensary. Because the application is fully web-based, the system is constantly upgraded and
improved with no need to buy or install new versions of software. Moreover, true “process
integration” becomes feasible due to the specific features of the Internet in terms of efficient data
linkage and delivery.

The concepts of business process management and integration have attracted considerable
attention for the past several years. The intent of such models is to create a fully integrated
system from the previously disparate aspects of a business process. The goal is to share
information across all aspects of the process as well to monitor and manage the entire system
efficiently. An example of such a system would be a consumer’s purchase at a checkout counter
triggering a transfer of information to the store’s local inventory control system (so the manager
knows when an item needs to be re-stocked), to the corporate office (so that the marketing
department receives feedback on store performance), and to the supplier of the purchased
product (so that a new shipment can be directed to the store when it is next needed).

Any standard web browser provides access to the integrated web assessment for both
administrator and client on a 24/7 basis. Hence, clients no longer have to bother purchasing
diskettes for every computer they wish to use for testing, suffer through painful installations and
complex manuals, or incur information technology resource drains. The web services model
preserves the best of CBT while allowing the practitioner to log into an online assessment office
virtually anywhere in the world, on any web-connected machine (and the same flexibility is
extended to the client).

Further, aggregate data is easily made available in standard database formats that can readily interface with other software applications. For example, XML and XSL data transfers serve as bridges for communication between often distributed or disparate 3rd party applications. The formerly isolated applications become “integrated” into a coherent system. The client gains new functional value by streamlining the exchange and flow of relevant client data (including test data), and this data in fact becomes more “dynamic” by virtue of its ability to trigger other processes or decision points within the system (Klion et al., 2003).
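As a minimal sketch of this XML “bridge” idea, the snippet below packages a completed assessment as XML for hand-off to a downstream application. The element and field names are hypothetical, not a published schema.

import xml.etree.ElementTree as ET

# Hypothetical schema: package one candidate's results for a 3rd-party system.
def package_result(candidate_id, instrument, scores):
    root = ET.Element("assessmentResult",
                      candidate=candidate_id, instrument=instrument)
    for scale, value in scores.items():
        ET.SubElement(root, "score", scale=scale).text = str(value)
    return ET.tostring(root, encoding="unicode")

payload = package_result("C-1024", "cognitive-battery",
                         {"math": 23, "english": 21})
print(payload)  # one-line XML, ready to transmit or transform via XSL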

Security

Compared to the problems of keeping paper and pencil test reports secure, advanced e-testing
systems can be far more secure than traditional paper and pencil testing (Bartram, 2000). When
reviewing e-testing sites, consumers should take the time review the security and privacy
statements and endorsements (such as Verisign™). For the highest privacy standards, sites should
only use temporary cookies to enable page turning and other basic web navigation functions, but should not house permanent cookies (cookies are small identifier files left on the user’s machine so that the machine may be “identified” when it returns to the web site).

Access to secure, restricted instruments should be granted in the same manner traditional paper-and-pencil publishers determine access: by verifying professional status and/or
educational experiences. Interactions with the system should be encrypted via current Secure
Sockets Layer (SSL) technology, and security statements on the site should indicate how patient
or client data is stored and transmitted.

Sites vary widely in report handling. Some routinely e-mail reports to the psychologist. This
method is subject to hacking and interception and is not recommended. The best method is
secure, SSL delivery to an ID and password-controlled on-line testing “office.” Some systems
can “lock down” the testing machine to prevent the use of other programs, and still other
systems are capable of delivering online tests to identified computers (using IP address or other
identifier). These methodologies offer possibly the most secure format for test data transfer yet
available. Reports in easily editable formats (e.g., MSWord™ or HTML) should be treated with
caution, as results are more easily altered than, for example, delivery of a closed PDF-type file.

Well-designed and maintained e-testing systems appear to offer advanced security, a point
clearly made by researchers:

“A well-designed Internet system can be far more secure than many local computer
network systems or intranets, and certainly more secure than paper and filing
cabinets….In most areas, Internet-based assessment provides the potential for higher
levels of control and security…Test scores are also potentially far more secure.”
(Bartram, 2000, p. 269).

Users of web assessment should expect state-of-the-art concern and methods for protecting the
integrity of tests consistent with guidelines of the American Psychological Association (1999a;
1999b). In fact, the bundled nature of many online testing formats may provide enhanced
protections of test information from abuses related to HIPAA or other nascent disclosure
mandates.

Future trends and developments

Web assessment is already a significant modality for tests and measures in the business and
government testing arena and is gaining momentum in educational and certification testing
markets as well, and we can anticipate that e-testing will be the norm in a very short time,
probably within three to five years (Pfenninger, 2000; 2002). The staggering increase in
classroom connectivity (cf., Fantasma Networks, 2001), reduction in the “digital divide” (Sailer,
2001), and the previously described economic “return-on-investment” factors all but guarantee
this outcome.

The next trends in web assessment and e-testing will involve globalization, as the medium naturally lends itself to international distribution and access. Local norming and translation issues will be increasingly important and controversial as web assessment now easily traverses national boundaries (Klion et al., 2002).

Delivery of media-rich assessments is an emerging mega-trend (Jones & Higgins, 2001) awaiting
fuller implementation of broadband delivery capability. High quality audio and video delivery
will be able to present complex assessment stimuli as “virtual” realities, allowing for an
advanced form of simulation-based evaluation.

Meanwhile, the advent of reliable, low-cost Internet devices is upon us. Many hospital and health
care systems are already using hand-held thin clients as the basic data point-of-entry for
clinicians. Similar devices are being used in corporate surveys, and it is probably inevitable that
most large-scale certification and standardized educational testing will eventually utilize thin
client Internet devices as the data harvesting hardware.

Case example

The U.S. Transportation Security Administration utilizes a very rigorous skills-based standards
framework for job role definition, identification of skills and competencies, and the alignment of
selection and assessment strategies within this model (Figure 1, from Kolmstetter, 2003). The
project is among the largest web assessment-based testing programs ever executed and involves
close coordination with TSA and a host of other vendors in the hiring and human resource
management of a 75,000-employee federal agency.

Figure 1. TSA Skill Standards – Integrated Web Assessment Model. [Diagram not reproduced; it places the skill standards at the center of a cycle linking advertising/recruiting and applicant information, qualifications screening, competency assessment, medical examination, background/security check, hiring, training, certification and annual proficiency review, and performance appraisal and management.]


Figure 2. Five process model per integrated panel.

Panel 1. Function: Biodata Form
Interface: Online biodata form completed at applicant’s home
Data Input: Provided by applicant
Decision Analysis: If Index > 70, qualify
Communication: 1) Qualifiers: send link to Online Proctored Testing Scheduler; 2) Non-qualifiers: send e-mail regrets

Panel 2. Function: Online Proctored Testing Scheduler
Interface: Online scheduling system for Proctored Test Center
Data Input: Provided by applicant
Decision Analysis: None needed
Communication: 1) Send e-mail to applicant confirming appointment; 2) Send data to XYZ; 3) Send data to MNO

Panel 3. Function: Check-in at test center
Interface: Online check-in form
Data Input: Provided by proctor based upon applicant documentation
Decision Analysis: Meet documentation criteria?
Communication: 1) Applicant is seated to begin testing; 2) Non-qualifiers: invited to return tomorrow with correct paperwork

Panel 4. Function: Assessment Battery
Interface: Online test
Data Input: Provided by applicant
Decision Analysis: Qualified if Math > 19 and English > 20
Communication: 1) Qualifiers: send link to Physical Exam Scheduler; 2) Non-qualifiers: send e-mail regrets

The TSA demand articulated in its contract for “integrated services” highlights the new XML-driven “web services” models and reflects the state of the art in technology-based global assessment, with systemic, data flow, and content integration capabilities.

A solution was needed to manage all aspects of computerized pre-employment test delivery,
including design and development of Java testing applets, proctored testing centers in over 450
sites across the United States, provision of remote testing facilities in overseas territories, and
maintenance of a complex stream of transactions and XML data exchanges among multiple
vendors.

First, an integrated process was developed in which data collected at one phase were used to
drive subsequent aspects of the process. For example, results of the test battery were used to
generate a summary report and individualized interview questions for use in the employment
interview. Second, it enabled the use of automated communications as well as a variety of web-based tools to manage the entire process.

Across this multi-step/multi-vendor vetting process, there are five functions that occur regularly at each phase (“panel,” “module”): Function; Interface; Data Input; Decision Analysis; Communication. See Figure 2 for a redacted extract illustrating the five processes for some
typical sequential panels in the integrated web solution for TSA.
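The sketch below models this recurring five-function structure in code, using the published decision criteria from Figure 2 as stand-ins. It is a schematic reading of the figure, not TSA’s actual implementation, and the applicant data are invented.

from dataclasses import dataclass
from typing import Callable, Dict

# Schematic model of one panel in the multi-hurdle process (per Figure 2).
@dataclass
class Panel:
    function: str
    interface: str
    data_input: str
    decision: Callable[[dict], bool]   # the panel's Decision Analysis rule
    communication: Dict[bool, str]     # next step keyed by rule outcome

biodata = Panel(
    function="Biodata Form",
    interface="Online biodata form completed at applicant's home",
    data_input="Provided by applicant",
    decision=lambda d: d["index"] > 70,           # "If Index > 70, qualify"
    communication={True: "Send link to Online Proctored Testing Scheduler",
                   False: "Send e-mail regrets"})

battery = Panel(
    function="Assessment Battery",
    interface="Online test",
    data_input="Provided by applicant",
    decision=lambda d: d["math"] > 19 and d["english"] > 20,
    communication={True: "Send link to Physical Exam Scheduler",
                   False: "Send e-mail regrets"})

applicant = {"index": 84, "math": 23, "english": 21}   # invented example data
for panel in (biodata, battery):
    print(panel.function, "->", panel.communication[panel.decision(applicant)])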

The integrated solution is comprised of the following features:

1. Integration of multi-partner assessment and related HR content
2. Multi-hurdle select-out model adopted due to stringent hiring eligibility criteria
3. Content “travels” with the electronic candidate record through the sequence
4. Aggregate content data posted to TSA for analysis
5. Seamless integration for total candidate experience
6. Import of 3rd party assessment data and exports to required databases and applications
7. Dynamic status moves triggered by data decision rules

TSA uses a display console for data management and real-time review of candidates passing
through the various steps in this integrated selection and assessment solution. See Figure 3 for an
example of a typical display:

Figure 3. Data management display tracking candidate assessment information. [Screenshot not reproduced.]


Results

As of October 2003, the TSA has deployed the integrated web solution with few difficulties,
with over 30,000 test batteries completed and over 7,000 candidates completing the second-phase assessment center procedures.

Summary

Business testing for selection and development is no longer done in isolation from other practices
and applications, and opportunities in corporate testing are increasingly tied to the ability to
integrate seamlessly with other systems.

The “best-of-breed” web services integration approach renders a solution characterized by “emergent properties”; that is, the whole of the solution is greater than the sum of its parts. As such it reflects a real innovation that, while portable in core respects, is also highly sensitive to the client’s needs for customization. Such emergent properties avoid the generic drift of the monolithic enterprise human capital processing installations, instead offering plug-in capability, focused support, and design collaboration with (as opposed to imposition of design upon) the client.

Testing providers who adapt their offerings to take advantage of web services integrations will
find a receptive audience among I-O psychologists and other responsible users of testing and
assessments.

References

Alexander, J. & Davidoff, D. (1990). Psychological testing, computers, and aging. International
Journal of Technology and Aging, 3, 47-56.

Allard, Butler, Faust, & Shea (1995). Errors in hand scoring objective personality tests: The case
of the Personality Diagnostic Questionnaire-Revised (PDQ-R). Professional Psychology:
Research and Practice, 26, 304-308.

American Psychological Association (1999). Test security: Protecting the integrity of tests.
American Psychologist, 54(12), 1078.

Bartram, D. (2000). Internet recruitment and selection: Kissing frogs to find princes.
International Journal of Selection and Assessment, 8 (4), December, 261-274.

Biggerstaff, S., Blower, D., and Portman, L. (1996). Equivalence of the computer-based aviation
selection test battery (ASTB). International Military Testing Association 38th Annual
Conference, 1996. Available at http://www.ijoa.org/imta96/paper47.html.


Burnkrant, S., & Taylor, C. (April, 2001). Equivalence of Traditional and Internet-Based Data
Collection: Three multigroup analyses. Paper presented at 16th Annual Conference of the Society
for Industrial and Organizational Psychology, San Diego, CA.

Campbell, K., Rohlman, D., Anger, W., Kovera, C., Davis, K., & Grossmann, S. (1999). Test-retest reliability of psychological and neurobehavioral tests self-administered by computer. Assessment, 6(1), 21-32.

Carr, A., Ancill, R., Ghosh, A. & Margo, A. (1981). Direct assessment of depression by
microcomputer. A feasibility study. Acta Psychiatrica Scandinavica, 64, 415-422.

Chiocchio, F., Degagnes, P., Kruidenier, B., Thibault, D., & Lalonde, S. (2003). Online tests: Phase II implementation. Report to stakeholders. Public Service Commission of Canada, Personnel Psychology Centre. http://www.pscfp.gc.ca/ppc/online_testing_pg02K_e.htm.

Clark, D. (2000). Evaluation of a networked self-testing program. Psychological Reports, 86, 127-128.

Donovan, M., Drasgow, F., & Probst, T. (2000). Does computerizing paper-and-pencil job
attitude scales make a difference? New IRT analyses offer insight. Journal of Applied
Psychology, 85, 305-313.

Duffy, J., & Waterton J. (1984). Under reporting of alcohol consumption in sample surveys: The
effect of computer interviewing in fieldwork. British Journal of Addiction, 79, 303-308.

Fantasma Networks (2001). Network classrooms of the future: An economic perspective. Retrieved from http://www.fantasma.net, March.

Harris, W.G. (1999). Following the Money. White Paper: Association of Test Publishers,
Washington, DC. http://www.testpublishers.org.

Harris, W.G. (2000). Best practices in testing technology: Proposed computer-based testing
guidelines. Journal of e-Commerce and Psychology, 1(2), 23-35.

Hart, R., & Goldstein, M. (1985). Computer assisted psychological assessment. Computers in
Human Services, 1, 69-75.

Jerome, L., DeLeon, P., James, L., Folen, R., Earles, J., & Gedney, J. (2000). The coming of age
of telecommunications in psychological research and practice. American Psychologist, 55(4),
407-421.

Jones, J., & Higgins, K. (2001). Megatrends in personnel testing: A practitioner’s perspective.
Journal of Association of Test Publishers, January 2001. http://www.testpublishers.org.


Klion, R., Pfenninger, D., Chiocchio, F. & Callender, J. (2003). Cross-Cultural Test Design for
Global Selection Programs. Presentation to Association of Test Publishers annual conference,
Orlando, FL.

Kolmstetter, E. (2003). The TSA story: Screener selection. Presented as part of Pfenninger, D.,
Kolmstetter, E., Davis, B., & Jung, P., Automating the Testing and Assessment Process.
Presentation, International Conference on Assessment Center Methods, Atlanta, GA.

Lewis, G., Pelosi, A., Glover, E., Wilkinson, G., Stansfeld, S., Williams, P., & Shepherd, M.
(1988). The development of a computerized assessment for minor psychiatric disorder.
Psychological Medicine, 18, 737-743.

Malcom, R., Sturgis, E., Anton, R., & Williams, L. (1989). Computer-assisted diagnosis of
alcoholism. Computers in Human Services, 5, 163-170.

Mead, A.D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive
ability tests: A meta-analysis. Psychological Bulletin, 114, 449-458.

McMinn, M., Ellens, B., & Soref, E. (1999). Ethical perspectives and practice behaviors involving computer-based test interpretation. Assessment, 6(1), 74.

Navaline, H., Snider, E., Petro, C., Tobin, D., Metzger, D., Alterman, A., & Woody, G. (1994).
Preparations for AIDS vaccine trials. An automated version of the Risk Assessment Battery:
Enhancing the assessment of risk behaviors. AIDS Research & Human Retroviruses, 10, Suppl.
2:S281-282.

Neuman, G., & Baydoun, R. (1998). Computerization of paper-and-pencil tests: When are they
equivalent? Applied Psychological Measurement, 22, 71-83.

Potosky, D., & Bobko, P. (1997). Computer versus paper-and-pencil administration mode and
response distortion in noncognitive selection tasks. Journal of Applied Psychology, 82, 293-299.

Pfenninger, D. (2000). e-testing: A new methodology for professional assessment. Paper presented at the American Psychological Association annual conference, August.

Pfenninger, D. (2002). Remote clinical assessment: An introduction. Paper presented at Association of Test Publishers annual conference, February.

Sailer, S. (2001) Analysis: The web’s true digital divide. United Press International. Retrieved
from http://www.vny.com/cf/news/upidetail.cfm?QID=203267.

Silzer, R., & Jeanneret, R. (1998). Anticipating the future: Assessment strategies for tomorrow.
In. R. Jeanneret & R. Silzer (Eds.) Individual psychological assessment: Predicting behaviors in
organizational settings (pp. 445-477). San Francisco: Jossey-Bass.


Smith, W. A. & Leigh, B. (1997). Virtual subjects: Using the internet as an alternative source of
subjects and research environment. Behavior Research Methods, Instruments, and Computers,
29, 496-505.

Weiner, J. & Gibson, W. (2002). Transition to technology: Design and Application Issues with
Employment Tests. Paper presented at Association of Test Publishers annual conference,
February.

Weiss, L., & Trent, J. (2002). Equivalency analysis of paper-and-pencil and web-based assessments. Association of Test Publishers annual conference, February.

Yokley, J., Coleman, D., & Yates, B. (1990). Cost effectiveness of three child mental health
assessment methods: Computer-assisted assessment is effective and inexpensive. Journal of
Mental Health Administration, 17, 99-107.


STREAMLINING OF THE NAVY ENLISTED ADVANCEMENT NOTIFICATION SYSTEM
LCDR Tony Oropeza
Navy Advancement Planner OPNAV N132C4
Jim Hawthorne
PNCS Jim Seilhymer
Darlene Barrow
Joanne Balog
Navy Enlisted Advancement System, Pensacola, FL, USA

In February 2002, the major stakeholders of the Navy Enlisted Advancement System
(NEAS) convened with the goal of streamlining the Navy Enlisted Advancement Notification
(NEAN) process. Numerous ideas were shared and discussed among the stakeholders. In turn,
initiatives were delegated for action to various stakeholders. Many of the initiatives assigned to
the Naval Education and Training Professional Development and Technology Center
(NETPDTC) will be presented. These initiatives are:

• Auto-Ordering Exams and Bar-Coded Answer Sheets
• Rapid Advancement Planning Estimation Algorithm
• Accelerated NEAS Processing
• Internet Posting of Examination Results

Through the implementation of these and other initiatives, a significant reduction in the length of the notification process has been accomplished.

Prior to streamlining the notification process, the time period from exam day to publication
was 11 to 13 weeks. The target time period after streamlining was reduced to 5 weeks. Initially,
several constraints were implemented so there would be no negative impact on the Sailors, no
sacrifice to the quality of the exam, and no increase in the workload of the Fleet. The outcome
would result in improvements in quality of life, learning, retention, and a reduction in the
workload of the Fleet. Additional conferences were held in June 2002 and January 2003 for the
NEAS stakeholders to meet and discuss their progress and to further improve the plan of action.

Auto-Ordering Exams and Bar-Coded Answer Sheets

At the NEAN conference in February 2002, it was decided commands should have the ability
to order exams for candidates via the web, based on time in rate (TIR) eligibility according to the
Enlisted Master File. In turn, exams and bar-coded answer sheets would be printed and mailed
to commands (testing sites), determined by the TIR eligibility lists.
Providing a website where commands can review, change, delete, or add candidates has
several advantages. There has been a reduction in workload on testing officers. Testing officers

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
432

can spend less time figuring out how many exams to order and when to order. In the past, some
testing officers would either over-order, order too close to the deadline, or forget to order. Over-
ordering results in a waste of resources. Ordering too close to the deadline or forgetting to order
results in requiring mailing of substitute exams. Therefore, another advantage due to the latter
cases is reduction of substitute exams. Also, by producing a TIR eligibility list, commands will
be sent exams and answers sheets even if the list has not been reviewed. In turn, exams and
answers sheets will arrive promptly at the testing sites. Figure 1 is a snapshot of a TIR eligibility
list.

Figure 1. Time in rate eligibility list. [Screenshot not reproduced.]

Bar-coding answer sheets with pertinent information means candidates are required to “bubble” in only minimal entries, which also reduces discrepancies. If any of the bar-coded information is incorrect, the candidate can “bubble” in the particular information, which will override the bar code.
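A minimal sketch of that override rule follows, with invented field names: bubbled entries, when present, replace the pre-printed bar-code values.

# Hypothetical field names; bubbled entries take precedence over the bar code.
def merge_answer_sheet(barcode, bubbled):
    merged = dict(barcode)
    merged.update({field: value for field, value in bubbled.items() if value})
    return merged

barcode = {"ssn": "123-45-6789", "rate": "STG3", "uic": "12345"}
bubbled = {"rate": "AG3"}  # candidate corrects an incorrect pre-printed rate
print(merge_answer_sheet(barcode, bubbled))
# {'ssn': '123-45-6789', 'rate': 'AG3', 'uic': '12345'}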

Rapid Advancement Planning (ADPLAN) Estimation Algorithm

The ADPLAN estimation algorithm was developed to provide the manning planners a basic
idea of how many candidates within each exam rate would pass upcoming exams. The algorithm
was beta-tested for the March 2002 exam. The algorithm was of assistance to the manning
planners in projecting exam passers and advancement counts. The algorithm was implemented
for the September 2002 exam and was helpful in trimming two weeks off the advancement exam
process. It has been determined the ADPLAN algorithm will continue to be a part of the
advancement exam process.
The ADPLAN algorithm begins with projecting the number of all exam passers. It continues
by determining the number of passers by duty group: Training and Administration of Reserves (TARs), Active Duty (USN/R), and Canvasser Recruiters (CVRs). These numbers are totaled and compared with the number found for total exam passers. The number of passers by
paygrade also is determined. The total number of passers by paygrade is compared to the total
passers by duty group and the total exam passers. The same procedure is performed for
projecting exam takers as well. Exam-taker projections are used by NETPDTC for scheduling
tasks.
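The algorithm itself is not published in this paper; the sketch below only illustrates the reconciliation idea described above, projecting passers by duty group from assumed historical pass rates and cross-checking the breakdown against an overall projection. All counts and rates are invented.

# Invented counts and pass rates, for illustration only.
def project(counts, pass_rates):
    return {group: round(counts[group] * pass_rates[group]) for group in counts}

candidates = {"USN/R": 90_000, "TAR": 6_000, "CVR": 1_500}
pass_rates = {"USN/R": 0.42, "TAR": 0.40, "CVR": 0.45}

by_group = project(candidates, pass_rates)        # projected passers by duty group
overall = round(sum(candidates.values()) * 0.42)  # overall projection

# Reconciliation check: the duty-group breakdown should total near the
# overall projection; a large gap flags a problem in the inputs.
gap = sum(by_group.values()) - overall
print(by_group, "reconciliation gap:", gap)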
An increase in exam takers was expected from 2002 to 2003. The number of projected takers
was 110,193. The actual number of exam takers was 112,056, which was a 10% increase from
the September 2002 advancement exam. In analyzing where the increase of exam takers actually
occurred, it was determined the majority of the increase came from paygrade E4. There were
1,640 more E4 exam takers than expected. Refer to Figure 2.

Figure 2. Actual versus projected exam takers by paygrade for March 2003. [Chart not reproduced; recoverable values: E5 projected 52,151 vs. actual 52,067; E6 projected 30,763 vs. actual 30,713; E4 projected 27,656 vs. actual 29,276. Y-axis: candidates in thousands; X-axis: March exam cycles, 1993-2003.]

During the algorithm development it was found that starting in 2000 there was a significant
difference between the number of candidates taking the March exams and the September exams.
Due to this phenomenon, it was determined the algorithm would have to account for this
disparity. Past history for March exams would be used to predict the number of candidates
taking the upcoming March exam, and the same course of action would be used in predicting the
number of candidates taking September exams. This phenomenon is showing signs of becoming
less of a concern. Figure 3 shows the number of candidates taking March exams is once again
coming closer to the number of candidates taking September exams.


Figure 3. Total exam takers in March and September cycles over the past 10 years. [Chart not reproduced; it plots candidates in thousands (roughly 80 to 150) for each cycle from September 1993 through March 2003, with separate lines for March and September cycles.]

As with any forecasting model, predictions will improve as the model is refined. Additional
resources have been found that will assist in determining increases and decreases in manning
numbers.

Accelerated NEAS Processing

As stated previously, the main goal of the NEAN conferences was to streamline and
accelerate the NEAS processing. Another decision was made to validate exam answer keys prior
to exam day rather than on exam day. Since time is a factor, any problem that can be resolved
without slowing down the advancement process is an advantage. Another important decision
was made to mandate mailing of examination answer sheets in the most expedient traceable
manner available, within 24 hours of administering advancement exams. The return of answer
sheets is the main factor in slowing down publication of advancement results.
Several internal processes at NETPDTC were streamlined by rewriting many of the
processing programs. Prior to NEAN, all internal processing was done by administration cycle.
Now all internal processing up to advancement planning is done by paygrade. By doing this,
manning planners can have an early look at paygrade advancement planning.
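To illustrate the paygrade-based processing just described, here is a minimal Python sketch; the record layout and function name are assumptions for illustration, not the NEAS implementation.

    from collections import defaultdict

    def batch_by_paygrade(records):
        """Group answer-sheet records by paygrade so each paygrade can move
        through internal processing without waiting for the full cycle."""
        batches = defaultdict(list)
        for record in records:
            batches[record["paygrade"]].append(record)
        return batches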
Another process that has been refined is quota entry. To expedite advancement planning
quota entry, a web quota site was developed for the manning planners. The data at this site is
retrieved at NETPDTC and placed in the NEAS.
Prior to the NEAN initiative, exam cycles took from 11 to 13 weeks to process. A great deal
of the processing time was based on the receipt of answer sheets. Different stages of the
advancement process depend on a certain percentage of answer sheets being returned. The goal
for advancement planning is 90%. Additional factors can slow down the advancement process.
In September 2000, a necessary program change slowed the process; in September 2001, it
was the 9/11 tragedy; and in March 2002, it was the invasion of Afghanistan.
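The 90% answer-sheet goal amounts to a simple gating rule. A minimal sketch, with hypothetical names:

    def ready_for_planning(sheets_received, sheets_expected, goal=0.90):
        """Advancement planning proceeds once the answer-sheet return
        rate reaches the stated goal of 90%."""
        return sheets_received / sheets_expected >= goal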


March 2002 marked the implementation of the NEAN initiatives. The NETPDTC processing
was reduced; however, other process changes had not been implemented. By September 2002,
many of the process changes had been implemented and the process time was greatly reduced.
By March 2003, the NEAN initiatives were complete and processing time was reduced, but
administration and mailing were affected by the onset of Operation Iraqi Freedom. See figure 4.

[Figure 4 charts working days from exam administration to publication of results, broken out by quota, answer-sheet, and NETPDTC processing stages, for cycles from March 2000 through March 2003 (September 2002 and March 2003 shown separately for E-4, E-5, and E-6). Annotations mark the pre-NEAN baseline of 11 to 13 weeks and subsequent reductions to roughly 9 and then 7 weeks. Cycle notes: September 2001 and prior were pre-NEAN cycles (the September 2000 standard-score program change delayed publication; the 9/11 tragedy delayed the September 2001 E-4/5 administration until early October, and the anthrax scare affected some mailing). March 2002 was the first cycle under the initial NEAN initiatives (raw-score processing and exam administration were affected by the Afghanistan invasion). September 2002 continued the NEAN initiatives (first cycle with separate E-4/5/6 paygrade processing from administration to raw score). March 2003 completed the NEAN initiatives (improvements continue; first cycle with separate E-4/5/6 paygrade processing from administration to advancement planning; raw-score processing and E-4 exam administration were affected by Operation Iraqi Freedom).]

Figure 4. NEAS process from exam administration to publication.

Currently, the obstacles to consistency in the accelerated processing are: no single source of
Navy overnight mail, command compliance with new requirements, acts of nature, terrorism,
wartime conflicts, and manning/budget constraints.

Internet Posting of Advancement Examination Results

Another very important accomplishment by NETPDTC is the web posting of advancement
status and statistics. Under the former method of advancement publication, results were typically
mailed in paper form from NETPDTC to all Navy commands. This paper mailing of
examination results could take anywhere from 1 to 4 weeks to reach a command, sometimes
longer if the commands were deployed. Consequently, Sailors may have had to wait as long as
17 weeks after testing to see how they performed. Part of the NEAN initiative was to find a way
to provide the fastest possible examination and advancement feedback on a central website by
utilizing existing Internet technology. This effort consisted of three basic goals. The first goal
was to post advancement examination profile sheets on a web page accessible by every active
and reserve Sailor, at home, at work, ashore, or afloat. The profile sheet provides the exam
takers with a summary of their performance in relation to their peers. The second goal was to
provide commanding officers with necessary administrative reports showing how Sailors under
their charge performed, who was selected for advancement, and who was not. The third goal
was to provide commanding officers with statistical reports showing how Sailors under their
charge performed compared to all commands in the Navy.
The tasking for these goals consisted of a partnership between Navy military and civilian
personnel, which resulted in several accomplishments. A web page design team was established.
Computer programs were created to extract data from existing databases and compose the
information recognizably on the web page. The website was tested by various afloat platforms
such as aircraft carriers, support ships and aircraft squadrons, as well as shore commands. These
platforms also provided feedback concerning the accessibility and performance of the website.
The results of the project exceeded expectations for customer satisfaction. Exam takers can
now view their exam results 96 hours after initial publication. This time savings also allows
exam takers additional time to study for the next exam in the event they were not selected for
advancement. Exam takers also can access their information from past cycles. Commanding
officers can access required administrative reports upon initial publication of results, which has
proven to be a valuable work and time saver for personnel managers. Commanding officers can
view how their own commands performed compared to all other commands in the Navy. This
particular element has reduced workload at NETPDTC and is a valuable counseling tool for
commanding officers when assisting Sailors in making career decisions.
Once inside the website, users are offered a menu of options. Commands can view several exam
cycle results and statistics from each cycle, and can look at individual profile sheets. The profile
sheet shows the test taker's standard score, the average standard score for candidates who
advanced, the topics the individual was tested on, how many questions on the exam pertained to
each topic, how many of those questions the individual answered correctly, and the individual's
percentile for each section relative to all persons taking that particular exam. Figure 5 shows an
example of a profile sheet.


Figure 5. Profile sheet.
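The profile sheet's contents map naturally onto a small data structure. The following Python sketch is illustrative only; the field names are assumptions, not those used by the NEAS.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SectionResult:
        topic: str
        questions_on_exam: int
        answered_correctly: int
        percentile: float  # relative to all takers of this particular exam

    @dataclass
    class ProfileSheet:
        standard_score: float
        advanced_avg_standard_score: float  # average for those who advanced
        sections: List[SectionResult]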


Due to bandwidth restrictions and satellite connectivity limits for deployed ships, several
changes were made during development of the web page. Large amounts of data in the profile
sheets had to be split into smaller files, and the number of website pages was reduced to make
extracting data easier. The website is accessed via Microsoft Internet Explorer and is
password-protected to ensure the security of the data.
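A minimal Python sketch of the chunking idea described above; the chunk size, file naming, and JSON format are assumptions for illustration, not the actual implementation.

    import gzip
    import json

    def write_in_chunks(records, prefix, chunk_size=500):
        """Write a large result set as small compressed files so commands
        with limited bandwidth can download them piecemeal."""
        for i in range(0, len(records), chunk_size):
            path = f"{prefix}_{i // chunk_size:03d}.json.gz"
            with gzip.open(path, "wt", encoding="utf-8") as f:
                json.dump(records[i:i + chunk_size], f)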
The statistics option shows how a particular command performed by duty group, rating, and
overall paygrade. Refer to figure 6. The total number of test takers is shown along with the
number and percentage in each advancement status. The average standard score of exam takers
who were selected for advancement is also shown, and these statistics are compared against
Navywide performance. Advancement statistics help commanding officers build study and
training programs and make career management decisions.


Figure 6. Statistics web page.
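The command-versus-Navywide comparison reduces to a simple aggregation. A hedged Python sketch, with assumed field names:

    from statistics import mean

    def command_statistics(command_scores, navywide_scores):
        """Compare a command's advanced-candidate standard scores with the
        Navywide average, as shown on the statistics page."""
        cmd_avg, navy_avg = mean(command_scores), mean(navywide_scores)
        return {"command_avg": cmd_avg,
                "navywide_avg": navy_avg,
                "difference": cmd_avg - navy_avg}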

Conclusion

Through the implementation of these and other initiatives, a significant reduction in
notification time has been accomplished. Prior to these initiatives, the turnaround time for
Sailors to receive their advancement results following examination was between 11 and 13
weeks. With these initiatives in place, the turnaround time has been nearly cut in half, to about
6 weeks. Improvements in the process will continue as procedures are refined and as individuals
become more accustomed to the new regulations.


THE ROLE OF PSYCHOLOGY IN INTERNET SELECTION


E. Edgar, A. Zarola, L. Dukalskis & K. Weston
QinetiQ Ltd
A50 Cody Technology Park, Farnborough, Hampshire GU14 0LX, UK
eedgar@qinetiq.com

This paper reports the second year of a three-year programme of work to examine how the
UK Armed Forces can develop their use of the Internet to best effect in selection. Given the
large numbers of non-graduate recruits employed by the three Services, the research
programme has focused on recruit selection. The first year programme included a review of
Internet selection practices and the development of a prototype recruitment and selection
(R&S) website (Weston, Edgar, Dukalskis & Zarola, 2002). During the second year of the
programme, the researchers focused attention on the potential applicant user group. This
report describes the rationale for considering the applicant user group, the methods applied,
and the implications of the findings for Internet-based R&S systems.

INTRODUCTION
The popularity of e-recruitment has grown substantially over the last five years. According to
Reed (2002) 78% of UK graduate recruiters prefer online applications. This compares with
1998 when only 44% of employers had the facility for online applications (Park, 1999).
However, the sudden growth in popularity creates its own problems. For example, according
to Park (2002), employers find it increasingly difficult to decide where to advertise jobs due
to the variety of media available. As technology develops to support a range of R&S
approaches, other features of traditional selection systems beyond recruitment activities are
becoming available online, e.g. personality testing, ability testing and even interviewing.

Inevitably, the proliferation of e-recruitment and e-selection has raised questions across
professions. Whilst electronic R&S features may look the part, reviews of e-practices are a
little sparse when it comes to evaluating key reliability and validity features. For
psychologists and other professionals, security, authenticity, fairness, privacy and
standardisation have been increasingly recognised as issues of concern. These same issues
appear to mediate the extent to which traditional methods have been adapted to the Internet
forum. Much of the discussion tends to focus on the organisational perspective. For example,
how to manage the increased volume of applications that can be generated from e-application
systems; how to address security issues, such as authenticating the user's identity and preventing
cheating; and how to monitor and evaluate the effectiveness of on-line systems. Initiatives
tend to focus on addressing employers’ concerns and optimising the use of the Internet for the
employer. In contrast, significantly less attention has been given to another important user
group: the potential applicant group.

Human factors research across various domains illustrates the importance of considering the
user when designing systems and other artefacts (e.g. Norman, 1988; Booth, 1989). Such
research also highlights the costs of not giving sufficient attention to different user groups.
When considering user groups related to Internet-based R&S systems, distinctions between
users can be made. Employers who purchase on-line testing facilities from test publishers are
users of the on-line facilities and support. Test publishers are users of the Internet medium
when, for example, they employ the technology to collect data to monitor the performance of
their tests. However, job seekers or browsers are the users at the interface. They are the group
to which all efforts are targeted to encourage applications from those suitable to fill vacancies.
Thus, it makes sense that attempts are made to understand their perspective when it comes to
designing and applying Internet R&S systems. It is noted that the success of online
recruitment can be quite variable (Park, 2002). By considering the applicant population,
factors may emerge to help explain why the success of the new technology in supporting
e-recruitment and e-selection varies so widely. Effective R&S systems (traditional or
electronic) rely on the calibre and number of applications made in the first instance. If e-
systems do not appeal to or attract responses from potential new employees, then, at best, they
become a costly supplement to traditional methods, and, at worst, portray a negative image of
the organisation.

From the limited research that has considered the applicant user in the e-recruitment context,
some interesting findings have emerged. For example, in the UK, Foulis and Bozionelos
(2002) reported on the attitudes of a postgraduate group to Internet recruitment. Their findings
indicate that the perceived advantages of using the Internet for job search activities
outweighed the disadvantages. Key advantages were identified as: the completeness and
quality of the information provided; convenience and accessibility; and the speed of the
process. Some disadvantages included: restriction to certain employers and jobs; technical
difficulties; the Internet afforded a less direct approach; excessive amounts of information;
and length of time employers take to respond via email. Some concerns were expressed about
the opportunity for ‘cheating’, but student confidence in the systems was dependent on the
context. For example, the respondents had few concerns about sending their completed
application forms through the Internet, but they did have reservations about automated
screening. Interestingly, the students indicated that they wanted a choice of traditional and
electronic methods to support their job search activities. The quality of the website was also
found to have an impact on applicants’ perceptions of the organisation.

Also in the UK, a study by Price & Patterson (2003) questioned twenty undergraduates (21-26
years of age) about their e-application experiences and reported that nearly all participants had
a favourable attitude towards using the Internet in general. The researchers considered
accessibility to the Internet and found that the largest proportion of candidates had relatively
easy access, being able to use the Internet at home or at the university. The vast majority,
however, opted for the cheaper option of accessing free facilities at the university. Indeed, the
issue of cost was raised by the respondents as a factor that inhibited their use of the Internet
when they had to pay for it, e.g. time online to complete application forms.

Price et al’s interviewees reported a number of reactions that the researchers labelled
psychological processes. These included a concern for privacy and a greater desire for
support with online applications, e.g. feedback in the form of an acknowledgement of
application was considered essential. Finally, applicants felt that using the Internet
dehumanised the application process and also made it easier for individuals to exaggerate
their responses, be more casual and offer socially desirable responses.

Some usability issues were addressed by Price et al and whilst technology has evolved
considerably, users still experience technical problems that can affect their attitude. Some
practical preferences were identified. For example, respondents reported that they thought
electronic forms were more convenient, but wished that facilities allowed them to preview,
flick through and check their application forms (just as they can with paper application
forms). Design features also left respondents feeling restricted. Some felt that too many drop-
down menus limited choice and the lack of space restricted their ability to give a full picture
of themselves. Unlike paper forms, there was no facility to add an extra sheet when
necessary. Like the participants in the Foulis and Bozionelos study, the students believed that
organisations who provided effective online application facilities created an innovative,
forward-thinking image and appeared more appealing as an employer. Conversely, when the
students had a negative experience using online application systems, their image of the
organisation was affected negatively.

Equity of access to the Internet has been an issue of concern since the introduction of the
Internet as a medium for recruitment and selection. Price et al echo concerns of others by
suggesting that use of the Internet is limited on the basis of demographics such as age, sex,
race and income and, therefore, the use of the Internet may introduce adverse impact. It is
important to investigate the possibility that the use of the Internet medium might serve to
exclude candidates on the basis of demographic factors. However, it is possible that the
characteristics of each applicant population will determine whether access and ability to use
the Internet is an issue for concern in each specific case.

Beyond access to Internet-based R&S facilities, there is another important issue to consider:
the extent to which the target population is motivated to use career-related websites and the
specific R&S features they accommodate.

Applicant groups vary in their characteristics just as the jobs and roles for which they apply
vary. This is why it is important for organisations to consider the specific groups of people
they want to target when they design their Internet applications. Many of the R&S features
found on the Internet, particularly selection applications, are targeted at graduates. Studies,
such as that documented by Foulis and Bozionelos, can help direct initiatives aimed at
targeting a similar graduate applicant population. However, whilst some Internet
applications, such as Job Boards, accommodate non-graduate populations, little research
attention has been paid to understanding the large non-graduate population of job seekers and
their perceptions of the Internet as a medium used in the R&S process.

The QinetiQ researchers participated in a programme of work aimed at fulfilling a number of
objectives. These can be summarised within the following three questions:

a. Is the recruit target population limited from accessing the Internet to carry out job search
activities?
b. Do the target population use the Internet to carry out job search activities?
c. How is the prototype Tri-Service web site received by the target group?

The first question addresses the very practical issue of whether potential applicants have
access to the facilities they need to take advantage of electronic recruitment and selection
facilities. The second and third questions, although addressing different elements, both relate
to the motivation to use these electronic facilities. By asking about their current interest in
using the Internet for job search activities, the researchers aimed to identify the extent to
which this target group currently use the Internet for R&S activities, and what preferences
they may already have in relation to the different types of R&S features available. In eliciting
views about the Tri-Service website, the researchers also wanted to elicit attitudes towards the
traditional and more novel R&S features that could be accommodated on the Internet.
However, a large element of this latter research activity was aimed at canvassing opinion
towards specific interface design elements, in order that, for example, the physical design of
the site could be improved with respect to ease of use, preferences for colours, removal of
constraints etc.

The methodology used to address these objectives is described, and selected results follow. It
is beyond the remit of this paper to give a full account of results specific to the interface
design. Rather, results are restricted to applicant access and their interest and willingness to
use different e-selection features.

METHODOLOGY
Three research activities were designed to address the research issues.

Internet and Careers Survey (ICS): A survey design was used whereby a questionnaire was
developed and administered to students at secondary schools around the UK. The survey was
designed in consideration of available literature and recent public statistics about Internet
access. In addition to access, the questionnaire also included items related to frequency of
Internet use, preferred media, and career search activities. A team of researchers visited the
schools to administer the questionnaires to classes of students aged between 14 and 18 years.
Five schools participated in the study, and completed questionnaires were obtained from 170
students.

Web site usability trials (UTs) – structured walkthroughs: A semi-structured evaluation
protocol was designed in consideration of usability guidelines and a working definition of
usability (Booth, 1989). Booth separates concepts of usability into four key areas:
1. Effectiveness (or Ease of Use): This concerns how easy the site is to use. In simple
terms, it relates to the ease with which the system can be used, e.g. to get from A to B, and
how effectively the design allows people to use it.
2. Learnability: This relates to the degree to which the design used e.g. on one website
supports the use of knowledge acquired from previous exposure to other systems. For
example, users should be able to transfer their knowledge of other sites to learn how to use a
new site quickly. Additionally, the site should be flexible enough to support previous
experience, e.g. such that if someone wants to use keyboard shortcuts, this should be
supported.
3. Attitude (Likeability): This covers a broader spectrum. It relates to issues of personal taste
e.g. colour, factors which increase frustration, anxiety, or pleasure and any other subjective
feeling resulting from interaction with the system.
4. Usefulness (or Functionality): This relates to the degree to which the user can and wants to
use the system (or its applications) to achieve their goals. That is, it is not about how easy the
system is to use, it is about the worth of what the web site offers.

Four features of the web site were explored in detail. These included: an application form; a
biodata questionnaire; a medical questionnaire; and an eligibility form. Students were
randomly allocated to one of the four features.


Twenty-one students from the schools involved in the Internet and Careers Survey
participated. Individually, students sat next to a member of the research team, who acted as a
facilitator, and were given an explanation of the study. It was emphasised that they were not
being evaluated, and that any comments they could offer would be valued; positive and
negative. Each student was presented with the Home Page of the web site and was asked to
complete a task, for example: “There is an application form on the web site. Try to find it,
complete it and then return to the Home Page.” Each task involved negotiating the web site to
find the relevant location, following the instructions on the page, and returning to the Home
Page. Students were encouraged to comment freely as they progressed. To supplement free
responses, a series of open-ended questions were put to the students. The free response
questions were categorised under the following headings: The look of the web site; Ease of
use; Response and feedback; Experiences; Functions.

To supplement the limited responses made by students, structured questions were also used in
an effort to obtain a full picture of the students' experiences.
responses made by each student. Following completion of each task, students were given the
opportunity to ask any questions and comment on other features of the site they might have
encountered outside the parameters of the task. Each session lasted approximately 30
minutes. The responses were analysed using qualitative and quantitative techniques.
Comments were coded by three researchers according to task scenario, and in relation to the
usability elements to which they referred. Structured questions that involved responding on a
rating scale were analysed using SPSS for Windows v10.
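As an illustration of the coding analysis described above, the following minimal Python sketch tallies coded comments by task and usability element; it is a sketch of the general technique, not the authors' actual analysis scripts.

    from collections import Counter, defaultdict

    def tally_comments(coded_comments):
        """coded_comments: (task, usability_element) pairs assigned by the
        coders; returns, per task, how often each element was mentioned."""
        tallies = defaultdict(Counter)
        for task, element in coded_comments:
            tallies[task][element] += 1
        return dict(tallies)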

Job Search Preferences Questionnaire (JSPQ): Following the initial survey and usability
trials, an opportunity arose to question a further student sample. A short self-completion
questionnaire was designed to be administered at a school careers fair. The questionnaire
included items that could be used to help clarify student preferences for job search and job
application activities. The questionnaire allowed researchers to consider further reactions to
some of the recruitment and selection functions that could be supported using Internet
technology. A total of 30 students completed this questionnaire.

RESULTS
A total of 170 students from 5 schools around the country responded to the Internet and
Careers Survey (ICS): 57% from the North of the country and 43% from the South. Fifty-three
per cent were female and 42% were male (5% missing data). Twenty-one students, from the
same five schools, completed the usability trials (62% male and 38% female). Students' ages
ranged from 14 to 18 years, with the majority aged 14 to 15. Thirty students, aged between
13 and 15 years, completed the JSPQ (77% male and 23% female).

Access
All the schools provided student access to the Internet, and only two students out of 170
(1.2%) stated that they had never used the Internet. A significant majority of students access
the Internet at least once a week, with 58% reporting that they access it at least 2-3 times per
week (at school or at home). The least popular locations for accessing the Internet are a) an
Internet café (10%), b) a careers centre (18%) and c) a library (48%).
The results indicate that for the whole sample, the most common method of accessing the
Internet is by using a computer (PC/Mac), and that many students access the Internet in this
way both in and out of school. A significant minority of students use technology other than a
computer, e.g. a WAP/mobile phone (27%), television (17%), or games console (17%), to
access the Internet.

Access to the Internet does not appear to be a significant problem for this age group.
Nevertheless, it may be the case that potential recruit applicants who have left school and who
have not gone on to a higher educational establishment may not have the same free access to
the Internet. On the other hand, the proliferation of alternative technologies for accessing the
Internet indicates that, for some groups at least, fairness concerns relating to Internet access
might be diminishing.

Job Search Activities


Forty-one per cent of all respondents stated that they had used the Internet to search for job or
college opportunities. Most of this group (86%) stated that they accessed the Internet for such
information either once a month or less than once a month. Forty-three per cent of
respondents said that they had used the Internet to search for college courses; 27% stated that
they had searched for a job; 19% had searched for information about an organisation; 9% had
registered on a career or job search web site; and 2% had submitted a CV online. Whilst
interest was expressed in using the Internet for job search activities, experience of doing so
appears limited for this particular age group. Age and expectations about when to search for a
job, as well as restricted exposure to employment processes may have limited responses to job
search questions.

Motivation to use e-selection features


Website Functionality: UT participants were asked to offer their opinions about the different
features included on the site. They were asked to rate the usefulness of the tool that formed
the basis of the task, and how much they liked the specific example with which they
interacted. The students thought most of the R&S features were very useful. Not all had the
same strength of feeling for the medical questionnaire or application form; nevertheless, most
thought them useful to some degree. The exception was found for biodata, with one participant
stating that they thought this was of little use. In rating how much they liked the specific
feature shown on the website, most tended to like the features they had used, but did not rate
them as highly as they had rated the associated concept. The medical questionnaire stood out
with half of the students reporting that they did not like it. (Evidence elicited elsewhere with
respect to specific design issues helped to inform why this might be the case).

Application Preferences: When given a choice between using the Internet or writing and
posting application forms, UT participants indicated that their preferred method of completing
and returning an application is to use Internet facilities for all actions; their least favoured
option was a hybrid of downloading a paper version of the form and then posting it
back. Additionally, JSPQ respondents showed a preference for using the phone to make
contact with organisations in the application stage and request application forms. When it
comes to completing and returning application forms, students appear to want a choice of
traditional and new methods, and prefer an element of consistency in the genre of methods
chosen.
Initial selection preferences: JSPQ participants appear to have varied preferences when it
comes to initial selection features, such as biodata forms, essays, ability tests, personality tests,
tests of knowledge, and personal skills. For most features, responses indicated that having a
choice of media is important. The responses show that some traditional methods remain
popular, but that the choice of method differs depending on the activity. For example, it
would appear that for ability testing more students want to complete the tests at the
organisation (e.g. under traditional conditions at the HR department) compared with
personality testing, which they appear happy to do at home or via the Internet. Such
preferences may be related to different levels of confidence in the technology to support fair
or standardised assessment under the different conditions. For example, factors such as
standardisation and security may be perceived to be more crucial for ability testing compared
with personality testing, but less reliable under Internet conditions. Some comments elicited
during the research referred to concerns about the possibility of others ‘cheating’ on the
Internet and accidental deletion of information.

Other selection preferences: For activities that tend to be administered later in a selection
process, e.g. interviews, presentations, and activities with others, the most popular means of
participation is on location at the organisation. Email was not popular (or feasible) at this
stage, but a significant minority stated they would be interested in using Internet-supported
groupware for presentations (33%) or activities with others (27%). For interviews, 23% stated
that they would be interested in using web-cam technology. Whilst this figure is not great, it
indicates that over one fifth of participants are willing to consider interviews supported by
Internet technology.

This picture was also reflected by the usability trial participants. Their stated preferences for
Internet use in the selection process tended to be for obtaining information and for the
process of application. They offered some support for using the Internet for testing and
communication activities; however, fewer students supported the idea of using the Internet to
facilitate selection interviews.

Again, it would appear that preferred choice of media depends on the selection activity.
However, it also appears that some individuals may have a preference for modern technology,
whilst others tend towards more traditional approaches.

Impact of website on organisational perceptions:


To investigate whether experience with the website had any effect on attitude towards
employment with the ‘host’ organisation, the researchers included a final question at the end
of the usability trial: ‘Would you consider a career with the Armed Forces?’. Responses were
compared with an item that had been posed at the beginning of the usability trial: ‘Have you
ever considered a career with the Armed Forces?’ There was no significant increase in the
number of students who responded ‘yes’, that they would consider a career with the Armed
Forces (27% to 33%), but there was a decrease in those who gave a categorical ‘no’ (71% to
38%), with more respondents expressing uncertainty (0% to 14%).
Whilst not conclusive, the results appear to support the notion that exposure to a satisfactory
organisational interface and a positive experience could create a more favourable view of an
organisation.
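For readers who want to recreate the comparison, the reported percentages (n = 21) can be converted back into approximate counts, as in the Python sketch below; note the post-trial percentages do not sum to 100 in the source, so the remainder presumably reflects missing responses.

    # Approximate counts recovered from the reported percentages (n = 21).
    n = 21
    pre  = {"yes": round(0.27 * n), "no": round(0.71 * n), "unsure": round(0.00 * n)}
    post = {"yes": round(0.33 * n), "no": round(0.38 * n), "unsure": round(0.14 * n)}
    shift = {answer: post[answer] - pre[answer] for answer in pre}
    # The collapse of categorical "no" answers is the notable change.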

Self-disclosure and self-assessment


The degree to which potential applicants were willing to disclose personal information
through the Internet, and to what extent students would like to know more about their
suitability for a job prior to committing to an application, was investigated.


The students were asked about their attitudes towards giving detailed personal information
about themselves at an early stage in the application process. Forty-seven per cent stated that
they would feel comfortable giving detailed, personal information at an early stage, 23% said
that they would not feel comfortable and 30% did not respond to the question. The phone was
chosen as the most popular means of giving detailed personal information to the organisation.
This was followed by email and using a website directly. Interestingly, the least popular
methods involved posting hand or type-written materials. Face-to-face was not given as an
option, and it may be the case that some students would always prefer to give detailed
information about themselves on a face-to-face basis. However, at an early stage in the
selection process it is unlikely that employers would have the resources to offer face-to-face
opportunities, hence the omission from the list of options. Nevertheless, the absence of this
option should be acknowledged.

Using the JSPQ, the majority of respondents (67%) expressed interest in being able to find out
more about their suitability for a given job prior to making an application. The following
features were presented as options to aid this self-assessment: detailed job description;
written, frank comments by employees; job knowledge quiz; job interest inventory;
personalised careers advice; personality testing; and ability testing. Each of the features was
chosen by between 77% and 97% of students as a way to find out more about their suitability
for a given role. The preferred mode of helping to self-determine suitability for a role was also
investigated. Once more, the results show that whilst the majority of students appear willing to
use the Internet, a substantial minority want the choice of using traditional methods in order to
find out more about their suitability for a job role.

DISCUSSION AND CONCLUSIONS


In response to one of the main research questions, it seems that the target population is not
limited with respect to access to Internet selection facilities. This has important implications
when it comes to issues of fairness, specifically in relation to adverse impact. Price et al
(2003) might be correct in their suggestion that Internet access may be affected by
demographic determinants, but this may not apply to all applicant groups. However, whilst
the age group sampled in this study can access the Internet at no personal cost at school, it
remains to be seen if older recruit applicants would be deterred by having to pay for Internet
selection services. This has implications for how the military organisations
can encourage or support Internet access, for example through provision of free Internet
access at Armed Forces Careers Offices.

With respect to the extent to which the target group carry out job search activities, given the
frequency with which they use the Internet, results indicated that the majority of participants
did not use the Internet for such activities, and those who had did so infrequently. This result
and other signs throughout the research, e.g. indication that some students were unaware of
key activities carried out through traditional selection systems, suggest that this target group is
relatively inexperienced when it comes to recruitment and selection processes. This is likely
to have affected some of their responses, but, equally important, it has implications for design
of e-selection systems aimed at a similar age/ experience group. Explanations of what is
expected and descriptions of why certain information is important may need to be much more
explicit for certain target groups. Similarly, because this target group does not appear to have
‘favourite’ job sites, or job search experience, compared with graduate counterparts,
employers may need to be inventive about where they place advertisements, e.g. on mobile
phone alerts; sports websites, school events etc. They may need to ‘attract’ the applicants they
seek to their selection web sites, rather than being able to assume that good applicants will
find their own way.

The extent to which this target group used technologies other than desktop computers was
of interest. It alerts us to the fact that employers need to consider the way they present their
information and questions, as the visible space on a desktop tends to be much greater than on
a phone. These differences in media choice seem to add support for a move to adaptable
interfaces, whereby the interface changes according to responses to a set of questions relating
to media being used, demographic determinants, qualifications or, simply, personal
preferences.

In relation to motivational issues, and in support of previous studies (Foulis et al, 2002; Price
et al, 2003), participants seemed motivated to use the Internet for the purposes of at least
some job search activities. Interestingly, once more, their concerns reflected some of those
expressed by employers. That is, the students raised problems relating to standardisation and
the ability of other applicants to ‘cheat’. This suggests that employers need to offer some
assurances about their procedures; that they are fair and they understand applicant concerns.

To enhance motivation, it appears that employers need to retain elements of choice when it
comes to modern technology and traditional approaches. A clear theme running throughout
the various sets of results is that applicants want a choice of how to respond to various
selection activities. These choices appear to depend on a preference for new technology over
traditional methods, but also vary according to what applicants are asked to do: for example,
the desire to go to the organisation for ability testing versus a preference for Internet facilities
when completing application forms. These preferences contrast with those of Foulis et
al’s students who stated they were happy to complete tests online. The implications of these
findings are that employers should not be too hasty to eradicate traditional approaches.
Indeed, Freeserve, the Internet company which adopted e-recruitment techniques, continues to
use paper-based advertising to complement web applications (The Sunday Times, 2003, July
13).

Comparison across the graduate groups used in previous studies and the non-graduate groups
in this study illustrates differences in the range of issues volunteered, preferences, and
experience. Such differences illustrate the worth of considering different applicant groups
separately. The researchers expect the design of the prototype website to be enhanced by
having elicited views from the target users directly. For example, the research has elicited
positive feedback about the utility of some of the features, and constructive feedback about
how the specific examples of these features on the prototype are limited.

It is argued that not only is it egalitarian to consider the applicant alongside the employer, but
that it also makes pragmatic sense if the technology is going to support, rather than lead,
future recruitment and selection systems. The work psychologist would appear to be among
the professionals suitably positioned to help facilitate understanding and draw out implications
in this area.

ACKNOWLEDGEMENTS


This work was funded by the Human Sciences Domain of the UK Ministry of Defence
Scientific Research Programme.

REFERENCES

Booth, P. (1989). An introduction to human-computer interaction. Hove: Lawrence Erlbaum.

Foulis, S. & Bozionelos, N. (2002). The use of the internet in graduate recruitment: The
perspective of the graduates. Selection & Development Review, Vol. 18, No. 4, pp. 12-15.

Park (2002). Graduate in the eyes of the employers, 2002. London: Park HR and The
Guardian.

Park (1999). Graduate in the eyes of the employers, 1999. London: Park HR and The
Guardian.

Price, R. E. & Patterson, F. (2003). Online application forms: Psychological impact on
applicants and implications for recruiters. Selection & Development Review, Vol. 19, No. 2,
pp. 12-19.

Norman, D. A. (1988). The design of everyday things. London: MIT Press.

Reed Executive PLC (2002). Cited in R. E. Price & F. Patterson (2003). Online application
forms: Psychological impact on applicants and implications for recruiters. Selection &
Development Review, Vol. 19, No. 2, pp. 12-19.

The Sunday Times. July 13, 2003. Appointments, p7.

Weston, K. J., Edgar, E., Dukalskis, L. & Zarola, A. (2002) Internet-supported Tri-Service
recruit selection: An exciting future. QinetiQ Report: QINETIQ/CHS/CAP/CR020199/1.0


WORKING FOR THE UNITED STATES INTELLIGENCE COMMUNITY:
DEVELOPING WWW.INTELLIGENCE.GOV
Brian J. O’Connell, Ph.D.

Principal Research Scientist


American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
boconnell@air.org

Cheryl Hendrickson Caster, Ph.D.

Senior Research Scientist


American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
chendrickson@air.org

Nancy Marsh-Ayers

Intelligence Community Program Manager IC/CIO

INTRODUCTION
One of the most significant problems that the Intelligence Community (IC) faces in
recruiting applicants is the stereotypes that exist concerning these agencies and the work
performed by Agency employees. In particular, these stereotypes focus public attention on a
very narrow portion of the actual (or fictional) job opportunities within the IC and also give a
very glamorized and inaccurate view of actual work and working conditions. In effect, media
portrayals are a very poor “Realistic Job Preview”. The Director of Central Intelligence's Chief
Information Office mandated the creation of a web portal that would provide accurate
information about all job-related matters for members of the intelligence community.

The Chief Information Office (CIO) of the Central Intelligence Agency (CIA) contracted
with the American Institutes for Research (AIR) to assist in the development of an IC-wide web
site. This web site represents 15 different agencies that are, wholly or in part, involved with
intelligence work. One section of the website, A Place For You, includes information about IC
careers and occupations. The goal of this section is to assist web site visitors in career planning
activities, such as determining which occupations are most relevant to their background (e.g.,
education, interests).


AIR's past experience in the IC enabled us to support this requirement, based on previous
occupational analysis (and other work) at three of the major agencies: the National Security
Agency (NSA), the Defense Intelligence Agency (DIA), and the National Imagery and
Mapping Agency (NIMA).

Background
In the mid-1990s, IC member agencies were responding to radical changes in US threats
and, as a result, Agency missions began to change. For example, the primary threat of fighting
conventional wars against the former Soviet bloc vanished. The necessity of fighting low
intensity conflicts became apparent. Both of these factors led to a questioning of whether the
IC had the necessary blend of skills and abilities to carry out new missions. An additional
challenge was the significant societal pressure to continuously improve efficiency and quality
(for example, see General Accounting Office, 1994; GAO/Comptroller General of the United
States, 1996; and the work of the National Performance Review, now the National Partnership
for Reinventing Government; see http://govinfo.library.unt.edu/npr/library/review.html and
http://www.nima.mil/ast/fm/acq/nima_commission.pdf).

The then-current approach to position management and recruitment in the IC was very
much driven by the legacy systems and missions. Changing from these systems was difficult and
faced significant organizational barriers. AIR implemented a pilot project at NSA to evaluate the
possibility of transitioning one of their core business units to a more flexible process of
describing their work and the skills necessary to achieve mission goals.
AIR's approach was based on the Occupational Information Network (O*NET; Peterson,
Mumford, Borman, Jeanneret, & Fleishman, 1999). The O*NET model evolved from a thorough review of
previous work-related taxonomies, and represents state-of-the-art thinking about the world of
work. Unlike many other models, it conceptualizes both the general and the specific aspects of
the work domain in an integrated fashion and was thus ideally suited to NSA’s needs. For
example, O*NET’s Basic and Cross-Functional Skill (BCFS) taxonomy was used to capture
broad competencies that cut across NSA jobs, so that these jobs could be compared and grouped
to create a skills-based occupational structure. Similarly, O*NET’s Generalized Work Activities
(GWA) taxonomy was used to ensure homogeneity of work activities within each occupation.
Finally, O*NET’s Occupational Skills taxonomy was used to characterize the specific
requirements of particular NSA occupations.
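In spirit, building a skills-based occupational structure involves comparing jobs by the similarity of their skill-rating vectors. The following Python sketch illustrates the general idea; the cosine measure and threshold are illustrative choices, not AIR's documented method.

    from math import sqrt

    def cosine(a, b):
        """Cosine similarity between two skill-rating vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norms = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0

    def same_occupation(job_a, job_b, threshold=0.9):
        """Group two jobs into one occupation when their skill profiles
        are sufficiently similar (threshold is illustrative)."""
        return cosine(job_a, job_b) >= threshold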
The O*NET approach was well received at NSA, and the pilot program was expanded to
include the entire agency. That is, AIR had the task of describing all work carried out at the
agency in a skills-based taxonomy. This work, and its follow-on activities for the agency,
continues today.
Other members of the Community were impressed with the impact and flexibility of the
occupational redesign at NSA. In 1998, NIMA adopted this approach. The approach was

particularly well suited for NIMA, which was formed in 1996 by consolidating employees from
several Federal agencies. These agencies included the Defense Mapping Agency (DMA), the
Central Imagery Office (CIO), the Defense Dissemination Program Office (DDPO), and the
National Photographic Interpretation Center (NPIC), as well as the imagery exploitation and
dissemination elements of DIA, the National Reconnaissance Office (NRO), the Defense
Airborne Reconnaissance Office (DARO), and the Central Intelligence Agency (CIA).
However, the amalgamation of these agencies also brought a disparate set of approaches to HR
issues. O*NET was a logical solution for providing a new, flexible, and consistent basis for HR
practices.
In 1999, DIA awarded AIR a contract to transition its legacy position-based HR system to
the skills-based approach that was rapidly becoming the benchmark for the IC. Over the next two
years, AIR research scientists transitioned over 1,000 job titles to 21 Occupational Groups.
The occupational analyses at these agencies provided AIR research scientists with several
elements that would facilitate the development of a website describing work in the IC. These
included (a) a consistent taxonomy for describing work throughout the agencies, (b) agencies
that would provide the “lion's share” of unique jobs in the IC, and (c) broad-based experience of
work carried out at these three large IC agencies. These factors laid the groundwork for the
identification of occupational similarities across the IC and ultimately the web portal content for
the IC website www.intelligence.gov.
This IC-wide web site represents 15 different agencies of the IC. Some of these agencies
are exclusively designed to support the intelligence mission (e.g., NSA, CIA) while others (e.g.,
Department of State and Department of Energy) have hybrid missions where only part of the
agency’s overall mission is involved in intelligence related work.
One of the goals of the web site is to educate the public on the types of work and
missions carried out by each member agency and to provide a Community perspective. In
addition, the A Place For You section of the web site assists web site visitors with identifying the
most relevant careers and occupations based on their background (e.g., education, interests).
This section also provides information about where individuals can contact member agencies to
apply for advertised positions. In essence, this site is to serve as a one-stop shopping location for
members of the public who have an interest in working within the Community.

Challenges

In the past decade the US economy has been growing at a record pace. This has meant
that there is stiff competition for employees in all sectors of the economy. The IC must compete
with the private sector to hire qualified individuals for a wide variety of positions. Unlike the
private sector, the IC faces unique challenges when recruiting, not the least of which are the
media-generated stereotypes about day-to-day work at these agencies. Unfortunately, these
stereotypes can lead to unrealistic work expectations among job applicants, subsequent
dissatisfaction with work, and ultimately personnel turnover. Given the cost of recruiting,
training, and obtaining the clearances necessary to work in the IC, the community must do
everything possible to ensure applicants understand the work they are undertaking. In a
technical sense, media portrayals are a very poor “Realistic Job Preview” for any applicant
considering working in the IC. Further, they tend to focus on a fairly narrow range of jobs,
which can also hinder recruiting.
Additionally, different agencies have very different degrees of public awareness about
their contribution to “intelligence” work. While agencies like CIA and NSA have very high
public visibility, organizations like ONI or the Department of Energy (DOE) have a much lower
profile. The www.intelligence.gov site puts a frame of reference around the IC and identifies all
the components and their roles within it. The front page of our web site is shown below in
Figure 1.

Figure 1. Home page of www.intelligence.gov

The development of an unclassified web site describing work in the IC presented several
unique technical challenges. The primary challenge was identifying a common metric for
describing jobs within the IC. The next major challenge was the security classification level.
Obviously, some of the jobs are classified. In addition, a job that is unclassified in one agency
can be (and often is) classified in another. Therefore, extreme care was taken in making agency
attributions about certain jobs or occupations. A further challenge was to get agency “buy-in”
to have their data represented in a public forum such as this web site (until relatively recently,
the very identity of some of these agencies, e.g., the NRO, was classified).

How Common Jobs Were Identified

AIR's experience conducting occupational analyses within three Community agencies
(i.e., NSA, DIA, and NIMA) was leveraged during the development of the A Place For You
section of the website. The same methodology for describing work (i.e., O*NET) was used in
these agencies. This was a key advantage in developing the current IC website, as AIR's work
employing the O*NET taxonomy provided a common metric, or descriptor set, for describing
all work in these agencies.
Based on these data, and our understanding of the work involved within these IC agencies,
AIR staff proposed to the government initial decisions about the commonalities among the
occupational structures within these three agencies. Our decisions were made at two levels of
detail. First, does this career exist in the agency? Second, does a specific occupation exist within
a given career? For example, a career might be “Intelligence Analysis”, with an associated
occupation of “Signals Intelligence Analyst”. These three agencies, comprising approximately
30,000 personnel worldwide, formed a baseline from which to work, as their job analysis
information had been developed in a comprehensive manner over a number of years. We had
complete confidence in the data from these agencies.
The next step was initially more inferential. Once our baseline had been established, a
two-step process was formulated. First, informed inferences were made about likely occupations
in the other IC agencies by experienced AIR staff members. These inferences were based on
many years of work in the IC on job analysis and other projects. This was also a
very conservative assessment of the careers and occupations that were thought to exist in other
agencies. This inferential process was also aided by meeting with agency representatives and an
examination of the Community agencies’ web sites. Next, these inferences were “validated”
through a survey of Community representatives (often Human Resources representatives).
Specifically, representatives reviewed the occupational and career information on the web site
and determined if the information was relevant to their specific agency. AIR research scientists
assisted the representatives during the survey process by responding to questions and clarifying
information, as needed. AIR is currently receiving updated surveys from the IC members. In
addition, new IC members such as the Department of Homeland Security (DHS) are being
surveyed. An essential element of the survey process is the representatives’ approval of the
career and occupation descriptions on the web site.
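The two-level decision and subsequent agency survey can be pictured as a nested mapping from careers to occupations to agency confirmations. The Python sketch below is a hypothetical illustration built on the paper's own example; the agency flags are invented for demonstration.

    # Hypothetical illustration; agency flags are invented, not site data.
    careers = {
        "Intelligence Analysis": {
            "Signals Intelligence Analyst": {"NSA": True, "DIA": True},
        },
    }

    def confirmed_occupations(careers, agency):
        """Occupations a given agency's representatives have confirmed."""
        return [occupation
                for occupations in careers.values()
                for occupation, agencies in occupations.items()
                if agencies.get(agency)]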

Next Steps



As we move forward with integrating information from across the Community, we are
adding functionality to the web site to improve its utility to the general public. Some of the
planned website enhancements include the following:
• Enhanced descriptions of core business functions on the website (e.g., analysis
and collection);
• An Interest-Occupation cross referencing guide;
• Identification of “Hot Jobs” within the Community; and
• Enhanced graphic design and marketing of the site.

CONCLUSION
In this paper we outlined the background and rationale for the development of the IC
website www.intelligence.gov. This site was developed with a baseline set of data from
three large IC members and is being updated as new members are established and new
occupations are identified in current organizations. In addition, the functionality of the
website is being enhanced to provide tools and information that will improve the quality
and quantity of employment-related information about the IC for members of the general
public.

In the future, that public will be able to cross-reference their academic backgrounds and
other interests with occupations and careers that exist in the IC.

This site provides a one-stop information source about all members of the IC and
offers the general public carefully developed information that accurately portrays
the work of the professionals who comprise the United States Intelligence Community.


REFERENCES
General Accounting Office (1994). Improving mission performance through strategic
information management and technology: Learning from leading organizations
(GAO/AIMD-94-115). Washington, DC: Author.
General Accounting Office (GAO)/Comptroller General of the United States (1996). Effectively
implementing the Government Performance and Results Act (GAO/GGD-96-118).
Washington, DC: Author.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., & Fleishman, E. A. (1999).
An occupational information system for the 21st century: The development of O*NET.
Washington, DC: American Psychological Association.


ENCAPS – Using Non-Cognitive Measures for Navy Selection and Classification

William L. Farmer, Ph.D.
Ronald M. Bearden, M.S.
Navy Personnel Research, Studies, & Technology Department (PERS-1)
Navy Personnel Command
Millington, TN 38055-1300

Walter C. Borman, Ph.D.
Jerry W. Hedge, Ph.D.
Janis S. Houston, M.A.
Kerri L. Ferstl, Ph.D.
Robert J. Schneider, Ph.D.
Personnel Decisions Research Institute
Minneapolis, MN 55414

As we enter the 21st century, the military is in the middle of a major transition.
Due to changes in mission and technology, the organizational and occupational
characteristics that have come to define the present day military are being overhauled.
As a result of this process it is imperative to develop a full understanding of the role that
enlisted personnel play in the "new military." This role includes interfaces with systems,
equipment, and personnel that may appear quite different from those of the past. What
individual requirements will these players need to accomplish their mission? How will
this translate to personal readiness? How will performance be defined and measured?

In addition to individual requirements for successful performance, a number of
other factors play important roles in the success of selection and classification in the
military. The military uses results from large-scale aptitude testing as its primary basis
for making selections into the service. Following this initial selection, testing results are
further utilized in making classification decisions. The mechanism for making a
personnel classification is a somewhat complicated process that involves using a
combination of individual ability, obtained from the aforementioned testing program, the
needs of the Navy (regarding jobs that need to be filled and gender and minority quota
requirements), and individual interest. Historically, interest has received the least amount
of weight.


Following classification into a job, an individual will go through basic training,
then proceed to the training school pipeline that has been prescribed for the particular
assigned career path. After finishing the initial training pipeline, an individual will be put
on the job, complete the first term of enlistment, then reenlist or not. A number of other
factors, in addition to the things an individual brings into the service, play a crucial role
in how that individual perceives their military career and whether the choice to reenlist or
not is made. Organizational variables have typically received little or no attention in the
military services when evaluating reasons for attrition or retention.

Historically, the preponderance of military predictive validation work has
centered on measuring success in basic and technical training. Job performance in the
first-term of enlistment has been included as a criterion measure sporadically. However,
because finding and training a 21st Century sailor will be much more complex and costly
than it is today, success on the job beyond the first term of enlistment in the Navy will be
increasingly important. The prediction of such long-term behavior as reenlistment and
promotion rates will require the use of new sets of predictor variables such as measures
of personality, motivation, and interest. To effectively use the variables to predict long-
term performance, it will be crucial to better understand the work context for the future
Navy, including the environmental, social, and group structural characteristics.
Ultimately, combining the personal and organizational characteristics should lead to
improved personnel selection models that go beyond the usual vocational and aptitude
relations, encouraging a closer look at theories of person-organization (P-O) fit (see
Borman, Hanson, and Hedge, 1997).

Advances in the last decade or so have shown that we can reliably measure
personality, motivational, and interest facets of human behavior and that under certain
conditions these can add substantially to our ability to predict attrition, retention, and
school and job performance. The reemergence of personality and related volitional
constructs as predictors is a positive sign, in that this trend should result in a more
complete mapping of the KSAO requirements for jobs and organizations, beyond general
cognitive ability. One particularly promising approach to measuring individual
differences in the interpersonal and personality areas is the situational judgment test
(SJT). These tests are based on the premise that there are important and often subtle
differences between the behavior of effective and ineffective persons as they respond to
problems or dilemmas confronted in the course of carrying out their job responsibilities
and that such differences are reflected in their responses to similar situations presented in
written form.

Research has demonstrated that short-term, technical performance criteria,
particularly overall school grades, are best predicted by general intelligence while longer
term, more differentiated criteria such as non-technical job performance criteria,
retention, and promotion rates are better predicted by other measures, including
personality, interest, and motivation instruments. In order to select and retain the best
possible applicants, it would seem critical to understand, develop, and evaluate multiple
measures of short- and long-term performance, as well as other indicators of
organizational effectiveness such as attrition/retention.

In general, then, when one considers what attributes are most relevant to perform
effectively in any given job, there are many from which to choose. The type of person
characteristic viewed as important to success in a job may vary from situation to
situation. For example, for a job or set of jobs, one may be most interested in choosing
persons that have high cognitive ability, and care much less about their personality or
interest patterns. In other situations the reverse may be true. For optimal assignment, it
is necessary to somehow link the attributes to how necessary they are for effective
performance in specific jobs or job types, and as attempts are made to expand the
predictor and criterion space, it will be important to extend one’s perspective to broader
implementation issues that involve thinking about classification and person-organization
(P-O) fit. As organizational flexibility in effectively utilizing employees increasingly
becomes an issue (e.g., workers are more often moved from job to job in the
organization), the P-O model may be more relevant in comparison with the traditional
person-job match approach.


Current Research Program

The individual requirements for successful performance will require a thorough
understanding of predictors of success and their relationship with key criteria. This effort
will lead to the development of new measures of aptitude, personality, and other
cognitive and non-cognitive instruments. Prior to this, however, it is necessary to develop
a nomological net between the critical constructs that will define successful performance,
and current selection instruments. The knowledge gained from this effort would provide
a foundation for future developmental endeavors and contribute a much-needed
component to the scientific literature.

With this said, the focus of efforts for the first year of the current research
program (FY-2002) was a detailed illumination of the current state of the science
relevant to the changing nature of jobs in the Navy, and how this will impact selection
and classification. A major component of this effort will be a thorough literature review
of predictors and criteria and the relationships between them. This will include (but in
no way is limited to): a) extending work that was accomplished as part of the Army's
Project A; b) review of current models of job performance; c) review of the literature on
cognitive and non-cognitive predictor measures; d) investigation of promising areas (e.g.,
the role of situational judgment) for increasing predictive ability and objectifying
measurement; e) the role of organizational and attitudinal variables; and f) person-
organization and person-job match.

The focus of the past year (FY-2003) has been the development of a
comprehensive computer-administered non-cognitive personality-based assessment tool.
This tool, the Enlisted Navy Computer Adaptive Personality Scales or ENCAPS,
promises to improve the predictive ability of the current ASVAB-based testing program.
The addition of a personality assessment tool will aid greatly in the selection and
classification of enlisted personnel, as we continue to redefine individual performance
and expand the base of organizationally important criteria.

The ENCAPS constructs selected for initial measurement are intended to
represent the most important traits from a broader taxonomy of personality constructs
that will ultimately be measured. We used the following inclusion criteria to identify
traits in that broader taxonomy:

1. Unidimensionality. ENCAPS are based on an IRT measurement
model. Though recent work in multidimensional modeling has been
promising, it is generally advisable that constructs be measured
unidimensionally (a minimal check of this property is sketched after
this list). Item covariance is a requirement for every basic
personality trait. As such, this is not a very restrictive criterion.
However, this requirement does preclude measurement of
compound traits, such as integrity and service orientation, that may
be comprised of non-covarying personality facet variables linked to
important criteria.

2. Temporal stability. The personality traits to be measured by ENCAPS
will be used to select and/or classify naval enlisted personnel into
positions they will occupy over a significant period of time. It is
therefore important that people’s rank ordering on such traits be
preserved over time.

3. Appropriate level of specificity. As with cognitive ability, personality
traits can be represented hierarchically. The appropriate level to
measure in a personality hierarchy has generated significant debate.
For our purposes, however, the key is to measure personality traits
at a level broad enough to provide efficient measurement, but
narrow enough not to (1) obscure meaningful distinctions; or (2)
preclude measurement of specific variance that would increment the
validity associated with the common variance measured by the
broader trait. Traits included in our personality taxonomy will be
selected to optimize this tradeoff.

4. Prediction of important job performance criteria. There must be a
rational or empirical basis for believing that a personality construct
will be predictive of one or more important job performance
dimensions in at least some Navy enlisted ratings. Further, the traits
in the overall personality taxonomy operationalized by the ENCAPS
must, collectively, account for the majority of non-cognitive variance
in job performance dimensions across all Navy enlisted ratings.


5. Well understood and history of successful measurement. Although we
don't necessarily wish to exclude experimental personality
constructs from our broader taxonomy, the majority of the
constructs should be represented in most of the major
comprehensive personality taxonomies and have been successfully
measured by the instruments operationalizing those taxonomies.
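As a concrete illustration of the first criterion, a common screen for unidimensionality is the dominance of the first eigenvalue of the inter-item correlation matrix. The sketch below is a generic psychometric heuristic run on simulated data, not the authors' actual procedure.

```python
# Generic unidimensionality screen: ratio of first to second eigenvalue of the
# inter-item correlation matrix (a large ratio suggests one dominant factor).
import numpy as np

def first_factor_dominance(responses):
    """responses: (n_people, n_items) array of item scores for one scale."""
    corr = np.corrcoef(responses, rowvar=False)        # inter-item correlations
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending eigenvalues
    return eigvals[0] / eigvals[1]

rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))                      # one shared latent trait
items = trait + rng.normal(size=(500, 10))             # ten items loading on it
print(round(first_factor_dominance(items), 1))         # well above 1.0 here
```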

For the initial prototype, three constructs (Achievement Motivation, Social
Orientation, Stress Tolerance) were chosen. Approximately 300 items (evenly divided
between the three constructs) were written to be presented in a paired comparison
format. Items were rated via SME judgment for the level of the trait in question that they
represented and their level of apparent social desirability. These items formed the initial
pool for the computer adaptive algorithm. Recent pilot testing results are currently being
analyzed and will be used to inform development of successive versions of ENCAPS.
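The paper does not specify the adaptive algorithm itself, but the two SME ratings described above suggest how pair selection might work in principle: pick statement pairs matched on apparent social desirability (to limit faking) while bracketing the examinee's current trait estimate. The sketch below is purely illustrative, with invented item values, and is not the ENCAPS algorithm.

```python
# Illustrative pair selection: each pool entry carries the two SME ratings
# mentioned in the text (trait level, apparent social desirability), both on
# invented 1-9 scales. The scoring rule here is an assumption, not ENCAPS.
import itertools

pool = [("a1", 3, 5), ("a2", 6, 5), ("a3", 8, 6), ("a4", 4, 6), ("a5", 7, 4)]

def next_pair(theta, pool):
    """Prefer pairs centred near the current trait estimate (theta) whose
    statements are matched on social desirability, to limit faking."""
    def cost(pair):
        (_, t1, d1), (_, t2, d2) = pair
        return abs((t1 + t2) / 2 - theta) + 2 * abs(d1 - d2)
    return min(itertools.combinations(pool, 2), key=cost)

print(next_pair(theta=5.0, pool=pool))   # -> the a1/a2 pair for this pool
```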

References

Borman, W.C., Hanson, M.A., & Hedge, J.W. (1997). Personnel selection. In Spence,
J.T., Darley, J.M., & Foss, D.J. (Eds.), Annual review of psychology, Vol. 48 (pp.
299-337). Palo Alto, CA: Annual Reviews.

Borman, W.C., Hedge, J.W., Ferstl, K.L., Kaufman, J.D., Farmer, W.L., & Bearden,
R.M. (2003). Current directions and issues in personnel selection and classification.
In Martocchio, J.J., & Ferris, G.R. (Eds.), Research in personnel and human
resources management, Vol. 22 (pp. 287-355). Amsterdam: Elsevier.

Ferstl, K.L., Schneider, R.J., Hedge, J.W., Houston, J.S., Borman, W.C., & Farmer, W.L.
(In press). Following the roadmap: Evaluating potential predictors for Navy
selection and classification (NPRST-TN-03- ). Millington, TN: Navy Personnel
Research, Studies, & Technology.


Pilot Selection in the Australian Defence Force: AUSBAT Validation

Dr. Alan Twomey
Major Damian O'Keefe
Psychology Research and Technology Group
Department of Defence, Australia

Abstract

Using a concurrent validation design, this study applied regression analytical techniques to develop
a statistical model for improving the prediction of outcomes at Basic Flying Training in the
Australian Defence Force (ADF). In particular it sought to evaluate the relative contribution made
by the Australian Basic Abilities Test (AUSBAT) battery of tests compared to the existing selection
battery, relevant biographical variables, some alternative trial tests, and a flight screening program
to predicting basic pilot training outcomes. Hierarchical regression analysis found that 46% of the
variance in overall sortie average rating scores could be predicted, with flight screening accounting
for most of this variance, followed by one of the AUSBAT tests (Pursuit B). Cutoffs identified from
the regression equation enabled three groupings of trainees to be developed with failure rates in the
order of 33%, 19% and 0% respectively.

Introduction

This paper describes an attempt to improve the prediction of training outcomes at the Australian
Defence Force (ADF) Basic Flying Training school. In the process it reports the latest phase in the
development and evaluation of the Australian Basic Abilities Tests (AUSBAT) battery as a tool for
military pilot selection.

AUSBAT

The AUSBAT tests are computer-generated tests delivered via a standard desktop PC utilising
specialised joysticks. The theoretical underpinnings, early development, and descriptions of the
tests have been reported by Bongers and Pei (2001).

Since its conversion from a DOS environment to a Windows platform in 1999-2000, AUSBAT has
undergone systematic evaluation. Completed phases include the following:
a. Specification of standardised delivery parameters for each of the tests (Pei, 2002),
now incorporated into technical documentation;
b. Development of standard scoring systems for the tests (O'Keefe, 2002a);
c. Investigation of the construct validity of the tests (O'Keefe, 2002a; Pei, 2003);
and
d. Initial concurrent validity studies against basic training outcomes (O'Keefe,
2002b).

Today, the AUSBAT battery comprises nine discrete tasks grouped in various combinations,
orientations, and difficulty levels, into fourteen tests, yielding twenty-eight measures. These
measures comprise accuracy scores (e.g. time spent in target area, number of correct answers) and
error scores (e.g. distance away from target area, orientation error of test objects in relation to
reference targets, and number wrong). Standard deviation scores are also calculated for 11
psychomotor accuracy and error scores.


Construct validation studies (noted above) suggest that most tests assess the following six ability
areas - psychomotor, perceptual adjustment, working memory, time sharing, spatial ability and
visual memory. They further indicate that there is some overlap with existing pilot selection
measures in the domains of psychomotor ability, working memory/numeracy, spatial ability and
visual memory/perceptual speed, but not so much as to make the tests redundant.

Current studies are focussing on developing norms for the ADF officer aircrew applicant
population, evaluating them as selection aids for military pilots and a range of other ADF
occupations, and exploring their utility as tools for assessing the impacts of physiological variables
in studies of human cognitive performance.

Initial analyses of the relationships between AUSBAT scores and training outcomes for trainees
participating in, or recently completing, Basic Flying Training (BFTS) showed significant
correlations ranging up to 0.34 for some of the tests and/or test factors. Combined with the index
generated by the current pilot test battery, one of the coordination tests predicted 19% of the
variance in BFTS outcome scores.

ADF Pilot Selection

Pilot selection and initial training in the ADF is currently undertaken on a tri-Service basis.
Excluding medical assessments, the selection process for civilian applicants involves six major
steps. These include:

Step 1: Achievement of test cutoffs in the Officer Test Battery (OTB);
Step 2: Achievement of test/pilot index cutoffs in the current Aircrew Test Battery (ATB);
Step 3: Positive recommendations following separate interviews by a psychologist and a
uniformed recruiting officer;
Step 4: Successful selection for flight screening following a file review and rating process;
Step 5: Successful completion of, and positive recommendation following, a two week
Flight Screening Program (FSP)37,38; and
Step 6: Successful selection following an Officer Selection Board (OSB) recommendation
and rating process.

A recent review of this process (Pei, in press) suggests that from an initial applicant pool of about
2,189 over a two-year period (2001-02), only 196 (approximately 9%) are recommended for BFTS
following FSP. This figure is only approximate as it is influenced by an unknown but small number of Army
applicants who did not attend FSP, limits on the places available at FSP, and the practice of
selecting only those rated at the top of the applicant pool awaiting FSP selection. Nevertheless it
highlights the quantum of the screen out rate in the selection process.

The ADF Basic Flying Training School (BFTS) has a capacity to train about 100 pilots per annum.
Each intake comprises a mix of Australian Defence Force Academy (ADFA) graduates and Direct
Entry Officers (DEOs) from each of the three Services. ADFA graduates will have already
completed about 3 years of tertiary studies prior to commencing, while DEOs will have
completed between 17 and 72 weeks of military training (depending on type of entry and Service) at
single Service officer training schools. While most will have been assessed for pilot aptitude as part
of their selection for the ADF, some will have been assessed as part of an in-service occupational
transfer process.

37. Applicants with 20 or more previous flying hours (PFH) undergo an advanced version, while those with less than 20 PFH undergo a basic version.
38. Some Army applicants proceed directly to the Officer Selection Board without undertaking Flight Screening.


Royal Australian Navy (RAN) and Royal Australian Air Force (RAAF) trainees who successfully
complete BFTS then proceed to advanced training at the ADF No 2 Flying Training School (2FTS).
Those who are successful progress to specialised training as transport, fast jet (RAAF) or rotary
wing pilots (RAN). Successful Army trainees currently proceed directly to specialised training
following BFTS.

Given the substantial investment in each applicant and the costs of subsequent pilot training,
it is important that members reaching this stage have a good chance of successfully
completing their courses. For the period since it recommenced as a distinct entity in 1999 the pass
rate at BFTS has been about 67% (Pei J., In press).

Aim

This study has three aims:

a. to test with a more complete data set previous indications that at least one of the
AUSBAT tests adds to existing selection tools in predicting training outcomes at
the ADF Basic Flying Training School (BFTS);
b. to extend previous studies by including other trial tests in the evaluation and the
Flight Screening Program; and
c. to trial a new statistical model for ADF pilot selection incorporating the findings of
this study.

Method

Because of small numbers, all pilot trainee data has been grouped together. While this approach
may obscure differences between the various sub groups undertaking pilot training, it does
recognise extant decision making processes that are common for all forms of entry, and should
detect whether any AUSBAT or other tests add value to these processes.

Sample

AUSBAT results were obtained from pilot trainees (N=194) undertaking or recently completing
training at BFTS (65%) and 2FTS (23%). Some RAN (6%) and ARA (6%) trainees had moved to
specialist conversion training.

Variables

The independent variables included in the analysis fall into six categories.
a. Biographical: Age at testing and previous flying hours (PFH) have in previous
studies been shown to be related to success during pilot training.
b. OTB: The primary instrument in the Officer Test Battery, the Army General
Classification (AGC) test, is a measure of general cognitive ability. Because it is
administered in two forms, the standard score it produces, the General Ability Scale
(GAS), was used in the analysis. The GAS is a 1-19 point normalised standard
scoring scale with each unit representing a quarter of a standard deviation.
c. ATB: The Aircrew Test Battery (ATB) comprises three tests - Instrument
Comprehension (IC), Instruments-A (INSA), and the COORD, an
electromechanical device used for assessing psychomotor skills. Stanine scores are
calculated for each and combined to form a Pilot Index (PI) stanine which, together
with cutoffs on individual tests, comprise the primary current pilot selection
instrument.
d. Trial Tests: Data is being collected from applicants on a number of trial tests that
have previously been shown to be associated with flying experience. These are:
Aviation Reasoning (AVR), Visual Manoeuvres (VM) and Numerical
Approximations (NA). Raw scores of number correct were used for the analysis.
e. AUSBAT: Because of their extreme diversity and the difficulty of direct interpretation
and comparison, AUSBAT test raw scores were converted to T scores, with all error
scores recoded such that a high score means less error (a minimal conversion sketch
follows this list).
f. FSP: The Flight Screening Program (FSP) is a two week program in which applicants
are assessed for their flying ability. Applicants are split into two groups based primarily
on their number of previous flying hours. The Raw Mean Score (RMS) represents
the average of sortie ratings received in both basic and advanced programs. It was
preferred over the standardised scores currently obtained for each program as it retains
differential performance between the two groups, which is lost in the standardisation
process. Although varying slightly, both programs assess flying ability, and the
ratings used were considered sufficiently similar to warrant inclusion as a
common scale.
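A minimal sketch of the T-score conversion described in item (e), assuming the conventional T-score definition (mean 50, SD 10) and implementing the error-score recoding by reflecting the scores before standardising; the input values are invented.

```python
# T-score conversion with error-score reflection: errors are negated so that
# a high T score always means better (less error) performance.
import numpy as np

def to_t_scores(raw, is_error_score=False):
    x = np.asarray(raw, dtype=float)
    if is_error_score:
        x = -x                              # recode: high score = less error
    z = (x - x.mean()) / x.std(ddof=1)      # standardise within the sample
    return 50 + 10 * z                      # T score: mean 50, SD 10

tracking_error = [12.0, 8.5, 15.2, 6.1]     # invented error-score values
print(np.round(to_t_scores(tracking_error, is_error_score=True), 1))
```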

The dependent variables record BFTS outcomes. They include an overall weighted mean score
ranging from 1-5 based on average sortie ratings, which in turn are calculated from sequence ratings
within each sortie. The number of sequences per sortie may vary, but the number of sorties per
course is about 70. This variable was used for regression and correlational analyses. A second
outcome variable was end-of-course status, in which students are grouped according to whether
they self-suspend, are back-classed, fail for air work (FAW), or obtain a Pass, Credit or
Distinction. Only the latter four groupings were used for this analysis, as reasons for back-classing
and self-suspension are often unrelated to flying ability. The numbers involved were also small
(N=3 for each).

Analyses

In the first phase of the analysis, relationships between the test variables and the criterion variable
and with each other were explored using correlational techniques.

In the second phase, variables showing close relationships with the criterion but not with each other
and those deemed important for process or theoretical reasons were analysed using a mix of
hierarchical and stepwise multiple regression techniques. Despite leading to marked reduction in
degrees of freedom, listwise deletion was preferred as it ensured all cases with relevant variable
data were included in the analysis. The regression equation arising from the above analysis was
used determine a new statistical model for use in guiding selection following FSP.

The effectiveness of the new statistical model was then assessed using cross tabulation analyses that
compared predicted training outcomes against actual BFTS pass and fail rates.

Results

Although age and PFH were found not to be significantly correlated with BFTS sortie averages,
previous studies have indicated they are important predictors of training success and hence they
were included in the regression analysis. Not unexpectedly, the OTB test did not show up as
significant, and was not included in the regression analysis. Of the trial tests, only the Aviation
Reasoning (AVR) test showed a statistically significant correlation (r=.22) and hence was included
in the regression analysis. Similarly, although none of the individual measures
comprising the ATB pilot index (PI) correlated significantly, the PI did (r=.20) and hence for both
practical and statistical reasons was included. Similarly, the FSP raw mean score showed an
especially high correlation (r=.50) and was included in the regression analysis.

Of the AUSBAT measures, two psychomotor measures (Pursuit B, r=.31; Perceptual Adjustment
Test - Horizontal error score, r=.20) and three working memory measures (Number Product -
number right, r=.17; Number Product - number wrong, r=.19; Divided Attention - Number
Product wrong, r=.19) showed up as significant. However, as the working memory measures intercorrelate highly
(>0.4) with each other, only one (Number Product - Number wrong) was included in the regression
analysis. It was preferred over its equivalent in the Divided Attention test, because it is more
efficient to administer in a selection scenario (the Divided Attention test requires administration of
both component tests independently prior to its administration) and because it contributed similarly
to the DA version in predicting the variance of the DV. On the other hand, although there is some
overlap, both conceptually and empirically, between the two psychomotor measures, it was not
considered sufficiently high (r<=0.3) to warrant exclusion of either test from the regression
analysis.

Accordingly, a mixed hierarchical, stepwise regression analysis was undertaken. The sequence and
method of inclusion of predictor variables took account of past research and the steps in which data
would normally be gathered in the selection process. Additionally, for AUSBAT and the
biographical measures, the relative importance of individual variables within each set in explaining
variance within the dependent variable and their capacity to meet inclusion criteria (F test sig <=
.05) were taken into account by introducing stepwise components in the SPSS analysis. Accordingly,
in step 1 PFH and Age were entered using stepwise procedures, and in steps 2 and 3, PI and AVR
were entered respectively. In step 4 the AUSBAT measures (Pursuit B, Perceptual Adjustment -
Horizontal Distance score and Number Product - Number Incorrect) were entered using stepwise
procedures, while in step 5 the FSP RMS was entered.
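The block-entry logic described above can be reproduced outside SPSS. The sketch below uses simulated data and statsmodels OLS to report the R-squared change as each block enters; the stepwise selection within blocks and the listwise deletion used in the actual analysis are omitted for brevity, and all data-generating coefficients are invented.

```python
# Block-entry regression with R-squared change, on simulated data. Variable
# names follow the paper (PI, AVR, PB, NP-W as NPW, FSP RMS as FSP_RMS).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120                                   # roughly the usable sample size
df = pd.DataFrame(rng.normal(size=(n, 5)),
                  columns=["PI", "AVR", "PB", "NPW", "FSP_RMS"])
df["sortie_avg"] = (0.2 * df.PI + 0.2 * df.AVR + 0.3 * df.PB
                    + 0.6 * df.FSP_RMS + rng.normal(size=n))

blocks = [["PI"], ["AVR"], ["PB", "NPW"], ["FSP_RMS"]]   # order of entry
entered, prev_r2 = [], 0.0
for block in blocks:
    entered += block
    fit = sm.OLS(df["sortie_avg"], sm.add_constant(df[entered])).fit()
    print(f"+{block}: R2={fit.rsquared:.3f} (change={fit.rsquared - prev_r2:.3f})")
    prev_r2 = fit.rsquared
```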

This model accounted for 45.9% of the variance in sortie average scores. As evident in
Table 1 below, the two biggest predictors were the Pursuit B test and the FSP RMS, accounting for
10.4% and 21.2%, respectively. Age, previous flying hours and the Number Product AUSBAT test
were deleted from the model as they failed to meet the criteria required for inclusion. Although the
't' value for the PI was not statistically significant, the PI was retained in the regression equation.
This was because of its importance to the current selection process, the consequent likely
underestimate of its individual relationship with the DV resulting from range restriction, and the
significance value of the F statistic associated with its independent contribution to the prediction of
variance within the DV.
Table 1 - Regression Analysis Outcomes

Variable     R sq change   F change    DF   Sig F change       B   t test sig
PI                  .051      6.021   112           .016    .028         .170
AVR                 .061      7.607   109           .007    .016         .011
PB                  .104     14.514   111           .000    .005         .046
NP-W                .032      4.594   110           .034    .008         .003
FSP                 .212     42.239   108           .000    .631         .000
(Constant)                                                -3.090         .000

To convert the equation into a more user-friendly tool, a review of a cross tabulation of BFTS
outcomes against the predicted Y values was undertaken. It indicated that the sample could be
grouped into three categories - Low, Medium and High - based on cutoffs set at 2.16 and 2.29.
Translated to a selection scenario, applicants falling into the 'Low' category have a high risk
(33%) of failing BFTS, while those in the 'Medium' and 'High' categories have progressively
lower risks of failing (19% and 0%, respectively). Table 2 summarises the results of this analysis.
Table 2 - BFTS Outcomes by Cutoff Group

                          N                       Percentage
Outcome           Low   Med   High        Low      Med     High
FAW                 7     3      0       33.3%    18.7%      0%
PASS               14    12     60       66.6%      75%    72.3%
CREDIT              0     1     17         0%      6.3%    20.5%
DISTINCTION         0     0      6         0%       0%      7.2%
Total              21    16     83        100%     100%     100%
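As an illustration of how the three-band screen might be applied operationally, the sketch below assigns predicted sortie averages to bands using the paper's cutoffs (2.16 and 2.29) and cross-tabulates bands against outcomes. The scores and outcomes shown are invented, and the assumption that higher predicted scores mean lower risk follows Table 2.

```python
# Banding predicted sortie averages with the paper's cutoffs and tabulating
# outcomes by band. Scores and outcomes below are invented examples.
import pandas as pd

def band(y_hat):
    if y_hat < 2.16:
        return "Low"      # highest observed failure rate (about 33%)
    if y_hat < 2.29:
        return "Med"      # about 19% failure rate
    return "High"         # no failures observed in this sample

scores = pd.Series([2.05, 2.20, 2.50, 2.31, 2.10])
outcomes = pd.Series(["FAW", "PASS", "PASS", "CREDIT", "PASS"])
print(pd.crosstab(scores.map(band), outcomes))
```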

Conclusion

The results may seem surprising in that neither age nor previous flying experience met statistical
criteria for inclusion in the model, despite previous research showing they are related to pilot
training outcomes. Both, however, are thought to be important considerations in the subjective
assessment components of the selection process. Younger applicants and those with more flying
experience are often preferred. Range restriction, therefore, may be leading to an under-estimation
of the likely contribution of these variables in predicting BFTS outcomes. The same applies to the
Pilot Index, which is the main determinant of who is likely to progress to more advanced levels
in the selection process.

The utility of the Aviation Reasoning Test is of interest. It is similar to the predominantly
knowledge-based tests that have previously been included in the pilot selection battery. There is
some suggestion it may be tapping into motivation as much as skill or ability. This study suggests
that its inclusion in the selection model may prove useful - at least for selection at the basic flying
training level.

Of the AUSBAT measures, the Pursuit B especially was predictive. Together with one of the
Perceptual Adjustment measures, it accounted for 13.6% of the variance in this study. This
confirms previous research supporting its utility in helping predict success at this level of training.
As some of the AUSBAT tests were designed to predict outcomes at more advanced flying training
levels, it is perhaps not surprising that many are not showing up as statistically significant at the
basic level. Finally, the study provides strong evidence supporting the role of the flight screening
raw mean score in predicting BFTS outcomes, additional to that provided by all the selection tests.

The primary drawback of the study concerns the small sample size, which precludes analyses of sub-
groups (e.g., Service-specific, Basic or Advanced FSP program) for which distinctive patterns of
variable interrelationships may be evident. The range restriction associated with some of the
variables also suggests the full contribution they make is likely to be underestimated.

Overall, however, the model accounts for a substantial proportion of the variance in BFTS sortie
averages. Given that some of the trainees in the sample did not commence BFTS training until three
years after they were tested and screened, the strength of the findings in this study is encouraging. It
suggests the approach is likely to have utility as a tool for predicting BFTS outcomes for ADF pilot
applicants.


References

Bongers, S., & Pei, J. (2001). The Australian Basic Abilities Tests (AUSBAT). Presentation at
IMTA, Canberra, ACT, Australia, 23-25 Oct 2001.

O'Keefe, D. (2002a). Principal Component Analysis and Development of Scale Scores for the
Australian Basic Abilities Tests (AUSBAT), Psychology Research and Technology Group Research
Report 05/2003. Canberra, Defence Force Psychology Organisation.

O'Keefe, D. (2002b). Concurrent Validation of Australian Basic Abilities Tests (AUSBAT) against
Basic Flight Training School (BFTS) Performance. Psychology Research and Technology Group
Research Report 06/2003. Canberra, Defence Force Psychology Organisation.

Pei, J. (2002). Determination of the Test Parameters for the AUSBAT Tests. Psychology Research
Group Technical Brief 15/2002. Canberra: Defence Force Psychology Organisation.

Pei, J. (2003). The Construct Validity of AUSBAT. Presentation at the Australian Psychological
Society Industrial & Organisational Psychology Conference, Melbourne, Australia, 27-30 June
2003.

Pei, J. (in press). Overview of the selection rates at different stages of pilot selection and
graduation rates in Basic Flight Training. Psychology Research and Technology Group Technical
Brief 12/2003. Canberra: Defence Force Psychology Organisation.


Development and Validation of a Revised ASVAB CEP Interest Inventory

Jane S. Styer, Ph.D.
Department of Defense Personnel Testing Division
DoD Center - Monterey Bay
styerjs@osd.pentagon.mil

Abstract

The Armed Services Vocational Aptitude Battery Career Exploration Program is a
comprehensive career exploration and planning program that includes aptitude and interest
inventory assessments. Currently, the program includes a 240-item, Holland-based, self-
scored, paper-and-pencil interest inventory. This inventory has not been updated since it was
implemented in 1995.
The outcome of the current study will be a 90-item interest inventory that will be
administered via the Internet and by paper-and-pencil. The study will be conducted in two
phases. The item and form development phase is currently nearing completion. The
validation phase will commence in January 2004.
In the item and form development phase of the study, out-of-date items were
eliminated and new items were written to reflect the considerable changes in technology and
the world-of-work. From a pool of over 1,000 new and old items, 600 items were selected
and assembled into two 300-item forms. These forms will be piloted with approximately 600
local high school students. Item performance statistics (e.g., endorsement rates and item to
scale correlations) will be reviewed to identify good performing items. These items will be
used to assemble two experimental forms to be administered in the second phase of the study.
A development and validation study will be conducted in the spring semester in 2004
with a nationally representative sample of approximately 6,000 eleventh and twelfth grade
students in 50 to 55 high schools. Each student will complete one of the two experimental
forms and the Strong Interest Inventory in a counterbalanced design. A subset of schools will
be used to collect test-retest data.


The Armed Services Vocational Aptitude Battery (ASVAB) Career Exploration
Program (CEP) annually serves approximately 800,000 students in over 14,000 schools
nationwide. The Program provides 10th, 11th, 12th, and post-secondary students with a
comprehensive vocational assessment program at no cost.
The program includes two assessment components: the ASVAB (Defense Manpower
Data Center, 1995) and the Interest-Finder (IF; Wall, Wise, & Baker, 1996). These
components provide students with information about their academic abilities and career
interests. The current IF is a paper-and-pencil, 240-item interest inventory based on John
Holland’s (1985, 1997) widely accepted theory of career choice. Students indicate their
preference (i.e., Like or Dislike) to various Activity, Education and Training, and Occupation
items. The IF is a self-administered and self-scored inventory that yields both raw scores and
gender-based percentiles for the six scales. Students use their two or three highest Interest
Codes to explore potentially satisfying occupations.
There are a number of reasons for developing a new interest inventory. The items in
the IF were developed in the early 1990s. Apart from the datedness of some items, other
items have shown a decrement in their item performance statistics over time. Also, the
current items do not reflect the significant changes in technology that have occurred in the
past decade.
Finally, there is a need to have a shortened interest inventory. Typically, it takes
approximately 20 to 30 minutes to take and self-score the IF. A 90-item interest inventory
could be completed and scored in 12-15 minutes, thus freeing up more time in a typical 50-
minute interpretation session to focus on aptitude test score interpretation, interest inventory
results, and a discussion of career exploration and planning activities.
The outcome of this study will be a 90-item interest inventory. The study will be
conducted in two phases. The two phases will focus on the item and form development and
validation of the inventory, respectively. Phase two will not commence until January 2004.
The remainder of this paper will address only the first phase of the study.

ITEM AND FORM DEVELOPMENT

Developing a new inventory, as opposed to updating and shortening an existing
inventory, allowed us to make more substantive changes. We found that some of the IF
scales are narrowly defined and narrowly sampled. Subsequent to a review of the literature,
we revised the scale definitions. For example, in the IF, Realistic is defined as: “People in
REALISTIC occupations are interested in work that is practical and useful, that involves
using machines or working outdoors, and that may require using physical labor or one’s
hands.” The revised definition reads: “Realistic people prefer work activities that include
practical, hands-on problems and solutions, such as designing, building, and repairing
machinery. They tend to enjoy working outside, with tools and machinery, or with plants and
animals. Realistic types generally prefer to work with things rather than people.”
In the IF, items are organized by scale and scale definitions are provided. Students
read the scale definition and use two response options (i.e., Like and Dislike) to indicate their
preference for the Activity, Training and Education, and Occupation items. We have changed
the response scale to Like, Indifferent, and Dislike. This response scale allows respondents to
indicate their indifference to items and is more consistent with the scale used in other interest
inventories (Harmon, Hansen, Borgen & Hammer, 1994; Swaney, 1995). We eliminated the
Training and Education and Occupation items; the new inventory will consist of only
Activity items. We also eliminated the scale definitions and spiraled the items. Attachment A
shows the directions for the new inventory, response scale, and illustrates the spiraling of
items.

Item Development
We began item development by reviewing the item performance statistics of the
current IF items based on two studies conducted in the early to mid-1990s (Wall, Wise, & Baker,
1996) during the development of the IF. We also reviewed item performance statistics from
data collected in the National Longitudinal Survey of Youth 1997 (Moore, Pedlow,
Kishnamurty, & Walter, 2000). Attachment B shows the item selection screens employed to
identify and categorize items into three groups: Best, Good, and Marginal. The quality of
items within the six RIASEC domains varied. As a result, we identified domains where more
new items would be needed relative to other domains. Items selected for inclusion in the two
experimental forms were selected primarily from the Best category; however, when
necessary items from the Good category were also selected.
Two groups of people were asked to write new items. Item writing workshops were
conducted with five item writers from the Personnel Testing Division (PTD) and volunteers
from the staff at the Defense Manpower Data Center. Fifteen individuals submitted 1,019
items. The PTD item writers reviewed and edited the items and eliminated poor items and
duplicates. This resulted in 487 new items.
Four professionals with extensive experience and expertise with RIASEC typology
independently reviewed and assigned the 487 items to RIASEC domains. Items that they
were unable to assign to one RIASEC domain were edited so that the assignment could be
made. Once the items were categorized, three of the experts sorted the items into RIASEC
types, reviewed the item content, and assessed the coverage within the six domains. For
example, the experts identified the Investigative domain as having too few items pertaining
to the social sciences. The experts wrote items for the areas not covered or not sufficiently
covered by the pool of new items. The experts wrote a total of 89 new items.

Form Development
Two 300-item experimental forms are currently being developed. In assembling these
forms we will select the best performing items from each of the six scales from the current IF
and divide these between the two forms. We will include the items that were consistently
assigned to the same RIASEC domain by the experts and the items written by the experts to
broaden the sampling of the content within the domains. Final selections will be made from
items that were edited by the experts to facilitate assignment to a single domain.
Once developed, the two experimental forms will be reviewed by a group of junior
and senior level English teachers to evaluate the understandability and reading level of the
items. The two forms will be spiraled and administered to approximately 600 juniors and
seniors at a local high school. Item endorsement rates, item-to-scale corrected correlations,
and item-to-all-scales corrected correlations will be reviewed to identify poorly performing
items. These items will be replaced by other new or old items.
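For readers unfamiliar with the corrected item-to-scale correlation mentioned above, the sketch below shows one standard way to compute it: each item is correlated with its scale total after removing that item, so the item does not inflate its own statistic. The data are simulated; this is not the study's analysis code.

```python
# Corrected item-to-scale correlation: correlate each item with the scale
# total computed without that item. Simulated dichotomous responses.
import numpy as np

def corrected_item_total(responses):
    """responses: (n_people, n_items) item scores for one scale."""
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])

rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))                    # shared latent interest
demo = (trait + rng.normal(size=(200, 6)) > 0).astype(float)
print(np.round(corrected_item_total(demo), 2))       # positive for these items
```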


Attachment A

Career Exploration Program
Interest Inventory
An interest inventory can help you identify your work-related interests.
Knowing your work-related interests can help you determine career fields or
occupations that would be potentially satisfying.

Directions: Interest inventories are not like other tests. There are no right or wrong
answers. For each of the activities listed in the inventory, ask yourself if you would
like or dislike doing that activity. In answering, don’t be concerned with how well
you would do any activity or whether you have the experience or training to do it.
Just think about how much you would like or dislike doing the activity and select an
answer from the following responses:

Darken L for Like. (I would like to do this activity.)

Darken I for Indifferent. (I don’t care one way or the other.)

Darken D for Dislike. (I would not like to do this activity.)

In deciding, it is best to go with the first answer that comes to mind and not to think
too much. If you don’t have a strong preference one way or the other you can answer
Indifferent (I don’t care one way or the other). Try to answer as many questions as
possible with Like or Dislike. Answer all of the questions, even when you are unsure.

Mark only one response for each activity. It will take about 40 minutes to complete the
inventory, but there is no time limit. Don’t rush. Take your time and enjoy yourself.

1. Connect a DVD player
2. Study the solar system
3. Act in a play
4. Work as a camp counselor
5. Manage a restaurant
6. Balance a checkbook
7. Feed and bathe zoo animals
8. Analyze evidence of a crime
9. Develop landscape designs
10. Teach a class
11. Campaign for a political cause
12. Take minutes for a meeting


Attachment B

ITEM SELECTION SCREENS

First Screen

1. Endorsement rates (er) equal to or greater than .1 for all subgroups
2. Item-to-scale correlation of .4 and above
3. Item-to-scale correlation larger than the same item's correlation with adjacent scales

Second Screen

Categorize and sort items into three categories.

Best Items
1. Absolute difference of endorsement rates for all subgroups no greater than .15
2. Item-to-scale correlation of .5 and above
3. Strong hexagonal correlation pattern

Good Items
1. Absolute difference of endorsement rates for all subgroups no greater than .18
2. Item-to-scale correlation of .4 to .49
3. Correlation pattern approximates the hexagon

Marginal Items
The item meets at least two of the three Good item criteria.

Third Screen

1. Evaluate the coverage of each scale domain by the Best and Good performing items.
Identify gaps and redundancy.
2. Evaluate the balance of each scale from the perspective of each subgroup to ensure
balance.
3. Consider the possibility of using Marginal items.
4. Consider each Training and Education and Occupation item to see whether it can be
rewritten as an Activity item.

Fourth Screen

1. Compare individual judgments and come to consensus.
2. Identify gaps and weaknesses in domain coverage.
3. Conduct analyses of new interim scales.
4. Estimate numbers of new items needed by scale.
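The numeric portions of the first two screens above can be expressed directly in code. The sketch below is a hedged illustration only: the hexagonal-correlation-pattern judgment is represented as a pre-supplied boolean rather than computed, and the thresholds are those listed above.

```python
# Numeric parts of the First and Second screens; the hexagonal-pattern
# judgment is passed in as a boolean rather than computed.
def second_screen(er_by_subgroup, item_scale_r, hexagon_ok):
    spread = max(er_by_subgroup) - min(er_by_subgroup)
    if spread <= .15 and item_scale_r >= .50 and hexagon_ok:
        return "Best"
    good = [spread <= .18, .40 <= item_scale_r < .50, hexagon_ok]
    if all(good):
        return "Good"
    if sum(good) >= 2:   # Marginal: meets two of the three Good criteria
        return "Marginal"
    return "Reject"

print(second_screen([.22, .30, .18], item_scale_r=.47, hexagon_ok=True))  # Good
```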


References

Harmon, L. W., Hansen, J. C., Borgen, F. H., & Hammer, A. L. (1994). Strong Interest
Inventory applications and technical guide. Palo Alto, CA: Consulting Psychologists
Press.

Holland, J. L. (1985). Making vocational choices: A theory of vocational personalities and
work environments (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Holland, J. L. (1997). Making vocational choices: A theory of vocational personalities and
work environments (3rd ed.). Odessa, FL: Psychological Assessment Resources.

Moore, W., Pedlow, S., Kishnamurty, P., & Walter, K. (2000). National longitudinal survey
of youth 1997 (NLSY97). Chicago, IL: National Opinion Research Center.

Swaney, K. (1995). Technical manual: Revised unisex edition of the ACT Interest Inventory
(UNIACT). Iowa City, IA: ACT, Inc.

Wall, J. E., Wise, L. L., & Baker, H. E. (1996). Development of the Interest-Finder – a new
RIASEC-based interest inventory. Measurement and Evaluation in Counseling and
Development, 29, 134-152.


JOB AND OCCUPATIONAL INTEREST IN THE NAVY

Stephen E. Watson, Ph.D.
Director, Navy Selection and Classification
Bureau of Naval Personnel
Washington, DC 20370-5000
Stephen.E.Watson@Navy.Mil

INTRODUCTION

The United States Navy is currently taking an innovative approach to improving the
match between enlisted personnel (Sailors) and their jobs (Ratings). The Rating Identification
Engine (RIDE) is the Navy’s decision support system designed to help enlisted classifiers
provide initial guidance counseling to Applicant-Sailors (i.e., potential enlistees), and re-assign
Sailors to new Ratings during their career. During accessions, RIDE provides a rank ordered list
of recommended Navy Ratings to the Classifier and Applicant-Sailor and presents a wide variety
of educational and career information about these jobs (i.e., enlistment periods, bonuses,
promotion rates, civilian education and job equivalents). Current inputs to RIDE include an
ability component, which utilizes scores on the Armed Services Vocational Aptitude Battery, and
a Navy Need input providing available training opportunities and emphasizing critical Navy
Ratings. As the Navy strives to improve the Sailor-Rating match, a collection of independent
variable measurements is being investigated as additional inputs to RIDE, including measures
of personality (Farmer, Bearden, Borman & Hedge, 2002), general time sharing ability (Watson,
Heggestad, & Alderton, 1999), and vocational interest. The most mature of these additional
measures is the job interest measure known as Job and Occupational Interest in the Navy (JOIN).
In the development of JOIN, reviews suggested traditional approaches and existing
inventories were insufficient for the Navy’s intent (Lightfoot, Alley, Schultz, Heggestad and
Watson, 2000). In general, these approaches failed to provide sufficient differentiation between
Navy jobs. Navy Ratings (and military specialties in general) tend to fall into technical and
scientific interest domains, rather than being more evenly distributed across Holland's (1999)
domains. Likewise, the interests of military personnel should follow this limited
distribution pattern as compared to college or college-bound populations (see Lightfoot et al.,
2000, for a complete discussion of these issues).
For specificity, rather than attempting to tap into established interest factors or
personality traits which may indicate potential job satisfaction in a broad collection of jobs
(Holland, 1999), the author decided to focus on a componential classification (or taxonomy)
indexing approach. In the approach presented here, preferred jobs are retrieved through
collections, or profiles, of indices, rather than matching a person’s “type” to a broad job area, and
then retrieving jobs in that area. Campbell, Boese, Neeb and Pass (1983) have made compelling
arguments for similar taxonomic approaches.
Building an interest taxonomy requires the creation and categorization of some
elements of job interest. Primoff (1975) has defined a "job element" as a worker characteristic
that influences success in a job, and includes interest as a job element. Unfortunately, there does
not exist a consensus definition for the phrase “vocational interest” (Hindelang, Michael &

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
475

Watson, 2002; Dawis, 1991), or perhaps even the construct being measured by vocational
interest inventories (Holland, 1999). For the purposes of this effort, we assume that we can
measure job interest elements representing: a) Work the individual would prefer (specifically and
generally), and b) Preferred work environments.
We also assume that these measured vocational interest elements are a valid source for
defining some portion of the vocational interest preferences of the individual. Given this
approach, it is also necessarily assumed that a complementary set of classifications of Navy
Ratings, relying on vocational interest elements, is possible.
An additional and very important challenge in building an operational tool is to create
something that is not only valid within the artificial constraints of laboratory and research
settings, but also operational and maintainable, without a compromise in established validity.
Addressing the technological, operational, and theoretical concerns described above, the
author decided that utilizing a universally available spreadsheet application to capture, apply,
and maintain a taxonomy-based grid representing a Navy interest model was the most
parsimonious approach. Two-dimensional data arrays and spreadsheets are common elements in contemporary
operational office environments. These spreadsheet formats are easy to process, understand, and
are well supported across a variety of computing platforms.
Admittedly, the idea of allowing a technology and its associated constraints to so clearly
influence the development of a model of human cognition is not without reproach. Although this
is a substantive issue meriting consideration, the current discussion will focus on the
development of the model intended to drive JOIN, and allow the establishment of validity to
answer the important question of approach effectiveness. The JOIN application generalizes the
model for use in working environments, and will eventually provide the greatest test of robust
validity. Without constraining the application of this model to be easily applied to a variety of
situations, eventual tests of validity would instead be quite limited and prolonged.
In summary, the production of the current componential model of Navy Ratings is a job
classification task, focused on job interest elements, using a taxonomic grid (or spreadsheet)
approach. Recommended Navy Ratings will be retrievable through patterns of indices, based on
measured vocational interest responses. Finally, it should be noted that the development of this
componential model, or “classification indexing of interest”, is heavily influenced explicitly and
implicitly by Fleishman and Quaintance (1984), and their summary and interpretation of work
cited therein.
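A toy sketch of the classification-indexing idea may help fix the concept: Ratings are rows in a grid of interest-element indices, and preferred jobs are retrieved by matching a respondent's endorsed indices against each row's profile. The element labels below are invented for illustration and are not the actual JOIN taxonomy.

```python
# Toy retrieval over a taxonomy grid: each Rating row holds a set of interest
# element indices; recommendations are ranked by overlap with the indices a
# respondent endorses. Grid content is invented.
grid = {
    "ABE": {"maintain-mechanical", "operate-heavy-equipment", "outdoor", "aviation"},
    "IT":  {"maintain-computers", "analyze-data", "indoor"},
}

def rank_ratings(endorsed, grid):
    """Order Ratings by the size of profile overlap with endorsed elements."""
    scores = {rating: len(endorsed & profile) for rating, profile in grid.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_ratings({"outdoor", "operate-heavy-equipment"}, grid))
```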

METHOD

Previous attempts at developing a model and items sufficient for use in JOIN (Lightfoot
et al., 2000; Alley, Crowson & Fedak, in press) were not appropriate for the current approach for
a variety of reasons. For example, because the level of description of job elements varied (see
Lightfoot et al., 2000, p. 17; but also Alley, Crowson & Fedak, in press), establishing a
consistent level of detail and description in a taxonomy became problematic. Additionally, since
the context for interpreting statements had been procedurally stripped away, interpretation and
conversion to more generic or more specific statements, or into job interest elements, was not
possible. Finally, while some of the task lists, statements, and items developed were linked to
specific Ratings, most were not, and as such could not be used to build descriptions of the
Ratings from which they were derived.


Phase I: Initial Job Interest Elements Production

To create a completely new set of data, all available job descriptions for all Navy enlisted
Ratings were downloaded from the Enlisted Community Manager's website. The author printed,
compiled, and reviewed each of these job descriptions, and highlighted words reflecting process
(action verbs, e.g., make, maintain), content (direct-object nouns, e.g., electronics, documents),
environment (e.g., indoor vs. outdoor), and community (e.g., surface, submarine, aviation) to
construct job interest elements (see also Hindelang, Michael & Watson, 2002, for a
complementary description of methods). From the collections of highlighted extracts, the author
developed common statements for each of the Ratings containing: 1) between one and five
Process-Content (PC) pairings; 2) for each PC pairing, between one and five parenthesized
examples; 3) at least one community statement; 4) at least one environment statement; and 5) at
least one 'style' statement. An example of these statements is shown in Figure 1.

Rating   Community   Process    Content                                                     Work Style

ABE      Surface     Maintain   mechanical (hydraulic systems, steam systems, pneumatic)    outdoor
ABE                  Operate    mechanical (hydraulic systems, steam systems, pneumatic)    physical
ABE                  Operate    heavy equipment (crane, tractor)
ABE                  Respond    emergency (fire, crash, flood)
ABE      Aviation    Direct     aircraft

Figure 1. Example of job interest elements for the Aviation Boatswain Mate-Launch/Recovery
(ABE) rating built from analysis of ECM documents, and revised in iterative SME interviews.
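
To make the structure of these statements concrete, the sketch below encodes the ABE example from
Figure 1 as a small data structure. It is purely illustrative: the class and field names
(PCPairing, RatingStatements, and so on) are our own invention, not part of any JOIN implementation.

from dataclasses import dataclass, field
from typing import List

@dataclass
class PCPairing:
    process: str                  # action verb, e.g., "Maintain"
    content: str                  # direct-object noun, e.g., "mechanical"
    examples: List[str] = field(default_factory=list)   # one to five examples

@dataclass
class RatingStatements:
    rating: str                   # e.g., "ABE"
    pc_pairings: List[PCPairing]  # between one and five PC pairings
    communities: List[str]        # at least one community statement
    environments: List[str]       # at least one environment statement
    styles: List[str]             # at least one 'style' statement

abe = RatingStatements(
    rating="ABE",
    pc_pairings=[
        PCPairing("Maintain", "mechanical",
                  ["hydraulic systems", "steam systems", "pneumatic"]),
        PCPairing("Operate", "heavy equipment", ["crane", "tractor"]),
        PCPairing("Respond", "emergency", ["fire", "crash", "flood"]),
    ],
    communities=["Surface", "Aviation"],
    environments=["outdoor"],
    styles=["physical"],
)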

Phase II: Job Interest Element Revision

The collection of job interest elements (see Figure 1) developed in Phase I was used in
semi-structured interviews with Subject Matter Experts (SMEs). These SMEs were Enlisted
Community Managers (ECMs; typically the officer supervisors of enlisted personnel, with over
10 years of experience) and Enlisted Community Technical Advisors (TECHADS; typically,
enlisted personnel with 10-30 years of experience in the interviewed rating). In these small group
interviews (two to four participants), the author systematically presented elements from the
relevant pre-conceived descriptions, and interviewees were asked to assess and comment on the
appropriateness of elements, and to modify elements as necessary. Interviewees were also allowed
to add words from the "precoordinated" (Fleishman & Quaintance, 1984) list of terms, which
had been created by pooling all terms developed in Phase I. Natural-language terms that the
interviewer considered key to a description, that occurred frequently, and that were not
represented in the precoordinated vocabulary were also added to the pool.
The development of the final interest element collection for each rating continued from
this point as a process of iterative interviews, refining and modifying the collection of job
interest elements. Upon completion, all Ratings were described by some collection of interest
elements; all interest elements occurred for multiple (but not all) Ratings, and could be used
to successfully differentiate between all but 7 Navy Ratings. Ultimately, 27 PC interest elements,
7 community interest elements, and 8 work style interest elements were identified as being
sufficient to represent the 79 entry-level Navy enlisted Ratings. These interest elements and
Ratings were transformed into a taxonomic grid, in spreadsheet format (see Figure 2), commonly
referred to as the “Rating DNA.”

[Figure 2 here: a taxonomic grid whose columns are grouped under COMMUNITY (e.g., surface,
submarine, aviation, healthcare, special programs), PROCESS-CONTENT (e.g., maintain-elec equip,
direct-emerg resp, analyze-comms, maintain-docs, analyze-docs, direct-aircraft, analyze-data), and
WORK STYLE (e.g., indoor, outdoor, office, industrial, physical, mental, work indep). Each Rating
occupies one row of 0/1 entries (ABE and AD in Job Group Aviation Mechanical; DT in Health Care;
ST in Submarine Personnel), and a final row holds a single applicant's responses.]
Figure 2. An abbreviated and simplified example of the “Rating DNA” and a single applicant’s
collection of responses.

Phase III: Linking Pictures to Job Interest Elements

Digital images from a variety of sources were collected to represent job interest elements.
In a series of iterative workshops with the same population of SMEs used in Phases I and II,
nine images were identified to represent each PC interest element, three images for each
community interest element, and three images for each work style element. Each multiple-image
collection was selected for a variety of aesthetic qualities (e.g., image clarity), but also to
include a distribution of demographic makeup and task representations.

Phase IV: Instantiation in JOIN

PC pairs were presented three times so that the psychometric properties of items could be tested.
Figure 3 illustrates the framework and design of the JOIN software. Responses from each of the job
interest element areas (Process-Content, Community, and Work Style) were collected and used to
develop an interest profile for the individual. The pattern of responses can be interpreted through
relationships between rating structure and responses, using Figure 3. Each applicant's response
pattern is represented as a single-row matrix (see Figure 2), with each job interest element
reflected as a continuous variable between 0 and 1 (the user actually responds on a 0-100% scale
in the JOIN interface). This methodology and organization for the collection of data lend
themselves to a variety of potential indexing and retrieval schemes and procedures, allowing us
to explore a variety of approaches to match Ratings to patterns of responses.
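
As a concrete illustration of one such retrieval scheme, the sketch below ranks Ratings by cosine
similarity between an applicant's 0-1 response vector and each Rating's row of 0/1 indicators. The
element names, grid rows, and the choice of cosine similarity are all assumptions made for
illustration; the text deliberately leaves the operational indexing scheme open.

import math

ELEMENTS = ["maintain-mech", "operate-heavy-equip", "analyze-docs",
            "surface", "aviation", "outdoor", "physical", "indoor"]

RATING_DNA = {                        # illustrative rows, not real Rating data
    "ABE": [1, 1, 0, 1, 1, 1, 1, 0],
    "DT":  [0, 0, 1, 0, 0, 0, 0, 1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_ratings(applicant, dna=RATING_DNA):
    """Return Ratings ordered by similarity to the applicant's response pattern."""
    return sorted(dna, key=lambda r: cosine(applicant, dna[r]), reverse=True)

applicant = [0.9, 0.8, 0.1, 0.7, 0.6, 1.0, 0.8, 0.0]   # one 0-1 response per element
print(rank_ratings(applicant))                          # -> ['ABE', 'DT']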

Phase V: System Test

The JOIN system was tested by 300 new recruits attending Recruit Training Center,
Great Lakes, IL (Michael, Hindelang, & Watson, 2002). Recruits also participated in small
group discussions and filled out questionnaires regarding the software.

RESULTS AND DISCUSSION

Two sets of analyses are critical in the validation of the current approach. First,
Hindelang, Michael, and Watson (2002) presented results from a principal components analytic
approach, which indicated a convergence of the current Process-Content-defined structure with
previous factor analytic representations of Navy jobs. Labels for 7 of the 9 components
(accounting for 92% of the variance) were readily available and intuitive (e.g., Technical
Mechanical Activities, Administrative Activities, Intelligence), while the remaining 2
components (accounting for 8% of the variance) were not.
Second, Michael, Hindelang, and Watson (2002) report findings from the RTC Great
Lakes test of the JOIN system. In general, recruits found the JOIN system easy to use, intuitive,
and appealing. All PC pairs, communities, and work styles were responded to with some level of
interest, and there were substantial and reliable individual differences in response patterns. The
(alpha) reliability estimates for each PC job interest element were very good, ranging from .83
(Operate Mechanical Equipment) to .95 (Make Facilities), and .91 overall.
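
For readers who want the computation behind these reliability figures, the sketch below implements
the standard coefficient alpha formula for a k-item scale; the responses are fabricated for
illustration and are not JOIN data.

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's responses across people."""
    k, n = len(items), len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(items[i][p] for i in range(k)) for p in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three versions of one PC item, five fabricated respondents (0-100% scale):
pc_scale = [[80, 20, 55, 90, 10],
            [75, 25, 60, 85, 15],
            [85, 30, 50, 95, 20]]
print(round(cronbach_alpha(pc_scale), 2))   # -> 0.99 for these fabricated data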
Taken as a whole, results from the described efforts suggest that developing a job interest
measure from job interest elements derived by SME evaluation can be an effective approach for
organizations offering training and careers in some subset of all possible jobs. Future research
will test the predictive validity of the system with respect to training and career success.
The approach described here is efficient with respect to establishing rating descriptions,
and readily accommodates the creation of new jobs, as well as the modification and merging of
existing job structures. Additionally, as technical requirements for jobs change, job interest
elements can be readily added to and subtracted from a job profile, and entirely new job interest
elements can be added to the system based on SME input. The intuitive and simple approach
described here should result in a functional, accurate, and easily maintained vocational interest
measurement and matching system, JOIN.


[Figure 3 here: a schematic of the potential mappings. Multiple-response items (E 1.1 through
E 3.3) map three-to-one onto job interest elements (PC 1...PC n; Community 1...Community n;
Work 1...Work n); job interest elements map many-to-many onto Ratings (Rating 1...Rating n); and
Ratings map many-to-few onto Job Groups.]
Figure 3. Illustration of possible relationships between JOIN items, job interest elements, and
existing Ratings and Rating groups.

REFERENCES

Alley, W.E., Crowson, J.J., & Fedak, G.E. (in press). JOIN item content and syntax templates
(NPRST-TN-03). Millington, TN: Navy Personnel Research, Studies, & Technology.
Cunningham, J.W., Boese, R.R., Neeb, R.W., & Pass, J.J. (1983). Systematically derived work
dimensions: Factor analyses of the Occupation Analysis Inventory. Journal of Applied
Psychology, 68, 232-252.
Dawis, R. (1991). Vocational interests, values, and preferences. In M.D. Dunnette, & L.M.
Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2,
pp. 271-326). Palo Alto, CA: Consulting Psychologists Press.
Farmer, W.L., Bearden, R.M., Borman, W.C., & Hedge, J.W. (2002). Navy Psychometrics of
Measures: NAPOMS. Proceedings of the 44th Annual Conference of the International
Military Testing Association.
Fleishman, E.A., & Quaintance, M.K. (1984). Taxonomies of human performance. Orlando,
FL: Academic Press, Inc.
Hindelang, R.L., Michael, P.G., & Watson, S.E. (2002). Considering Applicant Interests in
initial Classification: The Development of Job and Occupational Interest in the Navy
(JOIN). Proceedings of the 70th Military Operations Research Society Symposium.
Holland, J.L. (1999). Why vocational interest inventories are also personality inventories. In
M.L. Savickas & A.R. Spokane (Eds.), Vocational interests: Meaning, measurement, and
counseling use. Palo Alto, CA: Davies-Black Publishing/Consulting Psychologists Press, Inc.
Lightfoot, M.A., McBride, J.R., Heggestad, E.D., Alley, W.E., Harmon, L.W., & Rounds, J.
(1999). Navy interest inventory: Approach development (FR-WATSD-99-13). San Diego,
CA: Space and Naval Warfare Systems Center.
Lightfoot, M.A., Alley, W.E., Schultz, S.R., & Watson, S.E. (2000). The development of a Navy-
specific vocational interest model. Alexandria, VA: Human Resources Research
Organization.
Michael, P.G., Hindelang, R.L., & Watson, S.E. (2002). JOIN: Job and Occupational Interest in
the Navy. Proceedings of the 44th Annual Conference of the International Military
Testing Association.
Primoff, E.S. (1975). How to prepare and conduct job element examinations (Tech. Study 75-1).
Washington, DC: U.S. Government Printing Office.
Savickas, M.L. (1999). The psychology of interests. In M.L. Savickas & A.R. Spokane (Eds.),
Vocational interests: Meaning, measurement, and counseling use. Palo Alto, CA:
Davies-Black Publishing/Consulting Psychologists Press, Inc.
Watson, S.E. (2001). Considering Applicant Interests in Initial Classification: The Rating
Identification Engine Using Job and Occupational Interests in the Navy (RIDE/JOIN).
Proceedings of the 69th Symposium of the Military Operations Research Society.
Watson, S.E., Heggestad, E.D., & Alderton, D.L. (1999). General Time-Sharing Ability:
Nonexistent or Just Rare? Proceedings of the 41st Annual Conference of the International
Military Testing Association.


Vocational Interest Measurement in the Navy - JOIN


William L. Farmer, Ph.D. and David L. Alderton, Ph.D.
Navy Personnel Research, Studies, and Technology Department (PERS-1)
Navy Personnel Command
5720 Integrity Drive, Millington, TN, USA 38055-1000
William.L.Farmer@navy.mil

Job and Occupational Interest in the Navy (JOIN) has been developed for use in
conjunction with the Rating Identification Engine (RIDE) to help provide a better match
between a recruit’s abilities, interests and specific occupations (i.e., ratings). JOIN measures
interest in specific work activities and environments, as well as providing recruits with Navy job
information.
JOIN evolved from a more or less typical interest inventory, comprised of a number of
very specific work activity statements, into a picture-based instrument that solicits interest
ratings on a series of generalizable work component items. There were four objectives in the
design of JOIN. First, it had to differentiate among the 79 entry-level Navy jobs (these jobs are
known as “ratings”), something civilian and other military interest measures could not do.
Second, it had to be “model based” so that it could be quickly adapted to Navy job mergers,
changes, or additions. Third, JOIN needed to be useable by naïve enlisted applicants; naïve in
terms of knowledge of Navy jobs and the technical terms used to describe them. Finally, JOIN
needed to be short and engaging to encourage acceptance by Navy applicants and those who
process them. The development of the JOIN model and tool is described in a series of NPRST
reports (Alley, Crowson, & Fedak, in press; Lightfoot, Alley, Schultz, Heggestad, Watson,
Crowson, & Fedak, in press; Lightfoot, McBride, Heggestad, Alley, Harmon, & Rounds, in press;
Michael, Hindelang, Watson, Farmer, & Alderton, in press).

JOIN SOFTWARE TESTING

Usability Testing I

The first test phase occurred during August of 2002 at the Recruit Training Center (RTC)
Great Lakes, and was conducted with a sample of 300 new recruits. Participants were presented
with JOIN and its displays, images, question content, and other general presentation features in
order to determine general test performance, item reliability, clarity of instructions and intent,
appropriateness for a new recruit population in terms of overt interpretability, required test
time, and software performance. The initial results from the usability testing were very promising
on several levels. First, the feedback from participants provided researchers with an overall
positive evaluation of the quality of the computer-administered interest inventory. Second, the
descriptive statistical analyses of the JOIN items indicated that there was adequate variance
across individual responses. In other words, the participants differed in their level of
interest in various items. Finally, the statistical reliability of the work activity items was
assessed, and the developed items were very consistent in measuring participant interest in the
individual enlisted rating job tasks. The results from this initial data collection effort were used
to improve the instrument prior to subsequent usability and validity testing (Michael,
Hindelang, Watson, Farmer, & Alderton, in press).


Instrument Refinement

Based on the results of the initial usability study, a number of changes were made. These
changes were made with three criteria in mind. First, we wanted to improve the interface from
the perspective of the test taker. Second, it was imperative that testing time be shortened.
Though this modification does contribute to the “user-friendliness” of the tool, the initial
impetus for this was the very real operational constraint, as directed by representatives from the
Navy Recruiting Command (CNRC), that the instrument take no more than ten to fifteen
minutes to complete. Finally, if at all possible, it was necessary that the technical/psychometric
properties of the instrument be maintained, if not enhanced.
Though the initial usability testing was favorable overall, one concern was voiced on a
fairly consistent basis. Respondents stated that there was an apparent redundancy in the items
that were presented. This redundancy was most often characterized as “It seems like I keep
seeing the same items one right after another.”
One explicit feature that was a target during the initial development was that a set of
generalizable process and content statements would be used. For instance, the process
“Maintain” is utilized in nine different PC pair combinations, and a number of the content areas
are used in as many as three PC pairs. Because this was a deliberate design feature, it was
not revised.
Also contributing to the apparent redundancy was the fact that three versions of each PC
item were presented, yielding a total of 72 PC items being administered to each respondent. This
feature had been established as a way of ensuring psychometric reliability. With a keen eye
toward maintaining technical standards, the number of items was cut by one-third, yielding a
total of 56 PC items in the next iteration of the JOIN tool.
Finally, the original algorithm had specified that all items be presented randomly. Though
the likelihood of getting the alternate versions of a PC pair item one right after the other was
low, we decided to place a “blocking constraint” in the algorithm, whereby an individual receives
blocks of one version of all of the 26 PC pairs, presented randomly within each block. With the
number of PC pair presentations constrained to two, each participant receives two blocks of 26 items.
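
A minimal sketch of this blocking constraint appears below, under assumed names; it simply
shuffles one version of every PC pair within each of two blocks, which is all the constraint
requires.

import random

def blocked_sequence(pc_pairs, versions=(1, 2)):
    """pc_pairs: the 26 PC-pair identifiers; returns the presentation order."""
    order = []
    for version in versions:           # one block per item version
        block = [(pair, version) for pair in pc_pairs]
        random.shuffle(block)          # random order within the block
        order.extend(block)
    return order

pairs = [f"PC{i:02d}" for i in range(1, 27)]
sequence = blocked_sequence(pairs)
assert len(sequence) == 52             # two blocks of 26 items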
As users had been pleased with the other major features of the interface, no refinements
were made other than those mentioned. Reduction in testing time was expected to follow from the
deletion of PC items. Decisions to delete items were made using a combination of rational
and technical/psychometric criteria. As stated earlier, initial item statistics had been favorable in
that internal consistencies within the 3-item PC scales were good (mean α = 0.90), and sufficient
variation across scale endorsement indicated that individuals were actually making differential
preference judgments. Items were deleted if they contributed little (in comparison to other items
in the scale) to PC scale internal consistency or possessed response distributions that were
markedly different from those of alternate versions of the same item. In the absence of definitive
empirical information, items were also deleted if they appeared to present redundant visual
information (as judged by trained raters). The resulting 2-item PC scales demonstrated good
internal consistency (mean α = 0.88). Additional modifications were made that enhanced item data
interpretation and allowed for the future measurement of item response time.
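
The "contributed little to internal consistency" criterion can be read as an alpha-if-item-deleted
comparison; the sketch below is a hedged reconstruction of that idea with fabricated data, not the
procedure actually used.

def _var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def alpha(items):
    k = len(items)
    totals = [sum(col) for col in zip(*items)]
    return (k / (k - 1)) * (1 - sum(_var(it) for it in items) / _var(totals))

def weakest_item(items):
    """Index of the item whose deletion leaves the highest remaining alpha."""
    return max(range(len(items)),
               key=lambda i: alpha(items[:i] + items[i + 1:]))

scale = [[80, 20, 55, 90, 10],
         [75, 25, 60, 85, 15],
         [40, 70, 30, 20, 90]]   # a deliberately discordant third version
print(weakest_item(scale))       # -> 2: dropping it leaves the highest alpha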


Usability Testing II

The second phase of testing occurred over a three-and-a-half-month period in the spring of
2003 at RTC Great Lakes. A group of approximately 4,500 participants completed the refined
JOIN (1.0e) instrument. The group was 82% male, 65% white, 20% black, and 15% other.
From a usability perspective, 93.2% of all respondents rated JOIN “good” or “very good.”
Regarding the PC descriptors, 90.4% of respondents felt that the pictures did a “good” or “very
good” job of conveying the information presented in the descriptors, and 80.5% stated that the
items did a “good” or “very good” job of conveying Navy-relevant job information to new
recruits. In terms of psychometric quality, the average PC scale α was 0.87. Descriptive
statistics indicate that participants provided differential responses across and within work
activity scales. The average testing time decreased from 24 minutes (for the original version) to
13 minutes. The average time spent per item ranges from 8 to 10 seconds (except for special
operations items, at 21 seconds). Special Programs and Aviation are the preferred communities,
with working outside and in a team as the work environment and style of choice. As in the
initial pilot test, the most desirable work activity has been to operate weapons.

Criterion-Related Validity Testing

The data collected in the most recent round of testing are currently also being used to
establish the criterion-related validity of the JOIN instrument. As those who completed the
instrument lack prior experience or knowledge of the Navy or Navy ratings, they are an ideal
group to use for establishing the predictive validity of the tool. Criterion measures (e.g.,
A-school success) will be collected as participants progress through technical training and those
data become available. Participants' Social Security numbers (SSNs) were collected to link
interest measures to longitudinal data, including the multiple-survey 1st Watch source data.
Additional measures will include attrition prior to End of Active Obligated Service (EAOS),
measures of satisfaction (on the job and in the Navy), propensity to leave the Navy, and desire
to re-enlist. Additionally, JOIN results will be linked with performance criteria.

JOIN MODEL ENHANCEMENT

In addition to the establishment of criterion-related validity, current efforts are focused on
establishing and enhancing the construct validity of the SME model on which the JOIN
framework rests. As mentioned previously, the tool was developed using the judgments of
enlisted community managers. In addition to decomposing rating descriptions into core elements
and matching photographs with these elements, this group also established the initial scoring
weights, which were limited in the first iteration to unit weights. At present, NPRST researchers
are conducting focus group efforts with Navy detailers, classifiers, and A-school instructors for
the purpose of deriving SME-determined numerical weights that establish an empirical link
between JOIN components and all existing Navy ratings that are available to first-term sailors.
These weights will be utilized in the enhancement of the scoring algorithm that provides an
individual preference score for each Navy rating. A rank ordering (based on preference scores)
of all Navy ratings is provided for each potential recruit.
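
A hedged sketch of the general form this scoring could take is shown below: SME-derived weights
(invented numbers here) replace the original unit weights, each rating's preference score is the
weighted sum of the applicant's 0-1 responses, and ratings are rank-ordered by score. This is one
reading of the description above, not NPRST's actual algorithm.

WEIGHTS = {                            # rating -> one weight per interest element
    "ABE": [0.9, 0.7, 0.0, 0.8],
    "DT":  [0.0, 0.1, 1.0, 0.0],
}

def preference_scores(responses, weights=WEIGHTS):
    return {rating: sum(w * x for w, x in zip(ws, responses))
            for rating, ws in weights.items()}

def rank_order(responses):
    scores = preference_scores(responses)
    return sorted(scores, key=scores.get, reverse=True)

print(rank_order([0.8, 0.9, 0.2, 0.6]))   # -> ['ABE', 'DT']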


FUTURE DIRECTIONS

Plans include linking JOIN results with other measures that include the Enlisted Navy
Computer Adaptive Personality Scales (ENCAPS) and other individual difference measures
currently being developed at NPRST. The establishment of a measurable relationship between
job preference and such constructs as individual temperament, social intelligence, teamwork
ability, and complex cognitive functioning will greatly advance the Navy’s efforts to select and
classify sailors and ensure the quality of the Fleet into the future.

References

Alley, W.E., Crowson, J.J., & Fedak, G.E. (in press). JOIN item content and syntax templates
(NPRST-TN-03). Millington, TN: Navy Personnel Research, Studies, & Technology.

Lightfoot, M.A., Alley, W.E., Schultz, S.R., Heggestad, E.D., Watson, S.E., Crowson, J.J., &
Fedak, G.E. (in press). The development of a Navy-job specific vocational interest
model (NPRST-TN-03). Millington, TN: Navy Personnel Research, Studies, &
Technology.

Lightfoot, M.A., McBride, J.R., Heggestad, E.D., Alley, W.E., Harmon, L.W., & Rounds, J.
(in press). Navy interest inventory: Approach development (NPRST-TN-03).
Millington, TN: Navy Personnel Research, Studies, & Technology.

Michael, P.G., Hindelang, R.L., Watson, S.E., Farmer, W.L., & Alderton, D.L. (in press). JOIN:
I. Interest inventory development and pilot testing. (NPRST-TN-03). Millington, TN:
Navy Personnel Research, Studies, & Technology.


THE ARMY VOCATIONAL INTEREST CAREER EXAMINATION


Mary Ann Hanson
President
Center for Career and Community Research
402 Dayton Avenue #3, St. Paul, MN 55102-1772
maryann.hanson@ccc-research.com
Cheryl J. Paullin (Senior Research Scientist) and Kenneth T. Bruskiewicz (Senior Research Associate)
Personnel Decisions Research Institutes
Leonard A. White
Senior Research Psychologist
U. S. Army Research Institute for the Behavioral and Social Sciences

INTRODUCTION
The Army Vocational Interest Career Examination (AVOICE) is a vocational interest
inventory developed as part of the U.S. Army’s Project A. It is designed to measure a wide
variety of interests relevant to Army jobs. AVOICE items describe a variety of occupational
titles, work tasks, leisure activities, and desired learning experiences and respondents are asked
to indicate their degree of interest in each. Data collected as part of Project A and later Building
the Career Force provide a great deal of information about the psychometric characteristics and
correlates of AVOICE scores, and more generally about how interests relate to important
military criteria. This paper describes the development of the AVOICE, Project A/Career Force
data collections that have included the AVOICE, and currently available analysis results. Finally,
we provide suggestions for further research and applications using the AVOICE in the
recruitment and classification process.

AVOICE DEVELOPMENT
The AVOICE is based on an interest inventory developed by the U.S. Air Force called
the Vocational Interest Career Examination (VOICE; Alley & Matthews, 1982). The VOICE
item pool was developed rationally to cover interest constructs that appeared relevant for Air
Force enlisted personnel. The items were grouped into scales based on content similarity and
data were used to improve the internal consistency of the scales. The VOICE was administered
as part of the first large-scale Project A predictor data collection (the Preliminary Battery). Based
on the results, some items were dropped, new items were added, and the response format was
changed from a three-point to a five-point scale. This initial version of the AVOICE was
administered during the field test of Project A predictor and criterion measures, revised based on
field test data, administered to the Concurrent Validation (CV) sample, refined further based on
these data, and finally administered to the Longitudinal Validation (LV) sample. Hough, Barge,
and Kamp (2001) provide more details concerning the AVOICE development. The current
version of the AVOICE contains 182 items grouped into 22 homogeneous basic scales. Based on
available literature, each AVOICE scale has been linked to one of the constructs in Holland’s
hexagonal model of interests (Realistic, Investigative, Artistic, Social, Enterprising, or
Conventional). Table 1 shows some example AVOICE scales and the associated Holland themes.
The AVOICE scales emphasize the Realistic theme, reflecting the fact that much of the work
performed by enlisted Army soldiers is Realistic in nature.

Table 1. AVOICE Composites, Holland Theme(s) and Examples of Scales Included

Composite             Example Scale(s)                     Holland Theme(s)
Rugged/Outdoors       Combat; Rugged Individualism         Realistic
Audiovisual Arts      Drafting; Audiographics              Realistic; Artistic
Interpersonal         Medical Services; Leadership         Investigative; Social
Skilled/Technical     Computers; Mathematics               Investigative; Realistic
Administrative        Clerical/Administrative              Conventional
Food Service          Food Service - Professional          Conventional
Protective Services   Fire Protection; Law Enforcement     Realistic
Structural/Machines   Mechanics; Vehicle Operator          Realistic

PROJECT A/CAREER FORCE DATA COLLECTIONS AND CRITERIA


Project A/Career Force included a comprehensive set of criterion measures. There were
multiple job performance measures, including hands-on performance tests, written job
knowledge tests, supervisory role-play simulations (second-tour soldiers only), and self-report
measures of personnel actions (e.g., awards). In addition, job performance ratings were collected
from peers and supervisors using specially developed behaviorally-anchored rating scales.
Soldiers also completed a satisfaction questionnaire that assessed satisfaction with eight different
dimensions of the Army and their jobs, and also overall satisfaction. As with the AVOICE, the
criterion measures were continually refined and revised over the course of the project.
After the field test, Project A data collection efforts focused on two cohorts: the
Concurrent Validation (CV) cohort and the Longitudinal Validation (LV) cohort. The CV cohort
entered the military in 1983 or 1984, and were administered the Project A predictor and first-tour
criterion measures concurrently in 1985 during their first tour of duty. The LV cohort entered the
Army in 1986 or 1987. Each soldier in the LV cohort was administered the Project A predictor
measures during his or her first three days in the Army. During 1988 and 1989, the first-tour
performance measures were collected for these soldiers, and during 1990 and 1991 the second-
tour performance measures were collected. By this time, most soldiers in the LV cohort were in
their second tour of duty and had moved into leadership roles (e.g., squad leader).
Scores from all of the performance measures were used to model the structure of first-
and second-tour soldier job performance (Campbell & Knapp, 2001). These performance models
were then used to group scores on the performance measures into criterion composite scores. The
first-tour criterion composites used in the Project A basic validation analyses were: (1) Core
Technical Proficiency (CTP), (2) General Soldiering Proficiency (GSP), (3) Effort and
Leadership (ELS), (4) Maintaining Personal Discipline (MPD), and (5) Physical Fitness and
Military Bearing (PFD). The second-tour criterion composites are similar to the first, with the
primary difference being the addition of a sixth composite: Leadership (LDR). The ELS
composite was also revised and relabeled Achievement and Effort (AE) for second tour. Finally,
attrition data were collected from the Army archives. Attrition analyses in this paper focused on
soldiers who left the Army for avoidable reasons (e.g., failure to meet minimal performance or
behavioral criteria) during their first tour of duty. Because the majority of this avoidable attrition
occurs during the first 12 months, attrition was defined as leaving the military for avoidable
reasons during the first 12 months of service.

BASIC VALIDATION AND EMPIRICAL SCORING PROCEDURES


For the basic validation analyses, AVOICE item-level scores were summed to create the
22 interest scales. Based on principal components analysis, these scale scores were grouped into
eight summary composites (Campbell & Knapp, 2001; see Table 1). AVOICE validity was
assessed by computing multiple correlations between these eight AVOICE composites and the
performance criteria within each MOS, statistically adjusting the correlations for shrinkage
(Rozeboom, 1978), correcting them for range restriction, and then averaging across MOS.
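
To illustrate what a shrinkage adjustment of this kind does, the sketch below applies the classical
Wherry-type correction to an observed multiple R. Rozeboom's (1978) estimator is a related but
distinct formula; this simpler version is shown only to convey the idea, and the numbers are
illustrative.

import math

def wherry_adjusted_r(r, n, k):
    """r: observed multiple R; n: sample size; k: number of predictors."""
    r2_adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))

# e.g., an observed R of .45 for the eight composites in an MOS with n = 176:
print(round(wherry_adjusted_r(0.45, n=176, k=8), 2))   # -> 0.41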
Paullin, Bruskiewicz, Hanson, Logan, and Fellows (1995) conducted preliminary
analyses to determine whether empirical scoring procedures have potential for enhancing the
validity of the AVOICE beyond the levels obtained in the basic validation analyses. This work
focused on five MOS, selected to include both combat and noncombat MOS, some MOS that are
similar to each other and some that are quite different in terms of the tasks involved, and some
MOS that have a relatively large percentage of female soldiers. The MOS selected were
Infantryman (11B), Cannon Crewman (13B), Light Wheel Vehicle Mechanic (63B),
Administrative Specialist (71L), and Medical Specialist (91A). Only a subset of the criterion
variables were included in this work: Core Technical Proficiency (CTP) because it is quite well
predicted by the AVOICE (see results below); attrition because it is reasonable to expect that
soldiers whose interests do not match their jobs will be more likely to leave; and MOS
membership because it has been one of the most widely studied criteria in past research on
interests. Several different empirical keying procedures were explored for CTP for the 13Bs and
for occupational membership for the 91As. Based on results of these analyses, only the most
promising empirical keying strategies were evaluated for the remaining MOS and criteria.
Empirical keys that focus on response option level data do not assume a linear
relationship between item scores and the criterion of interest. Two of the most common response
option level approaches were tried: vertical percent and correlational. For the vertical percent
method, contrasting groups of 13Bs were formed based on the criterion variable (CTP): soldiers
whose CTP scores fell in the top 30 percent and those whose scores fell in the bottom 30 percent.
The differences between the percentages of soldiers choosing each response option in each
contrasting group were then used to assign “net weights” to each response option (Strong, 1943).
This essentially gives differences greater weight if they occur at either extreme of the response
distribution. Two vertical percent keys were developed: one including only those items that
showed at least a 5-point difference across groups and one including items that showed at least a
10-point difference. Similar procedures were followed for the MOS membership criterion for
91As, but all items were included in these keys. A second “correlational” method was also
explored for CTP. Each dichotomous response option score was correlated with the continuously
scored CTP criterion measure, and unit weights were assigned to response options with a
significant point-biserial correlation (p < .05).
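
The sketch below is a simplified reconstruction of the vertical percent method, with fabricated
responses: net weights are the percentage-point differences between the contrasting groups for
each response option and, as one option-level reading of the 5- and 10-point cutoffs (which the
text applies when selecting items), small differences are zeroed out.

def vertical_percent_key(high_group, low_group, n_options=5, min_diff=5):
    """Each group: one item's responses (0..n_options-1); returns net weights."""
    def pct(group, opt):
        return 100.0 * sum(1 for r in group if r == opt) / len(group)
    weights = [pct(high_group, opt) - pct(low_group, opt)
               for opt in range(n_options)]
    return [w if abs(w) >= min_diff else 0.0 for w in weights]

high = [4, 4, 3, 4, 2, 3, 4, 3]   # top-30%-CTP soldiers' responses to one item
low  = [1, 2, 0, 1, 3, 2, 1, 0]   # bottom-30% responses
print(vertical_percent_key(high, low))   # -> [-25.0, -37.5, -12.5, 25.0, 50.0]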
Empirical scoring procedures focused on item-level data were also developed, by
computing correlations between AVOICE item-level scores and CTP. The items with the largest
correlations were then selected for a unit weighted short (6-item), medium (12-item), and long
key. Each successive key contained all of the items from the shorter keys, and the long key
included all of the items that had a statistically significant relationship with the criterion variable
(p < .05). Item-level empirical keys to predict MOS membership were developed using similar
procedures. Differences between the mean AVOICE item-level score for members of the target
MOS and members of all remaining MOS (i.e., the “general population”) were computed for
each item. Items for which this mean difference was significantly different from zero (p < .01)
were then included in a unit-weighted key, with a negative weight if the general population
received a higher score and a positive weight if the target MOS received a higher score.
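
For the item-level keys, a hedged sketch of the selection logic is given below: correlate each
item with the criterion, keep the most strongly related items, and assign signed unit weights. The
names and data are invented; note that taking prefixes of the ranked list automatically reproduces
the nesting of the short, medium, and long keys described above.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def item_key(item_scores, criterion, key_size=12):
    """item_scores: dict of item -> scores across soldiers; returns unit weights."""
    rs = {item: pearson(scores, criterion) for item, scores in item_scores.items()}
    ranked = sorted(rs, key=lambda item: abs(rs[item]), reverse=True)
    return {item: (1 if rs[item] > 0 else -1) for item in ranked[:key_size]}

ctp = [55, 60, 40, 70, 50]
items = {"combat": [4, 5, 2, 5, 3], "clerical": [2, 1, 4, 2, 3]}
print(item_key(items, ctp, key_size=1))   # -> {'combat': 1}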
All analyses for CTP were conducted within MOS, because interests are expected to have
different relationships with performance (i.e., CTP) in different jobs. For MOS that included
women, empirical keys were developed separately for men and women. Attrition analyses were
conducted for the 13Bs and 91A men only, and again analyses were conducted within MOS. All
samples were divided randomly into two subsamples. Empirical keys were then developed in one
sample and cross-validated in the other. For the analyses focused on MOS membership,
empirical keys were developed in the CV sample and applied and evaluated in the LV sample.

RESULTS FOR JOB PERFORMANCE


A clear finding from the Project A research is that AVOICE interest scales predict job
performance. For example, in the LV sample (N = 4,220), the mean multiple correlation, across
nine MOS, between AVOICE composites and CTP was .38 (corrected for range restriction and
adjusted for shrinkage). Multiple correlations for the other first-tour performance composites
were .37 with GSP, .17 with ELS, .05 with MPD, and .05 with PFB. AVOICE validities for
second-tour performance (N=1,217) were similar; the multiple correlations were .41 with CTP,
.29 with GSP, .09 with AE, .06 with MPD, .09 with PFB, and .35 with LDR. It is interesting that
the prediction of CTP is slightly better for second-tour soldiers, while the prediction of GSP is
somewhat worse. Also, AVOICE scores predicted leadership performance quite well, even
though there was not a strong a priori rationale for expecting this relationship.

Table 2. Comparison of the Validity of AVOICE Composites with Empirical Keys for First-Tour
Core Technical Proficiency (CTP)

                           Cross-Validation   Multiple         Empirical Keys
                           Sample Size        Correlation(1)   12-Item    All-Sig.
CV Median across MOS       176                .20              .24        .25
CV Range across MOS        92-243             .00-.44          .14-.43    .16-.46
LV Median across MOS       240                .15              .24        .23
LV Range across MOS        113-365            .00-.17          .17-.28    .17-.25

(1) Multiple correlation of eight rational composites adjusted for shrinkage using Rozeboom (1978).

Regarding empirical keys for CTP, the various item and response option level empirical
keying procedures did not yield appreciably different cross-validated results. Response option
level keys generally yielded higher validities in the development sample, but greater shrinkage in
the cross-validation sample. Since conclusions based on the various empirical keying approaches
are virtually identical, this paper will present results for the item-level keys only. The short (i.e.,
6-item) scales were uniformly less valid than the medium and long scales, and results are
presented here for the latter scales only. Table 2 summarizes the results across six groups – 11Bs,
13Bs, 63B men, 71L men, 71L women, and 91A men – for the CV and LV cohorts. In general,
empirically developed AVOICE scales provided somewhat better prediction of CTP than the
multiple correlations based on the eight rationally developed composites.

RESULTS FOR SATISFACTION, ATTRITION AND MOS MEMBERSHIP


Regarding job satisfaction, Carter (1991) examined multiple correlations between the
AVOICE composites and satisfaction scales (adjusted for shrinkage). Sample size weighted
means (across MOS) ranged from .04 to .13 across the different dimensions of satisfaction, with
a median of .10. These correlations are lower than those typically reported for military samples
(e.g., Hough et al., 2001), but correlations between interests and satisfaction in the non-military
literature have been inconsistent and often quite low (e.g., Dawis, 1991). The AVOICE did not
predict attrition well for the 13B and 91A MOS. The median multiple correlation between the
eight composites and attrition (adjusted for shrinkage) was only .06. Empirical keying did not
improve these results. In fact, the median cross-validity for the empirical attrition keys was only
.02. AVOICE scores are, however, substantially related to soldiers’ occupational (i.e., MOS)
membership. Table 3 summarizes the results across seven groups: 11Bs, 13Bs, 63B men, 71L
men, 71L women, 91A men, and 91A women. Soldiers generally score considerably higher on
the occupational key developed for their own MOS than do soldiers from other MOS, with a
median effect size of more than a full standard deviation.

Table 3. Mean 12-Item Occupational Scale Scores in the Longitudinal Validation Sample

                     Target MOS               General Population(1)
                     N      Mean    SD        N       Mean    SD       Effect Size(2)
Median across MOS    486    61.24   8.45      5,889   49.96   9.43     1.24
Minimum              74     53.88   6.22      529     48.24   8.81     .40
Maximum              764    65.24   10.60     6,545   56.29   11.52    1.69

(1) General population for each scale consists of members of all other Batch A and Batch Z MOS in
that sample.
(2) Effect sizes in this table are relative to the target MOS (i.e., a positive effect size indicates
that the target MOS scored higher than the general population). All effect sizes are significant at
the p < .01 level.

DISCUSSION AND RECOMMENDATIONS


AVOICE scores predict the MOS-specific aspects of performance quite well, and they
are also related to MOS membership. Interests are not strongly related to attrition, so there is
apparently a good deal of self-selection into the occupational specialties that match soldiers’
interests. The fact that interests are related to performance even within this self-selected group
suggests that the Army could benefit from more systematically placing recruits in MOS that
match their interests. For example, recruits could simply be told which MOS best match their
interests, along with the benefits of entering an MOS that closely matches those interests. Because
people tend to gravitate toward occupations that are consistent with their vocational interests, the
AVOICE may also provide a good recruiting tool, as it could be used to identify the MOS that
are likely to be more appealing to potential recruits. Alternatively, AVOICE scores could be used
to make classification decisions. Further analyses are needed to determine the extent to which
AVOICE scores can contribute to performance prediction beyond the current ASVAB
composites.
Empirical keying has good potential for improving the validity of the AVOICE. It is
worth noting that the analyses presented here actually provide a fairly conservative test of the
validity of empirical approaches, as the multiple correlation using the eight relatively
homogenous AVOICE composites also capitalizes on the relevant predictor-criterion
relationships. Empirical keys are likely to provide even more of an advantage when compared
with purely rational keys. Large datasets that include interest scores and the criteria of interest
are needed for empirical keying, and the Project A/Career Force data provide a valuable source
of this relatively difficult-to-obtain item-level validity information. Empirical keying has been
explored for only a subset of criteria and MOS, and based on these results further research is
warranted.

REFERENCES
Alley, W.E., & Matthews, M.D. (1982). The vocational interest career examination: A description
of the instrument and possible applications. Journal of Psychology, 112, 169-193.
Campbell, J.P., & Knapp, D.J. (Eds.). (2001). Exploring the limits in personnel selection and
classification. Mahwah, NJ: Lawrence Erlbaum Associates.
Carter, G.W. (1991). A study of relationships between measures of individual differences and job
satisfaction among U.S. Army personnel. Unpublished doctoral dissertation, University of
Minnesota, Minneapolis.
Dawis, R. V. (1991). Vocational interests, values, and preferences. In M. D. Dunnette and L. M.
Hough (Eds.), Handbook of Industrial and Organizational Psychology (2nd ed., vol. 2, pp.
833-871). Palo Alto, CA: Consulting Psychologists Press.
Hough, L., Barge, B., & Kamp, J. (2001). Assessment of personality, temperament, vocational
interests, and work outcome preferences. In J. P. Campbell & D. J. Knapp (Eds.),
Exploring the limits in personnel selection and classification. Mahwah, NJ: Lawrence
Erlbaum & Associates.
Paullin, C., Bruskiewicz, K. T., Hanson, M. A., Logan, K., & Fellows, M. (1995). Development
and evaluation of AVOICE empirical keys, scales and composites. In J. P. Campbell &
L. M. Zook (Eds.), Building and retaining the career force: New procedures for
accessing and assigning Army enlisted personnel - Final report (ARI Technical Report).
Alexandria, VA: U. S. Army Research Institute for the Behavioral and Social Sciences.
Rozeboom, W.W. (1978). Estimation of cross-validated multiple correlation: A clarification.
Psychological Bulletin, 85, 1348-1351.
Strong, E.K. (1943). Vocational interests of men and women. Stanford, CA: Stanford University
Press.


DEVELOPING MEASURES OF OCCUPATIONAL INTERESTS AND
VALUES FOR SELECTION(39)
Dan J. Putka, Ph.D.
Human Resources Research Organization
66 Canal Center Plaza, Suite 400
Alexandria, VA 22314
dputka@humrro.org

Chad H. Van Iddekinge, Ph.D.
Human Resources Research Organization
66 Canal Center Plaza, Suite 400
Alexandria, VA 22314
cvaniddekinge@humrro.org

Christopher E. Sager, Ph.D.
Human Resources Research Organization
66 Canal Center Plaza, Suite 400
Alexandria, VA 22314
csager@humrro.org

INTRODUCTION

Historically, personnel selection has concerned the development of predictor measures
that assess knowledges, skills, and attributes (KSAs) deemed critical to successful job
performance (e.g., Campbell, 1990; Schmitt & Chan, 1998). However, job performance is not the
only criterion that the U. S. Army desires to affect through its selection and classification
systems. Most notably, attrition is often a key criterion of interest. Unfortunately, traditional
KSA-based predictor development strategies fall short in identifying predictors of non-
performance criteria. Fortunately, the academic literature provides alternative strategies for
developing predictors that are well grounded in theory and highly relevant for the prediction of
alternative criteria. For example, measures of occupational interests and values have a long
tradition in vocational/career counseling literature, and have been found to be predictive of both
employee turnover and several work-related attitudes that are believed to underlie turnover (e.g.,
job satisfaction and commitment; Dawis, 1991). Although common in the vocational counseling
arena, several challenges arise when such measures are considered for use in personnel selection
where intentional response distortion among respondents becomes more likely. This paper
discusses challenges that have arisen in our efforts to develop such measures for the Army’s
Select21 project.

INTEREST AND VALUE MEASURES IN SELECT21

The goal of the Select21 project (sponsored by the U.S. Army Research Institute for the
Behavioral and Social Sciences) is to develop and validate measures that will help the Army select
and retain Soldiers with the characteristics needed to succeed in the future Army. A key element of
predictor development has been to develop measures of person-environment (P-E) fit (Kristof,
1996), with particular focus on the match between prospective Soldiers’ work interests and values
and those supplied by the Army environment, both now and in the future. Our strategy for
developing these measures for Select21 arose out of work done in two relatively distinct bodies
of literature, which we briefly summarize in the sections that follow.

(39) This paper is part of a symposium titled Occupational Interest Measurement: Where Are the
Services Headed? presented at the 2003 International Military Testing Association Conference in
Pensacola, FL (M.G. Rumsey, Chair). The views, opinions, and/or findings contained in this paper
are those of the authors and should not be construed as an official U.S. Department of the Army
position, policy, or decision.

Theory of Work Adjustment

Our first strategy for developing fit measures was derived from the Dawis, England, and
Lofquist (1964) Theory of Work Adjustment (TWA). Although the TWA is a broad theory, we
focused on the part that concerns the correspondence between individuals’ needs (in this case
recruits) and what the organization or job (in this case the Army) supplies. Specifically, TWA
suggests that a Soldier’s level of work satisfaction is a function of the correspondence between
that Soldier’s preference for various occupational reinforcers and the degree to which the Army
(or job) provides those reinforcers. An occupational reinforcer is a characteristic of the work
environment associated with an individual’s work values (e.g., having a chance to work
independently, being paid well, having good relationships with co-workers). Within the P-E fit
literature, correspondence between a Soldier’s needs and what the organization or job supplies in
terms of those needs is referred to as “needs-supplies fit” (Edwards, 1991).

Although TWA focuses on occupational reinforcers and work values, other theories have
postulated similar relationships between needs-supplies fit and work-related attitudes with
different needs-supplies content. For example, Holland’s congruence theory (Holland, 1985)
would suggest that a Soldier’s work satisfaction is, in part, a function of the congruence between
that Soldier’s vocational interests (i.e., standing on RIASEC(40) interest dimensions) and the
interests supported by the Army/job environment. Vocational interests are often indicated by
individuals’ preferences for various generalized work activities, work contexts, and leisure
activities. For Select21, we drew on both TWA work value and RIASEC interest content when
developing the interest and value measures.

Realistic Job Previews

The other strategy we adopted for developing predictor measures was derived from the
literature on realistic job previews (RJPs; e.g., Wanous, 1992). RJPs are hypothesized to bring
applicants’ pre-entry expectations more in line with reality, thus serving to reduce later negative
effects (e.g., dissatisfaction and turnover) of unmet expectations (e.g., Hom, Griffeth, Palich, &
Bracker, 1999). RJPs are not typically thought of as predictors in the selection context; rather
they reflect information provided to an applicant. As such, traditional pre-entry RJPs take the
selection decision out of the hands of the organization and put it into the hands of the applicant
(i.e., self-selection). In a lean recruiting environment, loss of such power on the Army’s part
would be undesirable. Despite their value, this characteristic of traditional RJPs might explain
their lack of use in the Army (Brose, 1999).

(40) Holland discusses six dimensions of vocational interest: realistic, investigative, artistic,
social, enterprising, and conventional (RIASEC; Holland, 1985).


For Select21 we conceived of a novel way to capitalize on the benefits of pre-entry RJPs,
yet put the decision in the hands of the Army. We are seeking to achieve this by presenting RJP
information in the form of a pre-entry knowledge of the Army test. For example, we developed
measures that ask prospective Soldiers the extent to which they believe the occupational
reinforcers and interests assessed in the Select21 needs-supplies fit measures are characteristic of
the Army. We refer to correspondence between recruits’ expectations and the reality of Army
life as “expectations-reality fit.” Thus, based on content from the needs-supplies fit measures, we
also constructed expectations-reality fit measures for Select21.

Our decision to go beyond traditional needs-supplies fit measures and include
expectations-reality fit measures for Select21 stems from our belief that these two types of
measures will interact to predict the criteria of interest (i.e., attrition and its attitudinal
precursors). For example, based on expectancy theory (Vroom, 1964), we expect that misfit
between the applicant and Army for any given occupational reinforcer (e.g., degree of autonomy)
depends on (a) how important the reinforcer is to the applicant (need), (b) how much the
applicant expects the Army to provide the reinforcer (expectation), and (c) the extent to which
the Army actually offers the reinforcer (supply).

For example, consider two applicants—one who values autonomy and expects the Army
will provide it, and a second who values autonomy, but does not expect the Army to provide it. If
we assume the Army does not provide autonomy, it is likely that the second applicant will be
more satisfied with the Army than the first. Although both applicants value autonomy (indicating
a needs-supplies misfit), the fact that the first applicant expects autonomy and does not receive it
may result in greater dissatisfaction for the first applicant. Thus, a hypothesis we plan to test is
that Soldiers will be more dissatisfied and more likely to attrit if they have unmet needs
regarding interests and values that they expected the Army to support.
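
As a hedged formalization of this hypothesis (ours, not the authors'), the sketch below scores
misfit for a single reinforcer as its importance times the shortfall of supply relative to
expectation, which reproduces the two-applicant autonomy example.

def misfit(need, expectation, supply):
    """All inputs on a 0-1 scale; a larger return value means greater misfit."""
    return need * max(0.0, expectation - supply)

# The Army is assumed to supply little autonomy in this example.
first  = misfit(need=0.9, expectation=0.9, supply=0.1)   # expects autonomy
second = misfit(need=0.9, expectation=0.2, supply=0.1)   # does not expect it
print(first > second)   # True: an unmet need that was expected hurts more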

GENERAL DEVELOPMENT CONSIDERATIONS

The previous section summarized two types of measures we developed for Select21, and
theoretical bases for them. In the remaining sections, we elaborate on several issues we
considered during the course of their development.

WHAT TO ASSESS?

A key characteristic that differentiates the strategies for developing needs-supplies and
expectations-reality fit measures from the development of traditional KSA-based predictor
measures is the determination of what constructs to assess. As noted earlier, selection measures
are generally designed to assess critical KSAs identified by a job analysis. When developing
needs-supplies and expectations-reality fit measures, however, this approach makes little sense
because of the need to understand the range of an applicant’s needs or expectations, particularly
those needs that the organization or job environment cannot satisfy (which can lead to
dissatisfaction). Thus, the constructs that are critical to assess may vary by applicant (e.g., it
depends on what values or interests the individual applicant finds most desirable) instead of
being a fixed entity (as with critical KSAs).

It is therefore important to work from a broad taxonomy when developing needs-supplies
and expectations-reality fit measures to ensure adequate coverage of the values and interests that
the applicant population might possess. Given the prominence and breadth of the Dawis and
Lofquist (1984) taxonomy of occupational reinforcers and Holland’s RIASEC dimensions, we
adopted these frameworks as a basis for developing the fit measures. We also took steps to
expand the taxonomy of occupational reinforcers in light of our review of recent work on (a)
general work values (e.g., Schwartz, 1994), (b) values of American youth (Sackett & Mavor,
2002), and (c) values of Army recruits (Ramsberger, Wetzel, Sipes, & Tiggle, 1999), to ensure
the resulting set of occupational reinforcers is comprehensive.

RECONCILING DIFFERENT TYPES OF FIT

During the development process, we recognized a distinction should be made between
two types of fit—fit with the Army and fit with one’s military occupational specialty (MOS).
Person-Army fit refers to the correspondence between Soldiers’ KSAs, values, interests, and
expectations and those required/supported by the Army in general. Person-MOS fit refers to the
correspondence between Soldiers’ KSAs, values, interests, and expectations and those that are
required/supported by the specific MOS to which the Soldier is assigned.

We geared the Select21 occupational reinforcer-based measures towards assessing
Person-Army fit. This is because the occupational reinforcers we used reflect Army-wide
conditions, and are not specific to individual MOS. For example, the Army’s supply of the
occupational reinforcer relating to pay is fairly stable across MOS within pay grade. For the
RIASEC-based measures, elements of both Army-wide and MOS-specific fit are considered.
This is because interests are often tied to job tasks and opportunities for training that vary by job.
As such, there likely exists an Army-wide RIASEC profile that taps the common tasks and
training opportunities offered by Army jobs in general, as well as an MOS-specific RIASEC
profile that taps the MOS-specific tasks and training opportunities.

An issue we will confront when validating the RIASEC-based fit measures is how Army-
wide and MOS-specific fit will interact to predict the criteria of interest. For example, a Soldier’s
interests may match the profile of his/her MOS, but differ from the Army-wide profile. Empirical
examination of the interaction between person-organization and person-job fit has only recently
begun to appear in the civilian literature (Kristof-Brown, Jansen, & Colbert, 2002). As such,
when conducting validation analyses, care will be taken to explore the unique and joint
contribution of these types of fit.

RESPONSE DISTORTION

Response distortion becomes a prominent issue when attempting to use needs-supplies fit
measures in an operational selection context (Hough, 1998; Rosse, Stecher, Miller, & Levin,
1998). For example, nearly all of the occupational reinforcers discussed in the TWA are socially
desirable. Thus, a Likert-type work values measure is not likely to yield useful information. On
the other hand, not all of the work activities, work contexts, leisure activities, or learning
experiences used in vocational interest inventories are socially desirable. These differences
indicate that the best methods for managing response distortion on these measures may differ
depending on whether one is assessing work values or vocational interests.

Work Values
One promising way to deal with content that is socially desirable is to use a forced-choice
format. For Select21, we constructed a novel forced-choice occupational reinforcer-based needs
measure to assess work values. The purpose of using a forced-choice format was to reduce its
susceptibility to response distortion in the operational selection context (Jackson, Wrobleski, &
Ashton, 2000).

However, the forced-choice format is not without its problems. For example, forced-
choice measures result in ipsative or partially ipsative data (Hicks, 1970). Ipsative data indicate
whether an individual is higher on one construct than another (e.g., whether a prospective Soldier
has a greater need for work autonomy than for a supportive supervisor). However, the selection
context needs normative data that compare individuals to each other on a single construct (e.g.,
that order prospective Soldiers on their need for work autonomy). We are taking several steps to
help maximize the ability to normatively scale recruits based on their responses to our forced-
choice measure (see Van Iddekinge, Putka, & Sager, 2003). Another potential problem that may
arise from using a forced-choice format to assess work values is that one value in a pair may
sound more like the Army than the other. In such cases, an applicant desiring to be selected into
the Army may endorse that statement regardless of whether they actually value it. We took steps
to construct the forced-choice work values measure to make such impression management
tactics more difficult (see Van Iddekinge et al., 2003).
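
To make the ipsativity problem concrete, the following minimal sketch (in Python, with
hypothetical value names and pairings, not the actual Select21 instrument) shows how forced-
choice responses yield scores that rank values within a respondent but always sum to a constant,
so they cannot order respondents on a single construct:

    # Illustrative sketch only; value names and pairings are hypothetical.
    from collections import Counter
    from itertools import combinations

    VALUES = ["autonomy", "supportive_supervisor", "pay", "variety"]
    PAIRS = list(combinations(VALUES, 2))  # each item pits two values against each other

    def ipsative_scores(choices):
        # choices: list of (value_a, value_b, chosen_value) tuples, one per item.
        tally = Counter({v: 0 for v in VALUES})
        for a, b, chosen in choices:
            tally[chosen] += 1
        return tally

    # A respondent who prefers autonomy whenever it appears in a pair:
    responses = [(a, b, a if a == "autonomy" else b) for a, b in PAIRS]
    scores = ipsative_scores(responses)
    # Scores always sum to len(PAIRS): a high score on one value forces low
    # scores on the others, which is why such data rank constructs within a
    # person but cannot compare people on any single construct.
    assert sum(scores.values()) == len(PAIRS)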

Interests

Unlike work values, applicants’ vocational interests may be less feasible to assess with a
forced-choice measure. For example, on many interest measures, items relating to
military-type activities (e.g., I like to shoot guns) are included as indicators of realistic interests.
Inclusion of such items is problematic in a selection context. That is, an applicant who strongly
desires to get into the Army and is willing to distort his/her responses to do so will indicate a
strong liking for such items regardless of whether the activities actually interest him/her. Given
the specific nature of common interest items, imposing a forced-choice format by pairing these
items with other interest items would not likely resolve this type of response distortion.

Another factor that limits the potential benefit of a forced-choice interest measure is the
number of items often used to measure occupational interests. For example, a diverse array of
work activities from different occupations may be required to accurately measure investigative
interests. One drawback is that this makes a forced-choice format more difficult to use,
because there are too many indicators of interests to compare, making for an overly long
measure. Thus, the key to managing response distortion on interest inventories may be to balance
item content. For example, this might be achieved by writing items for each RIASEC dimension
that are either (a) all descriptive of the Army or (b) all Army neutral (i.e., that do not sound like
the Army). Alternatively, one interesting possibility would be to include both types of items on an
interest inventory to examine differences in RIASEC profiles based on items that are intended to
sound like the Army versus items that are Army neutral. If the RIASEC profile based on Army-like
items is elevated across all dimensions, it might indicate that individuals are distorting their
responses.
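
As a rough sketch of how that check might work (Python; the item keys, ratings, and elevation
threshold are our assumptions for illustration, not a validated screening rule):

    # Illustrative sketch only; items, ratings, and threshold are hypothetical.
    RIASEC = ["R", "I", "A", "S", "E", "C"]

    def mean_profile(ratings, items, item_type):
        # items: item_id -> (riasec_dimension, "army_like" or "neutral")
        # ratings: item_id -> respondent's liking rating for that item
        by_dim = {d: [] for d in RIASEC}
        for item_id, (dim, typ) in items.items():
            if typ == item_type:
                by_dim[dim].append(ratings[item_id])
        return {d: sum(v) / len(v) for d, v in by_dim.items() if v}

    def flag_possible_distortion(ratings, items, threshold=0.5):
        army = mean_profile(ratings, items, "army_like")
        neutral = mean_profile(ratings, items, "neutral")
        shared = set(army) & set(neutral)
        # Elevation on *every* dimension for Army-like items is the pattern
        # suggested above as a possible sign of response distortion.
        return bool(shared) and all(army[d] - neutral[d] > threshold for d in shared)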

SCORING CONSIDERATIONS

In assessing the fit between what the applicant needs (or expects) and what the
organization or job supplies, commensurate measures are typically administered to applicants
and to incumbent subject matter experts (SMEs). The role of SMEs is to provide an organization-
or job-side profile against which the “fit” of an applicant’s profile can be assessed. The degree of
fit is typically indexed using profile similarity indices (PSIs; Edwards, 1991). Past research in the
TWA literature suggests that the PSI most correlated with work satisfaction is a simple rank-
order correlation of profiles (e.g., Rounds, Dawis, & Lofquist, 1987). Nevertheless, such PSIs
have been criticized on conceptual grounds. Most notably, they mask the relationship between
individual dimensions of fit (e.g., Artistic interests) and the criteria. Past P-E fit research
indicates that individual dimensions of fit may have different relationships with the same
criteria, and that aggregating them into profile similarity indices makes identification of (and
capitalization on) differential relationships problematic (Edwards, 1994).
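
For example, the rank-order PSI just described reduces to a Spearman correlation between
commensurate profiles, as in this minimal sketch (Python with SciPy; the reinforcer names and
ratings are invented for illustration):

    # Illustrative sketch only; reinforcers and ratings are invented.
    from scipy.stats import spearmanr

    reinforcers = ["autonomy", "pay", "variety", "supervision", "security"]
    applicant_needs = [4.5, 3.0, 4.0, 2.0, 3.5]  # applicant's rated needs
    army_supplies = [2.5, 3.5, 3.0, 4.0, 4.5]    # SME ratings of Army supplies

    psi, _ = spearmanr(applicant_needs, army_supplies)
    # A single number summarizes fit across all reinforcers -- which is
    # exactly the criticism above: it masks which dimensions drive misfit.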

A potential alternative to using a profile similarity index to score fit measures is
polynomial regression (Edwards, 1994). This approach involves modeling the criterion variable
of interest as a function of applicants’ need for, and a job’s supply of, any given dimension of fit
(e.g., Artistic interests). Unfortunately, this method is most applicable in studies of person-job fit
where participants hold a variety of jobs (thus allowing variation in supply measures on the job-
side of the equation). However, one can still capitalize on certain aspects of this approach when
assessing P-E fit within a single organization. For example, in Select21 a regression-based
method would allow us to model the criteria of interest as a function of (a) Soldiers’ level of
preference for a reinforcer/interest, (b) an indicator of where Soldiers’ level of preference falls
relative to what the Army provides (e.g., above/below), and (c) hypothesized moderating
variables (e.g., Soldiers’ expectations regarding the given reinforcer/interest). Relying on
aggregate profile similarity indices as indicators of fit would not support this type of modeling,
which in turn could lead to lower criterion-related validity estimates for the fit measures
(Edwards, 1994).
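
A minimal sketch of this regression-based alternative, with simulated data and our own variable
names (the functional form and coefficients are illustrative assumptions, not Select21 results):

    # Illustrative sketch only; data are simulated, names are ours.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    preference = rng.uniform(1, 5, n)                 # (a) need for a reinforcer
    army_supply = 3.0                                 # fixed Army-wide supply level
    above = (preference > army_supply).astype(float)  # (b) above/below indicator
    expected = rng.integers(0, 2, n).astype(float)    # (c) expected the reinforcer?

    # Simulated criterion: dissatisfaction grows when an unmet need was expected.
    satisfaction = 4.0 - 1.2 * above * expected + rng.normal(0, 0.5, n)

    X = np.column_stack([np.ones(n), preference, above, expected, above * expected])
    coefs, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
    # coefs[4] estimates the hypothesized moderation: unmet needs that were
    # *expected* should depress satisfaction more than unmet, unexpected ones.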

SUMMARY

In this paper we discussed a number of issues to consider when developing measures of
values and interests for use in personnel selection. Although many challenges are present, the
potential benefit of including such measures in selection batteries may be substantial.
Specifically, based on theory, such measures are more likely to predict valued alternative criteria
such as attrition and its attitudinal precursors than traditional KSA-based selection instruments.
Therefore, we feel that making efforts to construct measures of values and interests for use in
personnel selection is a worthy pursuit.

REFERENCES

Brose, G. D. (1999). Could realistic job previews reduce first-tour attrition? Unpublished
master’s thesis. Monterey, CA: Naval Postgraduate School.

Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and
organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of
industrial and organizational psychology (2nd ed., Vol. 1, pp. 687-732). Palo Alto:
Consulting Psychologists Press.

Dawis, R. V. (1991). Vocational interests, values, and preferences. In M.D. Dunnette & L. M.
Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2,
pp. 833-871). Palo Alto: Consulting Psychologists Press.

Dawis, R. V., England, G. W., & Lofquist, L. H. (1964). A theory of work adjustment. Minnesota
Studies in Vocational Rehabilitation, XV. Minneapolis: University of Minnesota.

Dawis, R. V., & Lofquist, L. H. (1984). A psychological theory of work adjustment.
Minneapolis: University of Minnesota Press.

Edwards, J. R. (1991). Person-job fit: A conceptual integration, literature review, and
methodological critique. International review of industrial and organizational psychology
(Vol. 6, pp. 283-357). London: Wiley.

Edwards, J. R. (1994). The study of congruence in organizational behavior research: Critique and
proposed alternative. Organizational Behavior and Human Decision Processes, 58, 51-100.

Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative
measures. Psychological Bulletin, 74, 167-184.

Holland, J. L. (1985). Manual for self-directed search. Odessa, FL: Psychological Assessment
Resources.

Hom, P. W., Griffeth, R. W., Palich, L. E., & Bracker, J. S. (1999). Revisiting met expectations
as a reason why realistic job previews work. Personnel Psychology, 52, 97-112.

Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation
of suggested palliatives. Human Performance, 11, 209-244.


Jackson, D. N., Wrobleski, V. R., & Ashton, M. C. (2000). The impact of faking on employment
tests: Does forced-choice offer a solution? Human Performance, 13, 371-388.

Kristof, A. L. (1996). Person-organization fit: An integrative review of its conceptualizations,
measurements, and implications. Personnel Psychology, 49, 1-49.

Kristof-Brown, A.L., Jansen, K.J., & Colbert, A. (2002). A policy-capturing study of the
simultaneous effects of fit with jobs, groups, and organizations. Journal of Applied
Psychology, 87(5), 985-993.

Ramsberger, P. F., Wetzel, E. S., Sipes, D. E., & Tiggle, R. B. (1999). An assessment of the
values of new recruits (FR-WATSD-99-16). Alexandria, VA: Human Resources
Research Organization.

Rosse, J.G., Stecher, M.D., Miller, J.L., & Levin, R. (1998). The impact of response distortion on
pre-employment personality testing and hiring decisions. Journal of Applied Psychology,
83, 634-644.

Rounds, J.B., Dawis, R.V., & Lofquist, L.H. (1987). Measurement of person-environment fit and
prediction of satisfaction in the theory of work adjustment. Journal of Vocational
Behavior, 31, 297-318.

Sackett, P. R. & Mavor, A. (Eds.) (2002). Attitudes, aptitudes, and aspirations of American youth:
Implications for military recruitment. Washington, D.C.: National Academies Press.

Schmitt, N., & Chan, D. (1998). Personnel selection: A theoretical approach. Thousand Oaks,
CA: Sage Publications.

Schwartz, S. H. (1994). Are there universal aspects in the structure and contents of human
values? Journal of Social Issues, 50, 19-45.

Van Iddekinge, C. H., Putka, D. J., & Sager, C. E. (2003, November). Assessing Person-
Environment (P-E) Fit with the Future Army. In D. J. Knapp (Chair), Selecting Soldiers
for the Future Force: The Army’s Select21 Project. Paper presented at the 2003
International Military Testing Association Conference, Pensacola, FL.

Vroom, V. (1964). Work and motivation. New York: John Wiley.

Wanous, J. P. (1992). Organizational entry (2nd ed.). Reading, MA: Addison-Wesley.


OCCUPATIONAL SURVEY SUPPORT OF
AIR AND SPACE EXPEDITIONARY FORCE (AEF) REQUIREMENTS
Mr. Robert E. Boerstler, Civilian, Chief, Leadership Development Section
Mr. John L. Kammrath, Civilian, Chief, Occupational Analysis Flight
Air Force Occupational Measurement Squadron, Randolph AFB, TX 78150-4449
robert.boerstler@randolph.af.mil john.kammrath@randolph.af.mil

ABSTRACT

This paper highlights the Air Force Occupational Measurement Squadron’s (AFOMS)
involvement in the Air Education and Training Command (AETC) initiative to determine 3-skill-
level deployment requirements in support of the Air and Space Expeditionary Forces (AEFs). In
order to provide current AF leadership with a strategy for conducting initial skills training
focused on deployment tasks, AFOMS was challenged to devise a means of surveying members
currently deployed and members who have returned from deployments within the past
12 months. Ten sample specialties with known deployment requirements were selected to
participate in this survey. Results of this effort provided insight into the importance of targeted
task training in initial skills training courses. The potential for AETC to change from garrison-
based initial skills training to deployment task training will require a paradigm shift for many of
the US Air Force functional communities to “train as they fight” in support of the AEF structure.

INTRODUCTION

AEFs were invented in the 1990s to solve chronic deployment problems. More than anything
else, the Air Force hoped to provide a measure of stability and predictability for its airmen, who
were constantly being dispatched overseas on one short-notice contingency assignment after
another. It was not apparent at the time what a big difference this change was going to make.
The AEFs have become a new way of life for the Air Force.

Airmen are still assigned to their regular units at their home stations. But most likely they also
belong to an AEF, and for 3 months out of every 15, the AEF governs where they will be and
what they will do. About half of the airmen and officers in the active duty force are already in an
AEF, and the number is rising. Guard and Reserve participation is so high that a fourth of the
deployed forces come from the Air Reserve Components.

The Air Force has grouped its power projection forces and the forces that support them into 10
"buckets of capability," each called an AEF. (The other abbreviation, "EAF"--for Expeditionary
Air and Space Force--refers to the concept and organization.)

Secretary of the Air Force James G. Roche told Congress in February that "a nominal AEF has
about 12,600 people supporting 90 multirole combat aircraft, 31 intratheater airlift and air
refueling aircraft, and 13 critical enablers. The enablers provide command, control,
communications, intelligence, surveillance, and reconnaissance, as well as combat search and
rescue."

Increasingly, the Air Force describes itself operationally in terms of AEFs rather than wings or
wing equivalents. A full AEF rotation cycle is 15 months. It is divided into five 3-month
periods, and during each of these, two of the AEFs are vulnerable to deployment.

In August 2002, Chief of Staff of the Air Force (CSAF), General John P. Jumper, issued a Sight
Picture entitled “The Culture of our Air and Space Expeditionary Force and the Value of Air
Force Doctrine.” General Jumper’s comments included: “Concerning what I call “The Culture
of the Air and Space Expeditionary Force,” everyone in the Air Force must understand that the
day-to-day operation of the Air Force is absolutely set to the rhythm of the deploying AEF force
packages. Essential to this cultural change is our universal understanding that the natural state
of our Air Force when we are “doing business” is not home station operations but deployed
operations. The AEF cycle is designed to provide a rhythm for the entire business of our Air
Force, from assignment cycles to training cycles and leave cycles. That process needs to be the
focus of our daily operational business. We must particularly work to change processes within
our own Air Force that reach in and drive requirements not tuned to the deployment rhythm of
the AEF. That means that when the 90-day vulnerability window begins, the people in that
particular AEF force package are trained, packed, administered, and are either deploying or
sitting by the phone expecting to be deployed. There should be no surprises when that phone
does ring, and no reclamas that they are not ready. More important, there should be no reclamas
because someone other than the AEF Center tasked people in the AEF for non-AEF duties.”

Operational commanders at all levels have found it difficult to maintain enough qualified airmen
to meet personnel deployment demands for the Unit Type Code (UTC) requirements that have
been levied on their units.

This problem was elevated to HQ AETC, and the Director of Operations (DO) hosted a
conference with the Air Force Career Field Managers (AFCFMs) of a selected sample of Air Force
specialties to determine if apprentice (3-skill-level) airmen could be task-certified to meet some
deployment requirements identified for journeyman (5-skill-level) personnel.

In preparation for this conference, HQ AETC/DO requested AFOMS assistance in developing a
survey to determine if 3-skill-level personnel could be used for some AEF UTC requirements.

AFOMS is responsible for conducting occupational analyses for every enlisted career field
within the Air Force and for selected officer utilization fields. AFOMS is an operational
scientific organization that is often in contact with the senior enlisted and officer career field
managers through Utilization and Training Workshops (U&TWs). Occupational surveys
generally provide information in terms of the percentage of members performing jobs or tasks,
the relative percentage of time spent performing tasks, equipment used, task difficulty, training
emphasis, testing importance (for enlisted specialties only), and the skills necessary to perform
tasks. The structure of jobs within an occupation should serve as a guide for refining the Air
Force military classification system or personnel utilization policies for that occupation.

With these capabilities, AFOMS was engaged to assist in providing empirical data that could be
used to identify what tasks are performed in a deployed environment.

METHOD

Survey Development Process


An occupational survey begins with a job inventory (JI) -- a list of all the tasks performed by
members of a given Air Force Specialty Code (AFSC) as part of their actual career field work
(that is, additional duties and the like are not included). We include every function that career
field members perform by working with technical training personnel and operational subject-
matter experts (SMEs) to produce a task list that is complete and understandable to the typical
job incumbent. The SMEs write each task to the same level of specificity across duty areas, and
no task is duplicated in the task list. The JIs used for this project were the most current for each
AFSC as compiled through our ongoing 3-year cyclical survey process.

Survey Administration
This survey was administered to 3-, 5-, and 7-skill-level personnel who were either currently
deployed or had been deployed within the previous 12 months in support of contingency operations.
A list of personnel who met these requirements was provided by the Air Force Personnel Center
(AFPC). A web-based survey was developed that included the Job Inventory for each of the
12 AFSCs selected. As individuals responded to certain background questions, they were
branched within the survey to the appropriate JI section.

All 3- and 5-skill-level personnel were branched to their AFSC's JI and instructed to mark only
those tasks they performed while deployed to support a contingency operation. Once they
completed marking all tasks they performed while deployed, they were presented with only those
tasks marked and asked to rate each task on a scale of 1-9 on how well-trained or prepared they
felt they were to perform the task upon arrival at the deployed location. The following rating
scale was used for this data collection:
1 Extremely low preparation/training
2 Very low preparation/training
3 Low preparation/training
4 Below average preparation/training
5 Average preparation/training
6 Above average preparation/training
7 High preparation/training
8 Very high preparation/training
9 Extremely high preparation/training
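
The mark-then-rate branching described above can be summarized in a few lines (a minimal
Python sketch; the task IDs and rating interface are hypothetical, not the actual web survey):

    # Illustrative sketch only; task IDs and the rating interface are hypothetical.
    def administer_inventory(ji_tasks, marked_as_performed, get_rating):
        # ji_tasks: task IDs in the member's AFSC job inventory.
        # marked_as_performed: set of task IDs the member marked in pass one.
        # get_rating: callable returning the member's 1-9 preparation rating.
        presented = [t for t in ji_tasks if t in marked_as_performed]
        # Pass two: only the marked tasks are presented for rating.
        return {t: get_rating(t) for t in presented}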

All 7-skill-level personnel were also branched to their AFSC's JI and instructed to mark those
tasks they felt were important for personnel they supervised to be able to perform upon arrival at
a deployed location to support a contingency operation. Once they completed marking all tasks
they felt were important to be performed while deployed, they were presented with only those
tasks and asked to rate each task on a scale of 1-9 on how important it was for 3- and 5-skill-level
personnel to be able to perform the task upon arrival at the deployed location. The following
rating scale was used for this data collection:
1 Extremely low importance
2 Very low importance
3 Low importance
4 Below average importance
5 Average importance
6 Above average importance
7 High importance
8 Very high importance
9 Extremely high importance

The data collected were then formatted and loaded into the Comprehensive Occupational Data
Analysis Programs (CODAP) and sorted to enable analysis and display of the data by skill-level
group. The JI tasks were matched to the current Specialty Training Standard (STS) for each
AFSC to depict the tasks currently coded in the STS as core tasks.

The definition for a core task varies among CFMs, but for the purpose of this analysis, we
wanted to display the feasibility of defining a core task as “a task that is core to the specialty and
performed in a deployed environment.” By employing this definition, tasks can be easily
identified for initial skills training and certified prior to deployment.

Results
The data were presented to the CFMs at the HQ AETC/DO conference and depicted the top tasks
performed by each sample AFSC while deployed. A sample of these data tables is presented on
the following page, sorted by Supervisor’s Importance rating. The percentage of 3- and 5-skill-
level members performing each task is also presented.

This information was very well-received by both the functional and training communities to help
identify deployment requirements. AFOMS is now incorporating these deployment survey
techniques into our cyclical analysis process to provide deployment data for every AFSC.


AFSC 2A3X1 (A-10, F-15, U-2 AVIONIC SYSTEMS)
TASKS WITH HIGHEST SUPERVISORY IMPORTANCE EMPHASIS RATINGS

                                                            PERCENT MEMBERS PERFORMING
                                                     CORE   All 5-Lvl   All 3-Lvl   SUPV IMPORT
TASKS                                                TASK   (N=44)      (N=47)      (N=35)
A0038 Troubleshoot aircraft wiring                    *        86          85          6.51
D0287 Code mode 4 crypto equipment                             50          43          6.26
D0288 Code secure voice crypto equipment                       39          36          6.20
A0031 Repair aircraft wiring                          *        77          70          6.14
F0451 Close CAMS maintenance events                            86          72          6.09
A0037 Trace wiring, system, or interface diagrams     *        80          74          6.09
SI MEAN = 2.07; S.D. = 1.62; HIGH SI = >3.69

AFSC 3C0X1 (COMM/COMPUTER SYSTEMS)
TASKS WITH HIGHEST SUPERVISORY IMPORTANCE EMPHASIS RATINGS

                                                                                 PERCENT MEMBERS PERFORMING
                                                                          CORE   All 5-Lvl   All 3-Lvl   SUPV IMPORT
TASKS                                                                     TASK   (N=120)     (N=38)      (N=172)
A0024 Install computer hardware for end users                                       73          55          6.14
A0009 Assist users in resolving computer software malfunctions
      or problems                                                                   78          79          5.98
A0025 Install standalone and network computer operating systems,
      such as Windows or UNIX                                                       73          61          5.78
A0023 Install application software, such as information protection
      or special systems software                                                   76          61          5.50
B0042 Answer trouble calls from end users dealing with network outages              74          66          5.43
A0012 Configure operating systems, such as UNIX or NT Server               *        70          55          5.35
SI MEAN = 1.03; S.D. = 1.17; HIGH SI = >2.20


CONCLUSION

The US Air Force is focused on the AEF as its normal mode of operation. AFOMS was enlisted to assist in
identifying the task requirements through occupational analysis and stands poised to develop a
comprehensive support base for future decisions.

REFERENCES

Correll, J. T. (2002, July). The EAF in peace and war. Air Force Magazine, 85(7).

Jumper, J. P. (2002, August). The culture of our Air and Space Expeditionary Force and the
value of Air Force doctrine. Chief’s Sight Picture.


Occupational Analytics

Paul L. Jones
Navy Manpower Analysis Center

Jill Strange and Holly Osburn
SkillsNET Corporation

The U.S. Navy established the Navy Occupational Task Analysis Program (NOTAP)
in 1976. Initially, NOTAP used the Comprehensive Occupational Data Analysis
Programs (CODAP) as the analytical software to produce occupational standards
and to analyze the results from fleet surveys. Rising costs of production, Navy personnel
downsizing, increased sophistication of technology, and requirements for more rapid
production forced the Navy to identify alternative methods for maintaining its
occupational structure. In 1988, NOTAP replaced CODAP with software from Raosoft, Inc.™
because it enabled the Navy to collect data using diskettes (previously, printed
booklets were used). In 2001, NOTAP replaced Raosoft™ with the SkillsNET™ on-line
methodology, which permits the Navy’s occupational classification needs to be met from a
web-enabled environment.
NOTAP was challenged to rework the classification structure to provide richer
characterizations of the work and to have “the ability to conduct what-if scenarios for a
changing littoral operation.” Adoption of the SkillsNET™ methodology provides greater
flexibility in occupational structures, while moving the Navy into a multifaceted
environment where the analytical possibilities are virtually unlimited. This paper focuses
on several analytical vistas available to Navy decision makers and leadership.

Similarity of Jobs

The SkillObject™, a cluster of similar tasks, becomes the focus around which similarity
of jobs – tasks, knowledge, skills, and abilities – is determined. Knowing which jobs are
similar allows the Navy to realize cost savings and common training delivery, and it
eliminates redundancy. Similarity is calculated using scaled skills, scaled abilities, and the
percentage of overlap for tool and knowledge categories.
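
One way such a combination might look (a minimal Python sketch; the weights, scaling, and
overlap measure are our assumptions for illustration, not the SkillsNET™ algorithm):

    # Illustrative sketch only; weights and measures are assumptions.
    def set_overlap(a, b):
        # Percentage of overlap between two sets (Jaccard index).
        return len(a & b) / len(a | b) if (a | b) else 1.0

    def profile_similarity(p, q):
        # p, q: dicts mapping skill/ability names to 0..1 scaled ratings.
        keys = set(p) | set(q)
        if not keys:
            return 1.0
        return 1 - sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / len(keys)

    def job_similarity(job1, job2, w=(0.35, 0.35, 0.15, 0.15)):
        return (w[0] * profile_similarity(job1["skills"], job2["skills"])
                + w[1] * profile_similarity(job1["abilities"], job2["abilities"])
                + w[2] * set_overlap(job1["tools"], job2["tools"])
                + w[3] * set_overlap(job1["knowledge"], job2["knowledge"]))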

Job Transfer

The Navy has a major problem with sea-shore rotation of personnel. It is difficult to
place individuals in a shore billet where their sea skill requirements are maintained or
strengthened. Reality tells us that basic skills have historically eroded.

Transferability measures enable us to dissect the job at sea and place individuals in shore
billets where a portion of their skills are maintained. This enables us to eliminate
expensive retooling.
Transferability is a function of criticality, consequence of error between the targeted
shore job and the sea requirement, and importance.
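
A minimal sketch of such a transferability index (Python; the equal weighting and 0..1 scales
are our assumptions, not the operational NOTAP computation):

    # Illustrative sketch only; weighting and scales are assumptions.
    def transferability(sea_skills, shore_skills, criticality, consequence_of_error, importance):
        # sea_skills / shore_skills: sets of skill IDs for the two billets.
        # criticality, consequence_of_error, importance: 0..1 ratings relating
        # the targeted shore job to the sea requirement.
        if not sea_skills:
            return 0.0
        maintained = len(sea_skills & shore_skills) / len(sea_skills)
        return maintained * (criticality + consequence_of_error + importance) / 3.0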


General Difficulty of External Recruiting for a Job

The objective is to determine the level of difficulty associated with recruiting
individuals externally. Recruiting is a difficult and expensive proposition, particularly when
the need for highly skilled Sailors is rising. Thus, we have the capability to look at demand
and pay ratios to meet this requirement.
Analytically, we look at the skillObjects™ within a job and the skills associated
with the tasks within each skillObject™. From these we calculate the Average Proficiency
for the job.
We then calculate the average industry pay using Department of Labor standards and
compare it with Navy pay and benefits.
This metric becomes extremely valuable when recruiting for military-specific
technical jobs.
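
A sketch of how those two pieces might combine into a single index (Python; the multiplicative
functional form is our assumption):

    # Illustrative sketch only; the functional form is an assumption.
    def recruiting_difficulty(skill_proficiencies, avg_industry_pay, navy_pay_and_benefits):
        # skill_proficiencies: proficiency levels for the skills in the job's
        # skillObjects; pay figures in the same units (e.g., dollars/year).
        avg_proficiency = sum(skill_proficiencies) / len(skill_proficiencies)
        pay_ratio = avg_industry_pay / navy_pay_and_benefits  # >1: industry pays more
        return avg_proficiency * pay_ratio  # higher = harder to recruit externally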

Number of Sailors Required for the Mission

The objective is to determine the number of people needed to successfully perform the
mission. Requirements determination for the fleet is a major concern given our changing
world and changing missions. Historically, we have relied on industrial engineering
techniques that are time and labor intensive. Our goal is to develop the analytics that will
enable us to meet the requirements determination challenge with equal accuracy using
data collected from our occupational structures.
This analytic, which draws on mission identification software, staffing requirements,
criticality, and risk factors, has potential. It is not yet operational, but it shows promise.

OTHER ANALYTICS

We are working on several other initiatives that provide the predictability required in a
changing military environment. These include identifying jobs that require teamwork, the
average level of expertise required for a job, outsourcing, depth of training, and others.

With the characterizations available to the Navy through the insertion of the SkillsNET™
methodology into NOTAP, we have opened numerous vistas for occupational analytics.


Anticipating the Future for First-Tour Soldiers

Tonia S. Heffner and Trueman Tremble
U.S. Army Research Institute for the Behavioral and Social Sciences
Alexandria, VA USA
Roy Campbell and Christopher Sager
Human Resources Research Organization
Alexandria, VA USA

As the U.S. Army moves through transformation, the focus is on how current and
future technology is changing the nature of war and of training preparation. This transformation
will involve development and fielding of Future Combat Systems (FCSs) to achieve full
spectrum dominance through a force that is responsive, deployable, agile, versatile, lethal, and
fully survivable and sustainable under all anticipated future combat conditions. Army leadership
recognizes the critical importance of Soldiers to the effectiveness of transformation. Although
the Army’s Objective Force 2015 White Paper (Department of the Army, 2002) demonstrates
recent thinking regarding the way transformation will impact the individual Soldier and the
personnel system, this work is in its infancy. It is assumed that enlisted Soldiers will require
considerably greater personal, social, and technological sophistication, but this assumption has
received limited empirical investigation.

Anticipating the need for solid research on Soldiers in the future, the U. S. Army
Research Institute for Behavioral and Social Sciences (ARI) initiated research to examine the
future first-term Soldier, New Predictors for Selecting and Assigning Future Force Soldiers
(Select21; Sager, Russell, Campbell, & Ford, 2003). Select21 research focuses on the assumption
that future entry-level Soldiers will require different capabilities than today’s soldiers. The
research seeks to understand what those capabilities might be and to determine if the Army’s
procedures for selecting and assigning new Soldiers to future jobs would benefit from personnel
tests that measure the capabilities not currently assessed as part of the Army’s current
accessioning process. The Army’s selection and classification process now relies on
measurement of cognitive capabilities through the Armed Services Vocational Aptitude Battery
(ASVAB). Thus, Select21 basically tests the hypothesis that performance prediction for future
entry-level jobs is increased, over ASVAB scores alone, by the inclusion of measures of
knowledges, skills, and other personnel attributes (KSAs) important to the performance demands
of future jobs.

Fig. 1 shows the overall design of the Select21 project. Most of this research is being
conducted with the support of the Human Resources Research Organization (HumRRO). The
presentations by HumRRO researchers provide more detail on individual aspects of the project.
Here, we discuss research challenges and solutions that together shaped the Select21 design.
Research following from that design has produced a clearer vision of the conditions under which
future Soldiers will perform, conditions that the Select21 research needs to recognize.


CHALLENGES AND SOLUTIONS

Operational Utilization
Prospective enlistees of all U.S. military services take the Armed Services Vocational
Aptitude Battery (ASVAB) and typically are assessed at the same testing stations. The
implications of this multi-service system are (1) the large number of Soldiers who are assessed,
(2) the compatibility of the Army’s evaluation procedures with the overall system supporting all
services, and (3) the cost-effectiveness of introducing new measures. Select21 researchers are
attending to the obstacles these implications create. As the research shows progress, we anticipate
seeking greater cross-service involvement.

[Figure 1. Select21 research design: Army-wide job analysis and cluster/MOS-specific job
analysis feed predictor and criterion development, followed by a field test and concurrent
criterion-related validation, ending in recommendations; the timeline runs from Jan 2002 to
Jan 2005.]

Future Focus
Select21 is intended to inform personnel decisions, but not for current personnel.
Instead, the focus is on selecting and classifying personnel now who will meet the demands of a
future for which the groundwork is only now being laid. Indeed, it is anticipated that the Army
will be engaged in a continual cycle of transformation. Moreover, features of the current Army
will remain as innovation takes hold and becomes fully characteristic. To support selection and
assignment for the transformation, we had to perform an analysis of future Soldier jobs. Research
on future jobs challenges the traditional job analysis approaches used to identify personnel
requirements. Traditionally, personnel requirements are determined by interviewing subject
matter experts (SMEs) and job incumbents to identify the work environment, the tasks
performed, and the associated KSAs necessary for successful job performance. The most critical,
yet most challenging, aspect of this job analysis activity is to determine how the future will differ
from current experience. For Select21 research, however, all of these aspects have had to be
projected – neither the environment, the equipment, nor those experienced in working future jobs
now exist. Therefore, we had to derive a unique methodology to depict the future working
environment.

Select21 has had the advantage of recent ARI research on future noncommissioned
officer requirements, Maximizing 21st Century NCO Performance (NCO21; Ford, Campbell,
Campbell, Knapp, & Walker, 2000; Knapp et al., 2002; Knapp, McCloy, & Heffner, 2003). The
overarching goal of NCO21 was to identify personnel who are best suited for entry into and
progression through the NCO corps, despite future changes. The NCO21 research included a
futuristic job analysis that projected future job performance demands and critical KSAs.
Fortunately, the NCO21 analysis specifically included first-term soldiers. Select21 has been able
to use and to update the NCO21 products based on the recent views of the future captured under
the notion of transformation.

Validating projections of future KSAs is also a formidable challenge because of the
absence of performance measures. NCO21 faced this challenge, with results showing the
difficulty of obtaining scores that differentiate current performance from projections of future
performance. The NCO21 research thus emphasized the importance to Select21 of a broad
performance model, multiple measures reflecting the model, and a focus on jobs that both exist
today and are likely to be characteristic of the future.

Select21 also considers more than job performance. The research takes into account the
total system change of the transformation that includes organizational operations and the overall
organizational lives of Soldiers. In addition to changes in job performance demands, system
change could add to requirements for increased personal and social skills and motivation.
Select21 seeks to provide a database capability for research on KSAs likely to influence
individual fit into the future Army and decisions to remain in the Army, using a person-
environment fit model.

As subsequent papers will emphasize, expert judgment has been critical to formulations
of the future and to decisions about job performance dimensions, KSAs, and the measurement
process. Expert panels have ensured that the project tracks with transformation plans. Panels
have also provided direct reviews and judgments about task analysis products. We anticipate that
expert input will help guide recommendations for and actual product utilization.

Selection and Job Classification


The Select21 project capitalizes on the knowledge gleaned from the NCO21 research, but
it seeks to advance those findings by investigating the possibility of providing measures useful for
assigning Soldiers to jobs, as well as screening them for suitability for organizational entry. The
challenge here is to create KSA measures that are excellent predictors of the criteria and able to
provide differential prediction across the well over 150 different jobs or military occupational
specialties (MOSs) to which Soldiers are assigned. To deal with this challenge, the research
sought to group MOSs viable for the future into clusters based on a principle of likely job
demand homogeneity.
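
As an illustration of this clustering step (Python with SciPy; the MOS codes, KSA profiles, and
cluster count below are fabricated for the example, not the Select21 solution):

    # Illustrative sketch only; profiles and cluster count are fabricated.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    mos = ["11B", "19K", "31B", "68W", "25B", "35F"]
    # Rows: MOS; columns: importance ratings (1-5) for ten hypothetical KSAs.
    ksa_profiles = np.random.default_rng(1).uniform(1, 5, size=(len(mos), 10))

    Z = linkage(ksa_profiles, method="average", metric="euclidean")
    cluster_of = dict(zip(mos, fcluster(Z, t=3, criterion="maxclust")))
    # MOSs landing in the same cluster are treated as having homogeneous
    # job demands for sampling and measure-development purposes.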


ANTICIPATED FUTURE CONDITIONS

At the initiation of the Select21 research effort (Sager et al., 2003), we reviewed the
anticipated conditions of the future that had been derived for NCO21. The NCO21 conditions
represented themes concerning self-direction, information processing, computer familiarity,
increased technical skills, increased leadership skills, understanding systems, managing multiple
functions, stamina, and adaptability. It was apparent that the future conditions for NCOs would
need to be adapted because projections and realities had changed since the NCOs were
investigated. Further, the requirements for NCOs are inherently broader than those for first-tour
Soldiers. Once again, the current writings and projections were reviewed and integrated with
lessons learned from the NCO investigation to generate possible future conditions. We then
interviewed SMEs who reviewed and revised the list of anticipated Army-wide conditions for
first-tour Soldiers (see Table 1).

Table 1. Anticipated Army-wide Conditions for First-tour Soldiers

Learning Environment                     Greater requirement for continuous learning and the
                                         need to independently maintain/increase proficiency
                                         on assigned tasks
Disciplined Initiative                   Less reliance on supervisors and/or peers to perform
                                         assigned tasks within their realm of defined
                                         responsibilities
Communication Method and Frequency       Greater need to rely on digitized communications,
                                         understanding of the common operational picture, and
                                         increased situational awareness
Individual Pace and Intensity            Greater need for mental and physiological stamina,
                                         understanding of personal status, and adaptability
Self-Management                          Greater emphasis on ensuring Soldiers balance and
                                         manage their personal matters and well-being
Survivability                            Improved protective systems, transportation,
                                         communications, and medical care will result in
                                         improvement in personal safety

Expectation of change was the impetus for Select21, and the themes in Table 1
summarize the types of changes creating future job demands on first-tour Soldiers and the KSAs
important to successful performance of the demands. Based on inspection, they include
requirements to cope successfully with change (Learning Environment and Individual Pace and
Intensity). The themes also point to at least two system changes (Communication Method and
Frequency; Survivability). The theme of “survivability” highlights a condition of work already
existing in the Army but perhaps having new implications. Finally, certain themes (Learning
Environment, Disciplined Initiative, Individual Pace and Intensity, and Self-Management)
spotlight personal attributes that include “learning orientation,” “independence,” “disciplined
self-reliance,” and a combination of physical and psychological states enabling behavioral
“adaptation.” Importantly, the condition of Self-Management recognizes needs for balancing
work with personal matters, to include well-being. A potential conflict in these future conditions
perhaps deserves note. This is the juxtaposition of the rugged and relatively energetic attributes
having to do with a learning orientation, independence, and self-reliance with the attributes that
could promote an emphasis on personal safety and balance among competing life roles.

There is no totally “clean” way to compare the Select21 and NCO21 conditions.
Difficulty arises from the development process: Select21 conditions were derived in part from
the NCO21 list. The lists were also intended to serve different purposes, the Select21 list
supporting job description/analysis of first-tour Soldiers and the NCO21 list supporting that of
NCOs. Table 2 was nevertheless generated with these difficulties in mind and provides a take on
the overlaps in the two sets of conditions.

Table 2. Summary of Comparability of Select21 and NCO21 Conditions for First-tour Soldiers

Select21 Condition                        NCO21 Condition

Relatively Overlapping Conditions
Learning Environment                      Adaptability
Disciplined Initiative                    Self-Direction
Communication Method and Frequency        Computer Familiarity/Information Processing
Individual Pace and Intensity             Stamina

Relatively Non-Overlapping Conditions
Self-Management                           Increased Technical Skills
Survivability                             Increased Leadership Skills
                                          Understanding Systems
                                          Manage Multiple Functions

Even though the view in Table 2 is open to disagreement, it provides suggestions.
Without doubt, the non-overlapping conditions from the NCO21 list reflect the purpose of the
list as setting conditions for NCO performance. Thus, Understanding Systems and Manage
Multiple Functions (NCO21) are exclusive to NCOs. These may reflect a change in our
understanding, but more likely reflect the different ranks. Although technologies exist to provide
the information to all Soldiers, and NCOs gain these skills (e.g., Force XXI Battle Command,
Brigade-and-Below [FBCB2]), these are considered functions of leading Soldiers.

Also interesting are the other trends suggested by the comparison. That is, the
comparison suggests that views of the future associated with Army transformation are somewhat
more consolidated than the views during the era of NCO21. This consolidation may be an
outcome of the systems planning that the Army has undertaken for the transformation. Thus,
some conditions have broader but more specified and more integrated implications. A good
example is the comparison of Individual Pace and Intensity (Select21) with Stamina (NCO21).
Both refer to increased needs for physical and mental superiority. A prime difference is the
degree to which the Select21 condition specified the aspects of fitness. Further, the Select21
condition integrated the notions of understanding personal status and adaptability. The NCO21
list had portrayed Stamina and Adaptability as distinct dimensions. Another example is the
Select21 condition of Self-Management. Nothing comparable to this concept appears
in the NCO21 list. Self-Management probably implies taking control of job activities, but it more
explicitly refers to taking care of personal matters and balancing work and personal issues. Thus,
Self-Management may reflect a stronger integration of Soldier well-being into its picture of the
future. As mentioned earlier, this more recent picture also includes an emphasis on personal
safety.

THE WAY FORWARD

Select21 research has used the anticipated future conditions as the context for
accomplishing its objectives. These include: (a) clustering of similar first-tour Soldier MOSs,
(b) selection of clusters and representative MOSs for viability and potential for differential
prediction, (c) projection of future conditions for the representative MOSs, (d) job analysis of
the representative MOSs, and (e) construction of predictor measures and criterion measures.

To the extent that the Select21 future conditions represent a consolidation of the
transformation planning process, they raise optimism about the likely usefulness of Select21
products. The principal products will be predictor measures and recommendations to the Army
for the selection and classification of future first-term Soldiers. The research now concentrates
on development of the predictor and criterion measures for subsequent use in a test of concurrent
validity.

REFERENCES

Department of the Army (2002). Army Training and Leader Development Panel (NCO). Ft.
Leavenworth, KS: Author.

Department of the Army (2002). Objective Force 2015. Arlington, VA: Author.

Ford, L. A., Campbell, R. C., Campbell, J. P., Knapp, D. J., & Walker, C. B. (2000). 21st Century
Soldiers and Noncommissioned Officers: Critical predictors of performance (Technical
Report 1102). Alexandria, VA: U.S. Army Research Institute for the Behavioral and
Social Sciences.

Halal, W. E., Kull, M. D., & Leffmann, A. (1997, November-December). Emerging
technologies: What’s ahead for 2001-2030. The Futurist, 20-28.

Knapp, D. J., Burnfield, J. L., Sager, C. E., Waugh, G. W., Campbell, J. P., Reeve, C. L.,
Campbell, R. C., White, L. A., & Heffner, T. S. (2002). Development of Predictor and
Criterion Measures for the NCO21 Research Program (Technical Report 1128).
Alexandria, VA: U. S. Army Research Institute for the Behavioral and Social Sciences.

Knapp, D. J., McCloy, R., & Heffner, T. S. (2003). Validation of Measures Designed to
Maximize 21st Century Army NCO Performance (Contractor Report). Alexandria, VA:
Human Resources Research Organization.


Sager, C. E., Russell, T. L., Campbell, R. C., & Ford, L. A. (2003). Future Soldiers: Analysis of
Entry-Level Performance Requirements and their Predictors. Alexandria, VA: Human
Resources Research Organization.


FUTURE-ORIENTED JOB ANALYSIS FOR FIRST-TOUR SOLDIERS41
Christopher E. Sager, Ph.D. and Teresa L. Russell, Ph.D.
Human Resources Research Organization
Alexandria, VA, USA
csager@humrro.org

INTRODUCTION

The Select21 project was undertaken to help the U.S. Army ensure that it acquires
Soldiers with the knowledges, skills, and attributes (KSAs) needed for performing the types of
tasks envisioned in a transformed Army. Army leadership recognizes the importance of its Soldiers
to the effectiveness of transformation. In this context, the ultimate objectives of the project are to
(a) develop and validate measures of critical KSAs needed for successful execution of the Future
Army’s missions and (b) propose use of these measures as a foundation for an entry-level selection
and classification system adapted to the demands of the 21st century. The purpose of this first stage
of the project was to conduct a future-oriented job analysis to support the development and
validation effort.

APPROACH

In this section we briefly describe concepts underlying our approach, challenges, and the
strategies we used to complete the future-oriented job analysis.

Underlying Concepts

The Select21 project focuses on the period of transformation to the Future Army—a
transition that is envisioned to take on the order of 30 years to complete (Institute for Land
Warfare, October 17, 2000). This conceptualization of the transformation implies that the next
several years will include (a) elements of the Army in its current state, (b) transitional systems,
and (c) combat systems characteristic of the fully transformed Future Army. Our goal is to
develop measures of KSAs that will be useful in the not too distant future and remain so for
many years. Therefore, we decided to focus on the time period during which all of these
elements will be present simultaneously. This transformation will affect first-tour Soldier
requirements in at least two ways: (a) the types of missions for which Soldiers need to prepare
will grow in number and complexity, and (b) the tools and equipment Soldiers will be using to
perform these missions are undergoing significant changes (U.S. Army, 2001; December, 2002).

41 This paper is part of a symposium titled Selecting Soldiers for the Future Force: The Army’s Select21 Project
presented at the 2003 International Military Testing Association Conference in Pensacola, FL (D. J. Knapp, Chair).
The views, opinions, and/or findings contained in this paper are those of the authors and should not be construed as
an official U.S. Department of the Army position, policy, or decision.


The primary goal of selection/classification, and many other human resources
interventions, is to positively affect the job performance of individuals/Soldiers. Consistent with
this goal, this project began by developing a description of job performance. Here job
performance is defined as “…actions or behaviors that are relevant to the organization’s goals
and that can be scaled (measured) in terms of each individual’s proficiency (that is, level of
contribution)” (Campbell, McCloy, Oppler, & Sager, 1993, p. 40). Based on these descriptions of
job performance, we make inferences about the KSAs Soldiers need to perform the behaviors
that make up their performance requirements.

A model of job performance that links future-oriented performance requirements and KSAs
aids these inferences. It hypothesizes that job performance is a function of the individual’s
declarative knowledge (DK), procedural knowledge and skill (PKS), and motivation (M; Campbell
et al., 1993). Different aspects of job performance can have different DKs and PKSs associated
with them. The individual’s abilities, personality/temperament, interests, education, training, and
experience are antecedents to DK, PKS and M. In terms of the Select21 project, this means that
performance on a given performance requirement is a function of DK, PKS, and M, where the
KSAs are the antecedents.
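
Restated compactly in our own notation (a paraphrase of the model as described above, not a
formula taken verbatim from Campbell et al., 1993), performance on a given requirement j is

    P_j = f_j(\mathrm{DK}_j,\ \mathrm{PKS}_j,\ \mathrm{M})
    \quad\text{with}\quad
    (\mathrm{DK}_j,\ \mathrm{PKS}_j,\ \mathrm{M}) =
    g_j(\text{abilities, temperament, interests, education, training, experience}),

so the pre-enlistment KSAs that Select21 aims to measure influence performance only through
the antecedent function g_j.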

Consistent with this model, our job analysis approach was driven by future-oriented
performance requirements. We defined the performance requirements, and then identified a master
list of future KSAs—including salient individual differences attributes identified in prior research. In
turn, we linked the two—identifying the KSAs likely to predict various performance requirements.

Challenges and Strategies

The goal of a future-oriented job analysis is to take the broad, dynamic plans for future
directions, identify trends in Army jobs over time, and describe future jobs at a level that is specific
enough to guide predictor and criterion development. This goal created several challenges.

Army-Wide Performance Requirements

Select21’s measure development needs and the level of detail at which the Future Army is
currently being described led us to describe Army-wide future performance requirements via three
products. We refer to the first as the Select21 Army-Wide Performance Dimensions for First-Tour
Soldiers.42 These dimensions are 19 general components of performance that are expected to be
critical to the future. They are conceptually consistent with the job performance dimensions
developed for related past Army projects that serve as building blocks for this effort (e.g., Project
A [Campbell & Knapp, 2001] and NCO21 [Ford, R. C. Campbell, J. P. Campbell, Knapp, &
Walker, 2000]). These dimensions are future-oriented and supported our needs for developing
important criterion measures (e.g., job performance ratings and a situational judgment test [SJT]).
To support development of criteria that need more specific job analysis information (e.g.,
multiple-choice tests), we also developed the Select21 Army-Wide Common Tasks for First-Tour

42
Here we define “first-tour” Soldiers as those who have 18 to 36 months time in service.

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
516

Soldiers. They are 59 individual technical tasks. They are conceptually consistent with the Army’s
current list of common tasks that, according to Army doctrine, all first-tour Soldiers should be able
to perform (U. S. Department of the Army, April, 2003).

The performance dimensions (a) provide a description of the critical dimensions of
performance in the Future Army, (b) are helpful for developing some criteria, and (c) assist in
identifying relevant KSAs. The Select21 common tasks have provided enough technical detail to
facilitate the development of future-oriented multiple-choice performance questions and to inform
thinking about hands-on tests and simulations. The remaining challenge was that the future does not look
considerably different from the present at the level of the performance dimensions; the same is true
for tasks, given the level of stability and detail forecasts of the future can currently achieve. To
support the development of expected future performance measures that are as future-oriented as
possible, we developed a third Army-wide job analysis product that we refer to as the Anticipated
Army-Wide Conditions in the 21st Century for First-Tour Soldiers. These anticipated conditions
focus on how the future Army will place new and different requirements on first-tour Soldiers.

Cluster/MOS-Specific Performance Requirements

The primary reason for collecting Military Occupational Specialty-specific (MOS-
specific) job analysis information is to show how performance requirements differ across MOS
and to guide the identification of pre-enlistment KSAs that differ in relevance across MOS. Such
a discovery would in turn facilitate the development of predictor measures that could improve
the classification efficiency of the current system. However, we were concerned that the
transformation to the Future Army would result in changes to the content of current MOS and to
the MOS structure as a whole. Based on this premise, we decided on a somewhat more general
unit of analysis (i.e., job clusters) that we believed would be more stable into the future than
individual MOS. After identifying 16 future job clusters, we selected two clusters to focus on for
the cluster-specific portion of our job analysis and measure development efforts. Following this
logic, we attempted to collect job analysis information at the cluster level. Because that portion
of a particular MOS that is not Army-wide primarily focuses on technical tasks, we aimed our
initial efforts at cluster-specific tasks. Finally, for data collection and sampling purposes, we
identified three current MOS to represent each cluster.

At first, we tried to develop task lists that (a) applied to all three target MOS in each target
cluster, (b) were sufficiently detailed to support the development of measures of current job
performance (e.g., multiple-choice tests, hands-on tests, and ratings), and (c) were future-oriented
enough to support the development of measures of expected future performance. This approach did
not work. We found that cluster-level task descriptions were simply too confusing for SMEs who
are entrenched in a specific MOS. Additionally, cluster-level tasks were, by necessity, broader than
MOS-specific ones—making cluster-level tasks less useful for development of criterion measures.
While we retained the clusters for sampling jobs for inclusion in this effort and for summarizing
results across MOS, we created MOS-specific task lists to support cluster/MOS-specific measure
development. Similar to the Army-wide analysis, we also developed cluster/MOS-specific
anticipated future conditions.


Knowledges, Skills, and Attributes

We started by developing a list of Army-wide KSAs with two important characteristics.
The first is a pre-enlistment focus. Since the goal of this project is to develop measures that can
be used for selection and classification, we determined that the KSA list should focus on
characteristics that Soldiers are likely to have before enlistment (i.e., before they are trained on
“common tasks” or on tasks specific to their MOS). The second characteristic is
comprehensiveness. The list includes 48 KSAs offering complete coverage of the measurable
and potentially relevant individual difference constructs across a number of domains (i.e.,
cognitive, personality/temperament, physical, psychomotor, and sensory). Therefore, this single
list was used in both the Army-wide and the cluster/MOS-specific job analyses.

Method

This analysis included a literature review in three areas—future Army literature, research
literature on jobs (particularly Army MOS), and literature on human attributes. The future Army
literature provided information that allowed us to make initial inferences about (a) Future Army
missions; (b) the functions and roles that Soldiers will play in those missions and the KSAs those
Soldiers will need; (c) new technology such as weaponry, vehicles, communication devices, and
the effect of technological change on personnel requirements; and (d) likely changes in the force
structure in the future (e.g., Unit of Action Maneuver Battle Lab, 2002; U.S. Army, 2001, 2002;
U.S. Army Training and Doctrine Command, 2002). Research on jobs
provided information about task taxonomies, KSAs, and tasks generated in other Army projects.
Literature on human attributes told us what KSAs have been identified and measured reliably in
the major domains of human performance—including cognitive, personality, psychomotor,
physical, skill, and interest domains (e.g., Fleishman, Costanza, & Marshall-Mies, 1999; Ford et
al., 2000; Campbell & Knapp, 2001).

In addition to reviewing the relevant literature, we relied heavily on meetings, briefings,
and workshops with subject matter experts (SMEs). Their contributions included (a) providing
feedback on the quality and practicality of research plans, (b) revising performance requirements
and KSAs, (c) evaluating the importance of performance requirements and KSAs, and (d)
developing anticipated future conditions. A number of the SMEs participating in this project were
organized into three panels—a Scientific Review Panel (SRP), an Army Steering Committee
(ASC), and an Army Subject Matter Expert Panel (SMEP). The SRP is composed of scientists
knowledgeable in the areas addressed by this research. The ASC is a policy advisory group that
includes senior representatives from a number of Army organizations concerned with
transformation. The SMEP is composed of personnel who are expert in particular MOS or
specific Future Army planning activities. Finally, other Soldiers and Non-commissioned Officers
(NCOs) participated in additional workshops and data collections.


RESULTS

As discussed above, the Army-wide future performance requirements were described in
three ways: (a) performance dimensions, (b) common tasks, and (c) anticipated future conditions.
Fig. 1 shows the 19 Army-wide performance dimensions. The common tasks are not shown here
due to space considerations. A detailed description of the anticipated Army-wide conditions is
provided by Heffner, Tremble, Campbell, and Sager (2003).

Performs Common Tasks
Solves Problems and Makes Decisions
Exhibits Safety Consciousness
Adapts to Changing Situations
Communicates in Writing
Communicates Orally
Uses Computers
Manages Information
Exhibits Cultural Tolerance
Exhibits Effort and Initiative on the Job
Follows Instructions and Rules
Exhibits Integrity and Discipline on the Job
Demonstrates Physical Fitness
Demonstrates Military Presence
Relates to and Supports Peers
Exhibits Selfless Service Orientation
Exhibits Self-Management
Exhibits Self-Directed Learning
Demonstrates Teamwork
Figure 1. Select21 Army-Wide Performance Dimensions for First-Tour Soldiers.

To identify target clusters for the cluster/MOS-specific portion of this project, we first
needed a useful way of organizing all entry-level Army jobs into a smaller group of clusters. The
final list of 16 clusters included the full domain of likely future entry-level Army jobs. The target
clusters identified for focused study and their representative MOS were:
• Close Combat
− 11B Infantryman
− 19D Cavalry Scout
− 19K M1 Armor Crewman
• Surveillance, Intelligence, and Communications (SINC)
− 31U Signal Support Systems Specialist
− 74B Information Systems Operator/Analyst
− 96B Intelligence Analyst.

MOS-specific tasks organized in task categories were developed for each of these six
MOS. For example, the Infantryman list includes tasks in 11 task categories. One of these
categories is Performs Tactical Operations. This category contains tasks such as (a) Move as a
member of a fire team, (b) Select hasty firing positions during an urban operation, and (c)
Sustain and camouflage fighting positions. In addition to these tasks, we developed descriptions
of anticipated future conditions applying to each target MOS and cluster. Fig. 2 shows excerpts
from the anticipated future conditions applicable to Infantryman.

• For the foreseeable future, infantry will continue to operate as mechanized infantry and light
infantry. They will be delivered to the battle area by air, helicopter, ground vehicles, and on foot.
• All infantry will see improvements in communication and location capability when in dismounted
mode. This will include a GPS-integrated navigation system.
• Individual weapon (e.g., rifle) improvements will include thermal sights, daylight sights, close
combat optics, lasers, and weapon systems connected to a digital reporting and recording network.
• Infantrymen will experience better individual protection through (a) integrated combat
identification systems, (b) full-time chemical/biological clothing, (c) Interceptor body armor, and
(d) laser eye protection.
• Long-term possibilities include target detection and engagement without exposure (i.e., individual
non-line-of-sight fire).
• Full C4I capability and situational awareness (SA) interconnectivity are dependent on the future
development of lightweight, multiday power sources (i.e., batteries) that are rechargeable and
logistically supportable.
• Changes in infantry technology will occur incrementally. Overall, there will be no major mid-term
changes to infantry organizations, formations, employment, or tactics.
Figure 2. Selected Anticipated Conditions in the 21st Century Relevant to First-Tour Infantryman.

The Select21 list of pre-enlistment KSAs is presented in Fig. 3. Direct ratings of
importance and linkages of KSAs to Army-wide performance dimensions and cluster/MOS-
specific task categories provided information about the relative importance of KSAs to job
performance. Important Army-wide KSAs included General Cognitive Aptitude, Dependability,
Oral and Nonverbal Comprehension, and Emotional Stability. When comparing the target MOS
across the two clusters, Close Combat favored psychomotor and physical attributes and Team
Orientation, while SINC favored Basic Computer Skill and Reading Skill/Comprehension.
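
To make this summarization concrete, the following minimal sketch (in Python) averages direct SME importance ratings per KSA and ranks them within a cluster. The KSA subset, panel sizes, 1-5 scale, and randomly generated ratings are illustrative assumptions, not project data.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative subset of the 48 Select21 KSAs (names from Fig. 3).
    KSA_NAMES = ["General Cognitive Aptitude", "Dependability",
                 "Team Orientation", "Basic Computer Skill"]

    def ksa_importance(ratings, names):
        """Mean SME importance rating per KSA, sorted highest first.
        `ratings` is an SMEs x KSAs matrix (here, a 1-5 importance scale)."""
        means = np.nanmean(ratings, axis=0)
        return sorted(zip(names, np.round(means, 2)), key=lambda pair: -pair[1])

    # Hypothetical ratings from two cluster panels of 8 SMEs each.
    close_combat = ksa_importance(rng.integers(1, 6, size=(8, 4)), KSA_NAMES)
    sinc = ksa_importance(rng.integers(1, 6, size=(8, 4)), KSA_NAMES)
    print(close_combat)
    print(sinc)

Comparing the two sorted lists side by side mirrors the cluster contrast described above.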

SUMMARY

This analysis generated performance requirements describing the nature of work for entry-
level Soldiers during the transformation. These requirements guided the identification and
prioritization of KSAs that are being used to develop new predictor measures that could be useful
for recruit selection and MOS assignment. These requirements are also being used to develop job
performance measures that will serve as criteria for evaluating the predictors in an eventual
concurrent criterion-related validation effort.


Cognitive Attributes: Oral Communication Skill, Oral and Nonverbal Comprehension, Written
Communication Skill, Reading Skill/Comprehension, Basic Math Facility, General Cognitive Aptitude,
Spatial Relations Aptitude, Vigilance, Working Memory, Pattern Recognition, Selective Attention,
and Perceptual Speed and Accuracy.

Temperament Attributes: Team Orientation, Agreeableness, Cultural Tolerance, Social
Perceptiveness, Achievement Motivation, Self-Reliance, Affiliation, Potency, Dependability, Locus
of Control, Intellectance, and Emotional Stability.

Physical Attributes: Static Strength, Explosive Strength, Dynamic Strength, Trunk Strength,
Stamina, Extent Flexibility, Dynamic Flexibility, Gross Body Coordination, and Gross Body
Equilibrium.

Sensory Attributes: Visual Ability and Auditory Ability.

Psychomotor Attributes: Multilimb Coordination, Rate Control, Control Precision, Manual
Dexterity, Arm-Hand Steadiness, Wrist-Finger Speed, and Hand-Eye Coordination.

Procedural Knowledge and Skill: Basic Computer Skill, Basic Electronics Knowledge, Basic
Mechanical Knowledge, Self-Management Skill, Self-Directed Learning and Development Skill, and
Sound Judgment.
Figure 3. Select21 Knowledge, Skills, and Attributes.


REFERENCES

Campbell, J. P., & Knapp, D. J. (Eds.). (2001). Exploring the limits in personnel selection and
classification. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In
N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 35-70). San
Francisco: Jossey-Bass.

Fleishman, E. A., Costanza, D. P., & Marshall-Mies, J. (1999). Abilities. In N. G. Peterson,
M. D. Mumford, W. C. Borman, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational
information system for the 21st century: The development of O*NET (pp. 175-195).
Washington, DC: American Psychological Association.

Ford, L. A., Campbell, R. C., Campbell, J. P., Knapp, D. J., & Walker, C. B. (2000). 21st Century
soldiers and noncommissioned officers: Critical predictors of performance (Technical
Report 1102). Alexandria, VA: U.S. Army Research Institute for the Behavioral and
Social Sciences.

Heffner, T. S., Tremble, T., Campbell, R. C., & Sager, C. E. (2003, November). Anticipating the
future for first-tour soldiers. In D. J. Knapp (Chair), Selecting Soldiers for the Future
Force: The Army’s Select21 Project. Symposium presented at the 45th Annual
Conference of the International Military Testing Association, Pensacola, FL.

Institute for Land Warfare. (2000, October 17). Soldiers on point for the nation, Army
transformation. Briefing presented to the Army Transformation Panel at the AUSA
Annual Meeting, Washington, DC.

Unit of Action Maneuver Battle Lab. (2002). Operational requirements document for the future
combat systems. Ft. Knox, KY: Author.

U.S. Army. (2001). Concepts for the Objective Force, United States Army white paper. Online
at: http://www.army.mil/features/WhitePaper/default.htm.

U.S. Army. (2002, December). Objective Force in 2015 white paper. Arlington, VA: Department of
the Army, Objective Force Task Force.

U.S. Army Training and Doctrine Command. (2002, July). The United States Army Objective
Force: Operational and organizational plan for maneuver unit of action (Pamphlet 525-
3-90/O&O). Fort Monroe, VA: Author.

U.S. Department of the Army. (2003, April). Soldier’s manual of common tasks, skill level 1
(STP 21-1-SMCT). Washington, DC: Author.


PERFORMANCE CRITERIA FOR THE SELECT21 PROJECT

Patricia A. Keenan, Ph.D., David A. Katkowski, Maggie M. Collins,
Karen O. Moriarty, and Lori B. Schantz
Human Resources Research Organization
Alexandria, VA, USA
pkeenan@humrro.org

Note. In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21
Project. Symposium conducted at the 2003 International Military Testing Association (IMTA)
Conference, Pensacola, FL. The views, opinions, and/or findings contained in this paper are those
of the authors and should not be construed as an official U.S. Department of the Army position,
policy, or decision.

INTRODUCTION

The U.S. Army is undertaking fundamental changes to transform into the Future Force—
a transition envisioned to take approximately 30 years to complete. The time frame of interest
extends to approximately 2025. The overall goal of the Select21 project is to ensure the Army
selects and classifies Soldiers with the knowledge, skills, and attributes (KSAs) needed for
performing the types of tasks envisioned in a transformed Army.

One of the central tasks of the Select21 research program is to develop performance
measures to support criterion-related validation efforts. The development effort comprises several
criterion measures, which reflect both “can-do” and “will-do” constructs. Can-do measures include
performance-oriented job knowledge tests; will-do measures include observed current performance
ratings, expected future performance ratings, and archival/self-report information. Our approach to
solving the future orientation problem is to develop criterion measures that reflect existing job
performance in tasks that will remain virtually the same in the future, as well as measures that
simulate future conditions under which the job would be performed.

This paper describes development of (a) supervisor and peer ratings for current and
expected future performance, (b) a job knowledge test that has both Army-wide and Military
Occupational Specialty-specific (MOS-specific) components, and (c) a self-report measure that
includes archival information (e.g., evaluations, awards, education). Other criterion measures
developed for Select21 include a criterion situational judgment test (Waugh & Russell, 2003)
and a measure of person-environment fit (Van Iddekinge, Putka, & Sager, 2003).

Instrument Development

The rating scales and job knowledge tests have been developed following the same
general procedures. HumRRO project staff used available information to draft materials to be
reviewed and refined by subject matter experts (SMEs). These SMEs were most often instructors
in Advanced Individual Training (AIT) or One Station Unit Training (OSUT) who teach
technical material to new Soldiers to prepare them for their first duty position. We asked the
SMEs to think of Soldiers they supervised, to use the rating scales to assess those Soldiers’
performance, and then to give us feedback on how they used the scales and any problems they
encountered. NCOs also helped to write test items and review items written by the HumRRO project
staff. Students in AIT/OSUT also completed the job knowledge items and rating scales. We had
planned to conduct a pilot test for both measures; however, most of the intended participants
became unavailable due to the deployments associated with Operation Iraqi Freedom. The field
test scheduled for late spring or summer of 2004 will provide the first large-scale chance to
administer the measures to intended users.

PERFORMANCE RATING SCALES

We are developing two types of rating scales—Observed Performance Scales (OPS) and
Future Expected (FX) scales. The OPS are ratings from target Soldiers’ supervisors and peers on
the Soldier’s current performance. We are developing versions of these scales for the Army-wide
and six target MOS samples. The FX scales will ask raters to assess the Soldier’s expected
effectiveness under conditions we anticipate will exist in the future (Sager & Russell, 2003). The
anticipated future conditions developed during the job analysis include both cluster- and MOS-
level information. We will incorporate both levels of information in the scales for Army-wide
and cluster samples.

Observed Performance Rating Scales


The Observed Performance Scales (OPS; both Army-wide and MOS-specific) were
developed with input from SMEs (primarily training instructors) who took part in a series of
workshops in the first half of 2003. In the validation data collections, many raters will be asked
to rate more than one Soldier using as many as four different scales, which could be quite a
burden for them. In an effort to reduce the load on raters, one of the first steps we took was to
review the performance requirements to see if they could be combined to reduce the number of
scales in each rating instrument. We combined the 19 Army-wide performance dimensions into
11 scales. For example, we combined the performance dimensions “Follows Instructions and
Rules,” “Exhibits Integrity and Discipline on the Job,” and “Exhibits a Selfless Service
Orientation” into a rating dimension named “Demonstrates Professionalism and Personal
Discipline on the Job.” We also included an “Overall Effectiveness” scale. The organization of
the Army-wide and MOS-specific scales is shown in Fig. 1. We followed the same procedure for
the MOS-specific scales. For example, the 11B performance requirements included several tasks
related to maintaining personal and antitank weapons and grenades/rocket launchers. Subject
matter experts approved combining these into one scale. Our goal was to reduce the number of
scales while still differentiating between performance areas.

OPS Format

Much of our development work with the rating scales has focused on improving rater accuracy,
including the extent to which the resulting ratings differentiate an individual Soldier’s strengths
and weaknesses and differentiate between Soldiers. We have a good deal of experience in
training raters to avoid typical rating problems, including evaluation errors (e.g., stereotyping),
response tendency errors (e.g., halo, leniency), and comparing Soldiers to one another. Our goal
is to develop scales and rater training that will encourage raters to use the scales as a standard
against which to measure performance. This is a continuing challenge.

Common Task Performance
− Performs Army-wide Common Tasks
− Exhibits Safety Consciousness
MOS-Specific Task Performance
− Performs MOS-Specific Technical Tasks
− Uses Computers
Communication Performance
− Communicates in Writing
− Communicates Orally
Information Management Performance
− Manages Information
Problem Solving and Decision Making Performance
− Solves Problems and Makes Decisions
Exhibits Tolerance
− Exhibits Cultural Tolerance
− Demonstrates Military Presence
Supports Peers
− Relates to and Supports Peers
− Demonstrates Teamwork
Adaptation to Changes in Missions/Locations, Assignments, Situations
− Adapts to Changing Situations
Exhibits Level of Effort and Initiative on the Job
− Exhibits Effort and Initiative on the Job
Demonstrates Professionalism and Personal Discipline on the Job
− Adheres to Regulations, Policies, and Procedures
− Exhibits Integrity and Discipline on the Job
− Exhibits a Selfless Service Orientation
Personal and Professional Development
− Exhibits Self-Management
− Exhibits Self-Directed Learning
Demonstrates Physical Fitness
− Demonstrates Physical Fitness
Figure 1. Organization of performance dimensions for Army-wide observed performance scales.

In its final version, each OPS has four sections: (a) title of the dimension, (b)
three summary performance paragraphs, (c) behavioral examples, and (d) a 7-point rating scale.
The summary paragraphs (anchors) provide a snapshot description of Soldiers’ behavior
representing three levels of performance: Exceeds Expectations, Meets Expectations, and Fails to
Meet Expectations. The behavioral examples are designed to provide additional pieces of
information about Soldiers' behavior at the various levels of effective performance to improve rater
accuracy. Fig. 2 provides an example of a rating scale. The 7-point rating scale allows raters to
differentiate between levels of performance and provide ratings that accurately reflect it.
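
As a concrete rendering of the four-section structure just described, the sketch below represents one OPS as plain data in Python. The field names are ours, not the project’s, and the anchor text is abbreviated from the Fig. 2 example that follows.

    from dataclasses import dataclass

    @dataclass
    class ObservedPerformanceScale:
        """One OPS: dimension title, three summary anchors, behavioral
        examples at each level, and the 7-point rating scale."""
        title: str
        anchors: dict        # performance level -> summary paragraph
        examples: dict       # performance level -> behavioral examples
        points: range = range(1, 8)   # ratings 1 through 7

    effort = ObservedPerformanceScale(
        title="Level of Effort and Initiative on the Job",
        anchors={
            "Fails to Meet Expectations": "Shows little effort or initiative...",
            "Meets Expectations": "Demonstrates sufficient effort...",
            "Exceeds Expectations": "Consistently demonstrates initiative...",
        },
        examples={
            "Fails to Meet Expectations": ["Frequently fails to meet deadlines"],
            "Meets Expectations": ["Is usually reliable about completing assignments on time"],
            "Exceeds Expectations": ["Almost always completes assignments on time"],
        },
    )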

The scales looked rather different at the beginning of the project. Keeping in mind our
goal to have raters rely on the scales as the measurement standard, we revised the format of the
scales based on the results of practice rating exercises done with SMEs. The original scales
contained a title, a lead-in question about how effective the Soldier is at performing in that
dimension, and designations of High, Moderate, or Low performance for the three columns. We
discovered from discussion with the SMEs that some of them just read the question and decided
on a rating, while others used the High-Low designations along with the question in deciding on a
rating. To encourage raters to read the scale itself and use it as the measure of performance, we
eliminated the question and the High-Low designations.

LEVEL OF EFFORT AND INITIATIVE ON THE JOB

Fails to Meet Expectations: Shows little effort or initiative in accomplishing even simple tasks
− Frequently fails to meet deadlines
− Refuses or ignores opportunities to take additional responsibilities

Meets Expectations: Demonstrates sufficient effort in accomplishing most tasks; puts forth extra
effort when necessary
− Is usually reliable about completing assignments on time
− Accepts additional responsibilities; may occasionally seek out challenging assignments

Exceeds Expectations: Consistently demonstrates initiative and often puts forth extra effort to
accomplish tasks effectively, even under difficult conditions
− Almost always completes assignments on time
− Seeks out and enthusiastically takes on challenging assignments and additional responsibilities

Rating: 1 2 3 4 5 6 7
Figure 2. Example of a Select21 observed performance rating scale.

We also tried another exercise to encourage raters to think about the relative strengths
and weaknesses of the Soldiers they rated. We asked raters in our early site visits to rank Soldiers
on the Army-wide dimensions prior to rating them. We provided them with a set of cards on
which the rating scale dimensions and anchors were printed, and asked them to sort the
cards in the order that reflected the performance level of the Soldiers they were rating. After sorting the
cards, we instructed the SMEs to record their rankings on a separate sheet and then to complete
the Army-wide OPS. In the field test, we plan to simplify the task by asking raters to sort the
cards into three piles (Exceeds Expectations, Meets Expectations, and Fails to Meet
Expectations). They will then record the categorization, and make their ratings. We think this
will accomplish the same result as the ranking of performance on all 12 dimensions and reduce
the frustration encountered by “ties” in rankings. We will re-examine this issue after another
review to determine whether the added accuracy makes the task worth the projected
resource costs. If this exercise helps raters to differentiate their ratings, we will use it for the
Army-wide scales with the expectation that the lesson learned will carry over to the other scales.

Future Expected (FX) Performance Scales

The FX scales ask raters to predict how well the ratee might be expected to perform
under conditions we believe will exist for the Future Army. Both the Army-wide and cluster-
specific FX scales will be based on anticipated future conditions generated in the job analysis
phase of the project (Sager, Russell, Campbell, & Ford, 2003). The Army-wide future conditions
to be rated are:

• Learning Environment
• Disciplined Initiative
• Communication Method & Frequency
• Individual Pace & Intensity

The Army-wide FX scales will incorporate descriptions of the anticipated future
conditions listed above. Raters will read each description and rate how effectively they think the
Soldier would perform under that condition. They will also provide a rating of how confident
they are in those ratings. The cluster-specific scales will be based on scenarios much like those
generated for the SJT-X in the NCO21 project (Knapp et al., 2002). These scenarios are more
detailed and are specific to the job cluster. Again, raters will indicate how well they think the
Soldier would perform in each scenario. We anticipate developing 5-6 scenarios for the Close
Combat cluster, where the anticipated future conditions are much the same for the three MOS,
and 10-11 scenarios for the Surveillance, Intelligence, and Communications (SINC) cluster, where
there is less overlap among the MOS. Some of these scenarios may be applicable to only one or two
MOS in the cluster. This approach will not yield scales tied to independent themes or constructs,
but it will sample the relevant content for each MOS well.

We will collect separate ratings for the Army-wide and cluster-specific FX scales and
conduct statistical analyses to determine whether there is dimensionality in the ratings. However,
because (a) each scenario is likely to involve multiple “dimensions” of performance and (b) a
single dimension of performance is likely to be relevant for more than one scenario, we believe
that interpretable dimensionality is unlikely to emerge. Therefore, for both Army-wide and cluster FX ratings,
we plan to aggregate the ratings into an overall Army-wide rating and an overall cluster rating,
respectively.
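
A minimal sketch of that planned aggregation follows, assuming Soldiers x conditions and Soldiers x scenarios rating matrices on a 7-point scale; the array shapes, the NaN convention, and the unweighted mean are our assumptions for illustration.

    import numpy as np

    def overall_fx(army_wide, cluster):
        """Aggregate FX ratings into one overall Army-wide score and one
        overall cluster score per Soldier.

        army_wide: Soldiers x 4 anticipated Army-wide conditions
        cluster:   Soldiers x 5-11 cluster scenarios; NaN marks scenarios
                   not applicable to a Soldier's MOS
        """
        return np.nanmean(army_wide, axis=1), np.nanmean(cluster, axis=1)

    aw = np.array([[5., 6., 4., 5.], [3., 4., 4., 2.]])
    cl = np.array([[6., 5., np.nan], [4., 3., 5.]])
    aw_overall, cl_overall = overall_fx(aw, cl)   # e.g., aw_overall = [5.0, 3.25]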

Rater Training/Process

Effective rater training is key to getting raters to use the rating scales as intended. We
have considerable experience in developing rater training that has focused on evaluation errors
(e.g., first impression, stereotyping) and response tendency errors (e.g., halo effect, central
tendency). This experience has shown that reducing or eliminating rating error is quite difficult.

Our goal with Select21 rater training is to more clearly focus raters on reading and using
the scales accurately. For all raters, training will emphasize the importance of making accurate
ratings and thinking about a Soldier’s relative strengths and weaknesses. To this end, we will
stress the importance of accurate performance measures to the overall success of the project. In
past work we have found that stressing the fact that the ratings are “for research purposes only”
helps to overcome problems, such as leniency, that are common in operational performance
ratings. We will focus on the importance of reading the anchors, thinking about a Soldier’s
relative strengths and weaknesses, and applying that insight to the ratings. The ranking exercise
described previously should help with this focus, as will the format of the rating scales. We will
address response tendency and evaluation errors. In addition, while raters are working, we will
have facilitators move about the room to watch for raters who appear to be falling prey to these
errors.
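
Some of the same response-tendency patterns can also be screened for numerically after collection. The sketch below flags two of them on a 7-point scale; the thresholds are purely illustrative rather than validated cutoffs, and this screen is our addition, not part of the project’s training design.

    import numpy as np

    def screen_raters(ratings, lenient_mean=6.0, halo_sd=0.5):
        """Flag possible response-tendency errors in a raters x dimensions
        matrix of 7-point ratings: a very high mean suggests leniency, and
        near-zero spread across dimensions suggests halo."""
        flags = {}
        for i, row in enumerate(np.asarray(ratings, dtype=float)):
            problems = []
            if np.nanmean(row) > lenient_mean:
                problems.append("possible leniency")
            if np.nanstd(row) < halo_sd:
                problems.append("possible halo")
            if problems:
                flags[i] = problems
        return flags

    print(screen_raters([[7, 7, 6, 7], [4, 2, 6, 5]]))  # flags rater 0 only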

We expect that we will collect ratings from most of the raters in a face-to-face setting.
We also expect that a fairly large proportion of raters will not be available during the data
collection period. Identifying raters and collecting their ratings has been a challenge in past
efforts such as this, and we expect that we will encounter the same situation in this project. It is
fairly easy to identify first-level supervisors. We will have the name of each Soldier's direct
supervisor, and these supervisors will be asked to attend the data collection, so getting ratings
from them should be fairly straightforward. However, we have found that it is beneficial to collect ratings
from multiple raters. For Select21, we would like to identify at least two supervisors and several
peers to rate a Soldier. The second supervisor may be an NCO or might be another Soldier in the
target Soldier's unit who has seniority over the target Soldier. When Soldiers come in for testing,
we will ask them to identify a second supervisor and several peers who could provide ratings for
them. If Soldiers come in a group, as we expect will happen with the large density MOS such as
infantry, we can identify peer raters from within that group. However, for the lower density
MOS, we expect that Soldiers will identify peers who can rate them. We will also ask
supervisors to provide the names of potential raters for their Soldiers. We will leave rating
materials for those raters to complete after the on-site data collection is over.

Collecting “Distance” Ratings

Collecting ratings from absentee raters is a very different problem from the one encountered in a
face-to-face session. The problem is twofold: (a) persuading them to make ratings at all and (b)
getting them to make accurate ratings. First, whether or not to complete the ratings is highly
discretionary for
them. Although we have been advised that getting buy-in from their supervisors will gain their
cooperation, this process has not been an overwhelming success in the past. Second, they will not
hear and see the training message, nor will they be able to ask questions. They will read as much or
as little of the instructions as they want, so it is likely they will not fully understand why their ratings
are important.

We will collect data from a small number of field NCOs in January. At that time, we will
talk with them about (a) the relative feasibility of using paper-and-pencil leave-behind packets or
asking Soldiers to access the Select21 website to make ratings for the 2004 field test and (b)
ways to increase the response rate for Soldiers who are unable to attend the regular rating
sessions. We have used paper-and-pencil leave-behind packets in other projects and know that
the response rate is not tremendously high. For Select21, we have the capability of either
allowing raters to access a website to make ratings or sending them rating forms via email. One
of the topics we will talk about in the January meetings is whether we can assume that Soldiers at
all installations will have access to the Internet and/or email in such a way that makes electronic
ratings feasible. We will incorporate their feedback into our plans for the field test. The field test
in 2004 will provide an important opportunity to try out multiple strategies for handling
“distance” ratings. We could evaluate the two options by seeing which most raters prefer and
comparing the quality of data obtained through each.

JOB KNOWLEDGE TEST

The purpose of the job knowledge criterion exam is to obtain an indicator of job
performance of first-tour Soldiers in the Future Force by measuring job-related knowledge at the
Army-wide and MOS-specific levels. The job knowledge criterion exam is a “can-do” measure
of first-tour Soldier performance.


The original Select21 research plan called for development of both hands-on and
computer-based work sample simulations. It became clear early on, however, that the simulation
capabilities offered by the Perception® software being used for the job knowledge tests
obviated the need for development of expensive, high-fidelity computer-based simulations.
Moreover, there were no obvious Army-wide testing requirements that would be reasonably met
with computer-based simulations, and it would not be economically feasible to even consider
developing simulations for all six target MOS. The reality of developing criterion measures for
six MOS instead of two job clusters also led the project team to step back from the expectation
that hands-on tests would be developed for the Army-wide and the six MOS samples. The
Perception® testing software allows for a far more realistic presentation of materials (e.g.,
illustrations, photographs, and video clips) than traditional paper-and-pencil
multiple-choice tests.

Perception® allows the use of a variety of question types (e.g., matching, ranking,
true/false, Likert-type scales) as well as standard multiple-choice. The use of graphics,
illustrations, photographs, and video clips also reduces the reading level required to take the
exam and, consequently, reduces the risk of adverse impact. Additionally, it allows the test items
(traditionally a measure of textbook knowledge) to be more performance-oriented.

Developing the Test

The content of these tests is driven by the performance requirements identified by the
future-oriented job analysis of first-tour Soldiers. The test blueprints (i.e., content specifications,
including the degree to which each content area is reflected in the test) are composed of the
performance requirements that are easily captured in a written test (e.g., knowledge of first aid
procedures is more easily tested than oral communication skill by this method). Although the test
blueprints are composed of tasks, it is the knowledge required to perform each task that is
captured by the test questions; thus, the instruments are referred to as knowledge tests. HumRRO
project staff developed draft blueprints that were reviewed and revised by AIT/OSUT
instructors, drill sergeants, and other SMEs.
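
To illustrate what such a blueprint drives, the sketch below converts content-area weights into per-area item counts for a 60-item form. The areas and weights are invented placeholders, and the doubling rule anticipates the drafting goal described in the next section.

    # Hypothetical blueprint: content areas weighted by the share of the
    # test each should occupy (areas and weights are placeholders).
    BLUEPRINT = {
        "First aid procedures": 0.20,
        "Land navigation": 0.25,
        "Communications equipment": 0.25,
        "NBC protective measures": 0.30,
    }

    def allocate_items(blueprint, n_items):
        """Turn blueprint weights into per-area item counts."""
        return {area: round(w * n_items) for area, w in blueprint.items()}

    final_counts = allocate_items(BLUEPRINT, 60)                # Army-wide: 60-80 items
    draft_counts = {a: 2 * n for a, n in final_counts.items()}  # write twice as many
    print(final_counts, draft_counts, sep="\n")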

Developing an Item Bank

The final Army-wide test will have 60-80 test questions and the final MOS-specific
instruments will have 40-60 test questions. Because many questions are dropped during the
review process, the goal is to write at least twice as many questions as required, per category.
The test questions were written primarily by HumRRO staff, as well as by training instructors
during data collection visits. Other questions were imported from the Project A item bank
(Campbell & Knapp, 2001). Multiple sources were used for item content, including the Soldiers’
Manual of Common Tasks and field manuals and study guides available online (e.g.,
www.adtdl.army.mil, www.usapa.army.mil, and www.armystudyguide.com). These references
are also useful sources of pictures and graphics. For content where there are no existing pictures
or video clips, graphics are being collected at the schools.

All items go through an iterative review process. HumRRO project staff and Army
instructors developed new questions and identified relevant Project A questions. In-house test
experts and staff familiar with Army performance requirements reviewed all test questions. These
reviewers considered the currency of item content (e.g., in terms of technology and/or procedures)
and how well each item adheres to the task requirement. Once items have passed the in-house
review, they are presented to focus groups of school instructors and/or drill sergeants. Again, these
reviewers consider the currency and relevance of item content. They also make revisions as
necessary. HumRRO project staff implement revisions and update item banks, writing additional
questions as necessary to replace dropped questions.

SELECT21 PERSONNEL FILE FORM (S21-PFF)

The Select21 Personnel File Form (S21-PFF) will serve as a self-report criterion measure
for use in the Select21 validation effort. The S21-PFF will closely parallel the content of the
Army NCO Promotion Point Worksheet (PPW) and Personnel File Forms used in past research
(e.g., NCO21, Project A). The S21-PFF will contain sections that assess Soldiers’ standing on (a)
Awards, Certificates, and Military Achievements; (b) Military Education; (c) Civilian
Education; and (d) Military Training. Points for these areas are allocated by the personnel
system based on Soldiers’ records.

Questions that assess promotion board points (available on the Project A PFF) and
commander’s evaluation points are also being considered for inclusion in the S21-PFF. The S21-
PFF will also ask Soldiers to indicate the number of Article 15s and Flag Actions they have
received. Data on these disciplinary actions will be particularly useful as criteria for the
temperament and P-E fit predictors. We will also generate a list of operational tests, training
experiences, and other potential criterion indicators that do not appear on previous PFFs, but that
might yield useful information. This list will be used as a source of additional items, some of
which are likely to be MOS-specific. Project staff will then review and comment on this initial
measure. Lists of awards, military education, and civilian experiences will be carefully
scrutinized to ensure that they are all covered by the current PPW, and to determine whether any
awards or experiences are missing. Based on feedback from project staff, appropriate additions,
deletions, or modifications will be made prior to the field test. We will also calculate an
archival measure of promotion rate; although not collected via self-report, this variable was used
successfully in Project A as a supplemental job performance indicator.

NEXT STEPS

Due to deployments, there has been limited opportunity to collect data on the criterion
measures. The next opportunity for large-scale administration of the criterion measures will be in
the 2004 field test, which is likely to occur in late spring or summer of 2004. All the criterion
measures will be finalized for administration in the concurrent validation at that time.

In January, we will ask small groups of field NCOs to review the OPS and FX scales.
This will be the first exposure of the scales to NCOs who are not instructors, so their point of
view is extremely useful, as are their ideas about how best to conduct “distance” ratings. After
this mini pilot test, we will finalize the scales for the 2004 field test. The field test will be the
first large-scale administration of the rating scales. This will provide us the opportunity to see
how well our efforts to focus raters on using the scales as a standard have worked. It will also
allow us to compare response rates and ratings from paper-and-pencil leave-behind packets and
ratings administered via the Internet.

Job knowledge test development is continuing at several installations in the fall/winter of
2003. We will incorporate these new and revised items into the item bank. In January we will
finalize the instruments for field testing and develop a background information questionnaire.
This questionnaire will ask Soldiers whether they have been trained on each of the tasks and how
recently they have performed the task. In addition, it will ask Soldiers the unit to which they are
assigned and what equipment they use. This information is critical to allowing us to tailor the
tests for question tracking during the MOS portion of the exam. We will administer overlength
exams in the field test, which will allow us to gather data on all the items in the test bank. We
will use the field test data to revise the tests for the concurrent validation, selecting the best set of
items for each test.
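
One conventional way to select the best set of items from an overlength form is a corrected item-total analysis; the sketch below illustrates that generic technique on simulated 0/1-scored responses and is our example, not the project’s analysis plan.

    import numpy as np

    def corrected_item_totals(responses):
        """Corrected item-total correlation for each item in a 0/1-scored
        responses matrix (rows = Soldiers, columns = items). Each item is
        correlated with the total of the other items so that it does not
        inflate its own statistic."""
        responses = np.asarray(responses, dtype=float)
        total = responses.sum(axis=1)
        return np.array([
            np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
            for j in range(responses.shape[1])
        ])

    # Simulated overlength field-test form: keep the best 60 of 120 items.
    rng = np.random.default_rng(0)
    field_test = (rng.random((500, 120)) < 0.6).astype(int)
    keep = np.argsort(corrected_item_totals(field_test))[::-1][:60]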

REFERENCES

Campbell, J.P., & Knapp, D.J. (Eds.). (2001). Exploring the limits in personnel selection and
classification. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Knapp, D.J., Burnfield, J.L., Sager, C.E., Waugh, G.W., Campbell, J.P., Reeve, C.L., Campbell,
R.C., White, L.A., & Heffner, T.S. (2002). Development of predictor and criterion
measures for the NCO21 research program (Technical Report 1128). Alexandria, VA:
U.S. Army Research Institute for the Behavioral and Social Sciences.

Sager, C.E., Russell, T.L., Campbell, R.C., & Ford, L.A. (2003). Future Soldiers: Analysis of
entry-level performance requirements and their predictors (Draft Technical Report).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Sager, C.E., & Russell, T.L. (2003). Future-oriented job analysis for first-tour soldiers. In D. J.
Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21 Project.
Symposium conducted at the 2003 International Military Testing Association (IMTA)
Conference, Pensacola, FL.

Van Iddekinge, C., Putka, D., & Sager, C.E. (2003). Assessing person-environment (p-e) fit with
the future Army. In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The
Army’s Select21 Project. Symposium conducted at the 2003 International Military
Testing Association (IMTA) Conference, Pensacola, FL.

Waugh, G.W., & Russell, T.L. (2003). In D. J. Knapp (Chair), Selecting Soldiers for the Future
Force: The Army’s Select21 Project. Symposium conducted at the 2003 International
Military Testing Association (IMTA) Conference, Pensacola, FL.


DEVELOPING OPERATIONAL PERSONALITY ASSESSMENTS:
STRATEGIES FOR FORCED-CHOICE AND BIODATA-BASED MEASURES

Rodney A. McCloy, Ph.D., Dan J. Putka, Ph.D., and Chad H. Van Iddekinge, Ph.D.
Human Resources Research Organization (HumRRO)
Alexandria, VA, USA
rmccloy@humrro.org

Robert N. Kilcullen, Ph.D.
U.S. Army Research Institute for the Behavioral and Social Sciences
Alexandria, VA, USA

Note. In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21
Project. Symposium conducted at the 2003 International Military Testing Association (IMTA)
Conference, Pensacola, FL. The views, opinions, and/or findings contained in this paper are those
of the authors and should not be construed as an official U.S. Department of the Army position,
policy, or decision.

BACKGROUND

The U.S. Army is undertaking fundamental changes to transform into the Future Force.
The Select21 project concerns the selection of future entry-level Soldiers, with the goal of ensuring the
Army selects and classifies Soldiers with the knowledge, skills, and attributes (KSAs) needed for
performing the types of tasks envisioned in a transformed Army. The ultimate objectives of the
project are to (a) develop and validate measures of critical attributes needed for successful
execution of Future Force missions and (b) propose use of the measures as a foundation for an
entry-level selection and classification system adapted to the demands of the 21st century. The
Select21 project focuses on the period of transformation to the Future Force—a transition
envisioned to take approximately 30 years to complete. The time frame of interest extends to
approximately 2025.

The major elements of our approach to this project are (a) future-oriented job analysis,
(b) development of KSA/predictor measures, (c) development of criterion measures, and (d)
concurrent criterion-related validation. The future-oriented job analysis provides the foundation
for development of new tests that could be used for recruit selection or Military Occupational
Specialty (MOS) assignment (i.e., predictors) and development of job performance measures that
will serve as criteria for evaluating the predictors. After field testing the predictor and criterion
instruments, we will evaluate the potential usefulness of the predictors by comparing Soldiers’
scores on the predictor measures to their scores on criterion performance measures in a
concurrent criterion-related validation effort.

The Select21 job analysis team reviewed multiple sources to identify relevant KSAs,
including the Basic Combat Training list, Project A KSAs, NCO21 KSAs, Soldier21, and several
other sources. This activity resulted in a list of 48 KSAs relevant to performance of first-tour
Soldiers in the Future Force (Sager & Russell, 2003). Twelve of these entry-level KSAs fall
under the heading of temperament and serve as the target constructs for the Select21
temperament measures:

• Team Orientation
• Agreeableness
• Cultural Tolerance
• Social Perceptiveness
• Achievement Motivation
• Self-Reliance
• Affiliation
• Potency
• Dependability
• Locus of Control
• Intellectance
• Emotional Stability.


In this paper, we focus on two measures: the Person-Organization-Personality (POP) Hybrid
(also known as the Work Suitability Inventory) and the Rational Biodata Inventory (RBI). The
discussion highlights design characteristics of the POP Hybrid and RBI that we believe will
preserve their utility in operational selection settings.

PERSON-ORGANIZATION-PERSONALITY (POP) HYBRID

Researchers generally agree that people can fake self-report personality assessments (Hough,
Eaton, Dunnette, Kamp, & McCloy, 1990; Ones, Viswesvaran, & Korbin, 1995) and that many will
do so in operational selection settings (Hough, 1996, 1997, 1998; Rosse, Stecher, Miller, & Levin,
1998). Researchers disagree, however, regarding the extent to which faking affects the criterion-
related validity of these assessments. Although many researchers have found that faking has little or
no effect on criterion-related validity estimates (e.g., Barrick & Mount, 1996; Hough et al., 1990;
Ones, Viswesvaran, & Reiss, 1996), other evidence suggests faking does change the rank-order of
applicants in the upper tail of the distribution and results in the selection of individuals with lower-
than-expected performance scores (Mueller-Hanson, Heggestad, & Thornton, 2003; Zickar, 2000).
Given our experience with the Army’s Assessment of Individual Motivation (AIM; Knapp, Waters,
& Heggestad, 2002), we believe that response distortion poses a dauntingly high hurdle to the
personnel selection specialist interested in using temperament measures in an operational setting.

Recent efforts to mitigate response distortion have centered on forced-choice formats.
Although forced-choice formats have demonstrated capacity to reduce the effects of faking
(Jackson, Wrobleski, & Ashton, 2000; White & Young, 1998; Wright & Miederhoff, 1999), they
carry the stigma of ipsative response data (Hicks, 1970). One approach to reducing the ipsativity
of a forced-choice measure involves introducing foil (i.e., dummy) constructs—constructs we do
not wish to score. This approach reduces ipsativity in the responses because one can now score
relatively high or relatively low on all keyed constructs (when they are paired only with dummy
constructs). Some ipsativity remains, however, because the forced-choice response depends upon
the respondent’s standing on the keyed and dummy traits in each pair. Thus, although ipsativity
fades, it does not exit the stage entirely. Furthermore, we hypothesize that one will likely attain
better approximations of normative trait standings to the extent that one more fully samples the
content space of interest (here, personality traits). We designed the POP Hybrid as a forced-
choice measure with these characteristics, which we hypothesize will enhance its ability to
provide estimates of respondents’ normative standing on the targeted traits.
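
The claim that foils leave the data only partially ipsative can be checked in simulation. The sketch below invents a 16-stem sort with three keyed constructs (the counts are illustrative, not the POP Hybrid’s actual key) and shows that foil-referenced scores track the latent normative standings they are meant to recover.

    import numpy as np

    rng = np.random.default_rng(0)
    N_STEMS, N = 16, 5000
    TARGETS = [0, 1, 2]    # hypothetical keyed constructs
    FOILS = [j for j in range(N_STEMS) if j not in TARGETS]

    # Latent normative standing of each respondent on every stem.
    latent = rng.standard_normal((N, N_STEMS))

    # Card sort: a target's score is the number of foils it outranks, so
    # two targets can both score at the ceiling; the data are partially,
    # not fully, ipsative.
    scores = np.array([[np.sum(latent[i, t] > latent[i, FOILS])
                        for t in TARGETS] for i in range(N)])

    for k, t in enumerate(TARGETS):
        r = np.corrcoef(scores[:, k], latent[:, t])[0, 1]
        print(f"stem {t}: r(foil-referenced score, latent trait) = {r:.2f}")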

Development

The POP Hybrid comprises 16 statements (stems) that describe temperament-related work
requirements (e.g., work that requires showing a cooperative attitude). The statements are
based on the Work Styles portion of the O*NET content model (Borman, Kubisiak, & Schneider,
1999), although we have simplified their wording to make them more accessible to entry-level
Soldiers. Given that the O*NET Work Styles taxonomy was designed to cover the entire domain of
personality, it provides good coverage of the Select21 temperament-related KSAs with the
exception of Locus of Control and Cultural Tolerance, which are not typically included in
personality taxonomies (see Sager and Russell, 2003, for a review of Select21 KSAs).


Several factors led to the decision to base the POP Hybrid content on the O*NET Work
Styles. First, the taxonomy lends itself to the formation of commensurate measures for assessing
person-environment (P-E) fit. For example, the content allows one to construct both person-side
(ability) and Army-side (demand) measures. Second, the breadth of coverage of the taxonomy
helps ensure coverage of the range of work-related personality traits/characteristics an applicant
might have, which is an important characteristic of P-E fit measures (Putka, Van Iddekinge, &
Sager, 2003). Third, working from the O*NET Work Styles model provides the POP Hybrid with
a defensible taxonomic base upon which to argue that the stems from target traits appear with an
appropriate set of dummy traits, which, as we noted above, is arguably an important characteristic
of forced-choice measures. Finally, as a deterrent to response distortion, all stems are socially
desirable.

The POP Hybrid attempts to distract Soldiers from thinking about how best to game their
answers to a temperament assessment by redirecting their thoughts toward P-E fit. For example,
the initial version of the POP Hybrid contained more than 100 paired-comparison items.
Respondents selected the one statement out of each pair that described the type of work they
believed they “would be more successful at.” Not surprisingly, Soldiers reacted quite negatively to
the redundancy of the measure and sheer drudgery of the exercise. In addition, the measure
required an inordinate amount of administration time (approximately 45 minutes). We therefore
put this version aside in favor of an alternative response format.

The POP Hybrid now takes the form of a card-sorting task, with each of the 16 cards
containing one of the work characteristic statements (Fig. 1 presents 3 of the 16 statements). The
instructions direct the respondents to “sort the 16 cards in terms of how well you think you would
perform the type of work described by the cards. Cards containing types of work that you think
you would perform best should be ranked highest; cards containing types of work that you think
you would perform worst should be ranked lowest.”

Work that requires…showing a cooperative and friendly attitude towards others I dislike or disagree with.
(Agreeableness)
Work that requires…setting challenging goals and working continuously to attain them. (Achievement
Motivation)
Work that requires…interacting with people of different cultures and backgrounds, and appreciating differences
in their values, opinions, and beliefs. (Cultural Tolerance)
Note. The target Select21 KSA appears in parentheses following each POP Hybrid stem.

Figure 1. Sample POP Hybrid stems.

Scoring

One benefit of the POP Hybrid is that it can be scored in several ways, depending on
whether we want to use it for traditional personality assessment applications or for P-E fit
applications. Two options we are considering are described below (a brief scoring sketch follows
option 2):

(1) Scoring target constructs only (traditional personality assessment):

Here, we score only those statements selected as target constructs; the remaining
statements serve as “foil” (dummy) constructs (i.e., constructs we choose not to
score). Considering only these statements reduces score ipsativity, with the
reduction inversely proportional to the number of target constructs. Scores for the
target constructs are a function of their ranking relative to the foil constructs
only—not each other. Therefore, respondents can receive equal scores on the
target constructs (e.g., two target constructs are each ranked higher than any of the
foils). Were the data totally ipsative, different traits could not receive the same
score; thus, the data are only partially ipsative, thereby improving their statistical
characteristics.

(2) Scoring all constructs (P-E fit applications):

As noted earlier, we can also use this information to assess P-E fit with regard to
the temperament-related characteristics applicants possess and the temperament-
related demands of the Army. For example, if we administer an instrument similar
to the POP Hybrid to Army SMEs with different instructions (e.g., rank the
statements in terms of how well they describe work that is critical for a
first-tour Soldier to perform), we would have an Army-side profile to which
we could compare applicant responses. With an applicant and an Army-side
profile, numerous scoring options might be investigated; for example, rank-order
correlations between applicant and Army-side orderings could serve as profile
similarity indices.
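
The scoring sketch promised above, assuming a card sort recorded as stem indices ordered best to worst; the stem count, keyed constructs, and Spearman-based fit index are our illustrative reading of the two options, not the project’s scoring key.

    import numpy as np

    N_STEMS = 16            # one card per work-style stem
    TARGETS = {1, 4, 7}     # hypothetical keyed constructs

    def target_scores(order):
        """Option 1: score each keyed construct as the number of foil cards
        it outranks. `order` lists stem indices from best- to worst-
        performed. Targets are never scored against each other, so the
        scores are only partially ipsative."""
        rank = {stem: pos for pos, stem in enumerate(order)}   # 0 = best
        foils = [s for s in range(N_STEMS) if s not in TARGETS]
        return {t: sum(rank[t] < rank[f] for f in foils) for t in TARGETS}

    def pe_fit(applicant_order, army_order):
        """Option 2: profile similarity as a Spearman rank-order correlation,
        computed as the Pearson correlation of the two rank vectors."""
        return np.corrcoef(np.argsort(applicant_order),
                           np.argsort(army_order))[0, 1]

    rng = np.random.default_rng(1)
    applicant = rng.permutation(N_STEMS)    # a respondent's card sort
    army_side = rng.permutation(N_STEMS)    # an SME-derived Army-side sort
    print(target_scores(applicant))
    print(f"profile similarity (P-E fit): {pe_fit(applicant, army_side):.2f}")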

Other Development Considerations

In addition to the steps described above to make the POP Hybrid more resistant to faking,
other characteristics may help the measure maintain its validity in an operational setting. For
example, to the extent respondents try to distort the rank ordering of stems to match the ideal
personality for the Army, such distortion may not detract from the criterion-related validity of
the resulting score. Indeed, this particular form of distortion would indicate familiarity with the
requirements of the Army and realistic expectations with regard to what Army work requires.
The literature on realistic job previews suggests that familiarity with the job (or in this case the
Army) and realistic expectations would contribute to criterion-related validity when predicting
alternative criteria such as job satisfaction and attrition (Wanous, 1992). Thus, although this type
of response distortion represents a source of contamination in POP Hybrid scores, it could very
well be criterion-relevant contamination and thus enhance criterion-related validity.

In addition, the design of the POP Hybrid allows us to select which constructs to key and
which to treat as foils, depending on the criteria of interest. Thus, for criterion Y1,
Achievement/Effort, Energy, and Leadership Orientation might serve as the keyed traits, with the
other 13 traits serving as dummies. Criterion Y2, on the other hand, might require Innovation,
Analytic Thinking, Stress Tolerance, and Energy as the keyed traits. This flexibility in how we
treat constructs contained on the POP Hybrid has great value for two additional reasons. First,
the Army often desires to use the same instrument to predict a variety of criteria (e.g., using AIM
to predict NCO performance, recruiter performance, and first-tour attrition). Second, to the extent
that we can convince respondents completing the POP Hybrid that the Army will use results for a
variety of purposes (thus another reason for covering the domain of personality), it may prevent
them from attempting to fake toward a given profile or in a certain direction.

RATIONAL BIODATA INVENTORY (RBI)

During initial project meetings, there was a recommendation to use the Test of Adaptable
Personality (TAP), a 20-minute biodata assessment. Because the TAP has demonstrated
criterion-related validity with can-do criteria in operational use for Special Forces Soldiers
(Kilcullen, Goodwin, Chen, Wisecarver, & Sanders, 2002), we hypothesized the TAP (or another
measure like it) might hold substantial promise as a selection measure for entry-level Soldiers.

Over time, we selected only certain TAP scales, supplementing these with scales from the
Biographical Information Questionnaire (BIQ)—a conglomerate biodata instrument administered
during the NCO21 project (cf. Putka, Kilcullen, & White, 2003) and comprising ARI’s
Assessment of Right Conduct (ARC). The resulting measure—the Rational Biodata Inventory
(RBI)—provides a largely biodata-driven assessment that we believe holds promise for
operational use.

We continue to work on the RBI. We anticipate that it will potentially assess all 12 Select21
temperament-related KSAs (Sager & Russell, 2003). In many ways the RBI will complement the
POP Hybrid, either assessing traits that the POP Hybrid does not, or assessing different facets of
the traits the POP Hybrid does assess. For example, the KSA Locus of Control does not lend
itself to assessment through the POP Hybrid method, as the trait concerns internal attributions
rather than characteristics of work environments. The RBI, however, can readily assess an
individual’s standing on this construct. Other constructs, such as Dependability, are arguably too
broad to assess completely with a single statement. The Dependability stem on the POP Hybrid
concerns the degree to which the respondent meets obligations and completes duties on time. We
plan for the RBI to round out our assessment of this heterogeneous construct by incorporating
scales such as Hostility to Authority (negatively loading on the facet of respect for authority) and
Social Maturity (loading on facets of conformity and compliance).

NEXT STEPS

The Select21 temperament measures offer substantial promise for predicting the typical
job performance of Future Force Soldiers. To realize the promise, however, we must
satisfactorily address several issues.

Given that the Select21 data collection will incorporate concurrent validation in a research-
only setting, assessing the performance of the personality measures in an operational setting
appears of paramount importance. Our past research has demonstrated that responses obtained
under faking instructions might not approximate well those responses obtained during
operational use of the measure (Knapp et al., 2002)—the nature of dissimulation varied from
research to operational setting. If at all possible, some form of operational tryout of the
personality measure(s) seems imperative.

Failing an operational tryout, we believe a carefully designed investigation of
faking/coaching that mirrors the operational selection setting closely would be critical to
understanding how applicants might alter their responses compared to research participants.
Therefore, we propose to conduct such an investigation during the pilot test of the POP Hybrid
and RBI that will link performance on the measures to some set of desired outcomes (what these
will be, exactly, remains unclear). Again, we believe such a tryout to have special import for the
RBI, given its prior use in operational settings with experienced Soldiers.

We are considering several options for simulating an operational selection setting. One
option entails administration of the measures with incentives given for “correct” responding (i.e.,
responses that look like the ideal candidate without looking overly suspect). We would compare
statistics for the measures from this condition to statistics from an honest-responding condition.
Although we would have no criterion data, any changes in respondents’ rank order would raise
flags about use of the measure(s) in an operational setting.


REFERENCES

Barrick, M.R., & Mount, M.K. (1996). Effects of impression management and self-deception on the
predictive validity of personality constructs. Journal of Applied Psychology, 81, 261-272.

Borman, W.C., Kubisiak, U.C., & Schneider, R.J. (1999). Work styles. In N.G. Peterson, M.D.
Mumford, W.C. Borman, P.R. Jeanneret, & E.A. Fleishman (Eds.), An occupational
information system for the 21st century: The development of O*NET (pp. 213-226).
Washington, DC: American Psychological Association.

Hicks, L.E. (1970). Some properties of ipsative, normative, and forced-choice normative
measures. Psychological Bulletin, 74, 167-184.

Hough, L.M. (1996). Personality measurement and personnel selection: Implementation issues.
Paper presented at the 11th annual meeting of the Society of Industrial and Organizational
Psychology, San Diego, CA.

Hough, L.M. (1997). Issues and evidence: Use of personality variables for predicting job
performance. Paper presented at the 12th annual meeting of the Society of Industrial and
Organizational Psychology, St. Louis, MO.

Hough, L.M. (1998). Effects of intentional distortion in personality measurement and evaluation
of suggested palliatives. Human Performance, 11, 209-244.

Hough, L.M., Eaton, N.K., Dunnette, M.D., Kamp, J.D., & McCloy, R.A. (1990). Criterion-
related validities of personality constructs and the effect of response distortion on those
validities. Journal of Applied Psychology, 75, 581-595.

Jackson, D.N., Wrobleski, V.R., & Ashton, M.C. (2000). The impact of faking on employment
tests: Does forced-choice offer a solution? Human Performance, 13, 371-388.

Kilcullen, R., Goodwin, J., Chen, G., Wisecarver, M., & Sanders, M. (2002). Identifying agile
and versatile officers to serve in the Objective Force. Paper presented at the 23rd Annual
Army Science Conference, Orlando, FL.

Knapp, D.J., Waters, B.K., & Heggestad, E.D. (Eds.) (2002). Investigations related to the
implementation of the Assessment of Individual Motivation (AIM) (Study Note 2002-02).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Mueller-Hanson, R., Heggestad, E.D., & Thornton, G.C., III (2003). Faking and selection:
Considering the use of personality from a select-in and a select-out perspective. Journal
of Applied Psychology, 88, 348-355.

Ones, D.S., Viswesvaran, C., & Reiss, A.D. (1996). Role of social desirability in personality testing
for personnel selection: The red herring. Journal of Applied Psychology, 81, 660-679.


Putka, D.J., Kilcullen, R. N., & White, L.A. (2003). Temperament inventories. In D. J. Knapp, R.
A. McCloy, & T. S. Heffner (Eds.), Validation of measures designed to maximize 21st-
century Army NCO performance (Interim Report) (pp. 8-1 – 8-42). Alexandria, VA:
Human Resources Research Organization.

Putka, D.J., Van Iddekinge, C.H., & Sager, C.E. (2003, November). Developing measures of
occupational interests and values for selection. In M.G. Rumsey (Chair), Occupational
Interest Measurement: Where Are the Services Headed? Paper presented at the 2003
International Military Testing Association Conference, Pensacola, FL.

Rosse, J.G., Stecher, M.D., Miller, J.L., & Levin, R. (1998). The impact of response distortion on
pre-employment personality testing and hiring decisions. Journal of Applied Psychology,
83, 634-644.

Sager, C.E., & Russell, T.L. (2003, November). Future-oriented job analysis for first-tour
Soldiers. In D.J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s
Select21 Project. Paper presented at the 2003 International Military Testing Association
Conference, Pensacola, FL.

Wanous, J.P. (1992). Organizational entry (2nd ed.). Reading, MA: Addison-Wesley.

White, L.A., & Young, M.C. (1998). Development and validation of the Assessment of
Individual Motivation (AIM). Paper presented at the Annual Meeting of the American
Psychological Association, San Francisco, CA.

Wright, S.S., & Miederhoff, P.A. (1999). Selecting students with personal characteristics relevant to
pharmaceutical care. American Journal of Pharmaceutical Education, 63, 132-138.

Zickar, M.J. (2000). Modeling faking on personality tests. In D. Ilgen & C.L. Hulin (Eds.),
Computational modeling of behavioral processes in organizations (pp. 95-108).
Washington, DC: American Psychological Association.


SCORING BOTH JUDGMENT AND PERSONALITY IN A SITUATIONAL JUDGMENT TEST45

Gordon W. Waugh, Ph.D. and Teresa L. Russell, Ph.D.


Human Resources Research Organization
Alexandria, VA, USA
gwaugh@humrro.org

45
In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21 Project. Symposium
conducted at the 2003 International Military Testing Association (IMTA) Conference, Pensacola, FL. The views,
opinions, and/or findings contained in this paper are those of the authors and should not be construed as an official
U.S. Department of the Army position, policy, or decision.

INTRODUCTION

Although personality measures are good predictors of performance in a research setting
(Tett, Jackson, & Rothstein, 1991), there are problems with their use in operational settings
(Knapp, Waters, & Heggestad, 2002). There is substantial research showing that personality tests
can be faked (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990; Ones, Viswesvaran, & Reiss,
1996). Several recent studies show that they probably are faked when used for personnel
selection (Hough, 1996, 1997, 1998; Rosse, Stecher, Miller, & Levin, 1998). Faking changes the
rank-order of applicants and results in the selection of individuals with lower-than-expected
performance scores (Mueller-Hanson, Heggestad, & Thornton, 2003; Zickar, 2000). Thus, there
is much interest in developing a faking-resistant personality measure.

This presentation describes the development of a situational judgment test (SJT) for
selection into the U.S. Army. A situational judgment test item consists of a description of a
problem situation followed by several possible actions. An examinee answers the item by
judging the effectiveness of the actions. In some SJTs, the examinee indicates the best and worst
actions. In other SJTs, including this SJT, the examinee rates the effectiveness of each action.

A criterion SJT was simultaneously developed that targeted Soldiers who had been in
the Army between 18 and 36 months. It was developed in the same manner as the predictor SJT
described in this paper, with two exceptions: it includes only military scenarios and it does not
use trait scoring. The criterion SJT, along with other criterion measures, will be used to collect
validity data on other predictor measures. The predictor and criterion SJTs were developed for
Select21, a project sponsored by the U.S. Army Research Institute for the Behavioral and Social
Sciences. The objective of Select21 is to develop and validate selection measures that will help the
Army select, classify, and retain enlisted Soldiers with the characteristics needed to succeed in the
future Army.

The SJT format has two characteristics that might make it possible to develop faking-
resistant personality tests. First, in contrast to traditional personality tests, SJT examinees are not
asked to divulge anything about themselves. Rather, examinees are asked to analyze each
situation and evaluate the effectiveness of each action. Thus, the SJT would be a subtle measure

45
In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21 Project. Symposium
conducted at the 2003 International Military Testing Association (IMTA) Conference, Pensacola, FL. The views,
opinions, and/or findings contained in this paper are those of the authors and should not be construed as an official
U.S. Department of the Army position, policy, or decision.

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
541

of personality. Second, the same responses would be used to generate both an effectiveness score
and personality scores. Examinees who try to maximize their personality scores would risk
lowering their effectiveness scores. That is, examinees cannot ignore the effectiveness of any
actions when answering test items. Thus, an SJT that produces both an effectiveness score and
personality scale scores might be able to eliminate—or considerably reduce—faking.

Several personality traits were identified that are relevant to performing well as a Soldier
in the U.S. Army (Sager & Russell, 2003). In recent research for the Army, we had developed a
situational judgment test (Knapp et al., 2002). The SJT score correlated
significantly not only with performance measures but also with several personality scales.
Unfortunately, attempts to develop scales based on item scores were unsuccessful. This was not
surprising considering that SJTs tend to be heterogeneous even at the item level. Thus, in the
current approach, we are developing personality scales based on the scores for individual actions.

This effort has two notable aspects. First, we generated parallel civilian situations from
military situations. Second, the test simultaneously measures both the respondent’s judgment and
several of the respondent’s personality traits. We developed descriptions of military situations
that a Soldier might experience during the first few months in the Army. For each situation, we
developed response options comprising actions that Soldiers might take in those situations. We
feared, however, that a test consisting of military situations might not be appropriate for civilian
applicants. Many applicants might not understand the situations, and those who had some
military knowledge might have an unfair advantage on the test. Therefore, we developed a
parallel civilian situation for most military situations. The remainder of this paper provides more
detailed descriptions of the development of civilian items and trait scales. The results of a recent
pilot test will also be discussed.

DEVELOPMENT OF THE MILITARY AND CIVILIAN SJTS

To develop the military SJT, we asked new Soldiers and the NCOs (Non-Commissioned
Officers) who train them to write descriptions of problem situations relevant to new Soldiers
(during the first few months in the Army). These could be actual situations they had observed or
hypothetical situations. Other Soldiers and NCOs wrote descriptions of actions that Soldiers
might take in these situations. We edited and dropped actions until there were no more than
about nine actions per situation. We asked Soldiers and NCOs to write each situation targeted to
one of the specific predictor constructs we wanted the SJT to measure.

At this point, we asked NCOs and Soldiers to write parallel civilian situations based on
the military situations. After editing these situations, we picked the best parallel civilian situation
for each military situation. Then we asked NCOs and Soldiers to write actions for the civilian
situations. We edited these actions and reduced the number of actions per situation to about nine.

We developed an initial scoring key using 10 NCOs (drill instructors and other trainers of
new Soldiers). Then we gave the draft test items (military and civilian items) to Soldiers in
training. Each item was answered by about 12 Soldiers. Based on these two data collections, some
options were dropped and a few others were edited or added. An option was dropped if the NCOs
disagreed substantially on its effectiveness. Typically, options with standard deviations
above 2.00 (on a 7-point scale) were dropped. In contrast, options were also dropped if there was
too much agreement among the Soldiers in training. Finally, we narrowed down the options to
about seven per situation. Where possible, a set of options was selected for a situation so that there
was a wide range of keyed effectiveness values (computed as the mean of the NCO ratings).

DEVELOPMENT OF THE TRAIT SCORING

The theoretical basis for using the SJT to measure both traits and judgment is based upon
the following model. When an examinee judges the effectiveness of an action, that judgment is
determined by both the examinee’s personality and his/her knowledge/training/experience relevant
to the situation. The traditional SJT score taps the examinee’s knowledge/training/experience
whereas the trait scores tap part of the examinee’s personality.

As mentioned above, SJTs are heterogeneous. Therefore, we decided to measure
traits at the lowest level possible: the individual option. Nineteen individuals with graduate
degrees in industrial-organizational psychology were recruited to rate the traitedness of each
response option. Each response option was rated by five to seven psychologists. For each trait-
option combination, participants rated the degree to which the action and trait were related.
Inverse relationships were given negative ratings. Each point in the rating scale represented a
range of correlations. The mean (across psychologists) rating represented the traitedness for that
trait on that option.

PILOT TEST RESULTS: JUDGMENT SCORES

Eight draft test forms were given to 319 Soldiers in U.S. Army reception battalions.
These Soldiers had just entered the Army but had not been assigned to training yet. Therefore,
they would be similar to applicants. Each Soldier completed one civilian SJT form (A–D) and
one military SJT form (1–4). There were four pairings of forms: A-1, B-2, C-3, and D-4. Within
each form-pair, the order was randomized. That is, half of the Soldiers got the military form first;
the other half got the civilian form first. Most items had seven response options. The civilian
forms had 14–16 items; the military forms had 11–13 items. There was no attempt to put a
military item and its parallel civilian item within the same form-pair.

The Soldiers responded by rating the effectiveness of each option on a 7-point scale
(where higher numbers represent greater effectiveness). The judgment score for an option was
computed as shown in Equation 1 below. The difference between the rating and keyed
effectiveness values is subtracted from 6 so that higher values represent better scores. The
judgment score for an entire test form was merely the mean of the option scores.

optionEffectivenessScore = 6 – | SoldiersRating – keyedEffectiveness | (1)
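
A minimal sketch (in Python, with made-up ratings and key values) of the Equation 1 scoring:

    # Each option score is 6 minus the absolute difference between the
    # Soldier's rating and the keyed effectiveness (Equation 1); the form
    # score is the mean option score. All values here are hypothetical.
    soldier_ratings = [4, 6, 2, 5, 3, 7, 1]                      # 7-point scale
    keyed_effectiveness = [3.8, 5.2, 2.9, 4.6, 3.1, 6.0, 1.7]    # mean NCO ratings

    option_scores = [6 - abs(r - k)
                     for r, k in zip(soldier_ratings, keyed_effectiveness)]
    judgment_score = sum(option_scores) / len(option_scores)
    print(f"Form judgment score: {judgment_score:.2f}")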

The reliability of the judgment scores was estimated via coefficient alpha. Table 1 shows
these values for each of the eight forms. The reliability estimates are around .90. Table 1 also
shows that the judgment score is measuring essentially the same thing on the civilian and
military forms. The correlations between forms are almost as high as the reliability estimates.
The correlation rc estimates the correlation between the constructs measured in the two forms.
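Assuming the standard correction for attenuation, which is consistent with the table note, rc =
r / √(alphaCiv × alphaMil); for form pair A-1, for example, .76 / √(.92 × .92) ≈ .83.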


Table 1. Correlations between Civilian and Military Forms and Reliability Estimates for
Judgment Scores

  Form Pair                       Coefficient Alpha
  (Civ, Mil)     r      rc        Civ       Mil
  A, 1          .76    .83        .92       .92
  B, 2          .85    .95        .91       .88
  C, 3          .70    .83        .87       .83
  D, 4          .82    .90        .91       .90

Note. N = 79. Each Soldier completed only one form pair. rc is corrected for
attenuation due to unreliability. All correlations are significant at p < .0001.

Table 2. Correlations between Judgment Score and SD of Ratings

                         Form-Pair
  Scale             A,1     B,2     C,3     D,4
  Military Forms   -.69    -.77    -.60    -.58
  Civilian Forms   -.66    -.77    -.61    -.58

Note. N = 79. Each Soldier completed only one form-pair.
All correlations are significant at p < .0001. Each value is
the correlation between the judgment scores and the within-
examinee standard deviation of his/her ratings.

Some other SJTs use the judgment scoring algorithm used in this research. Table 2 shows
a possible danger with this algorithm. The variability of an examinee's ratings is highly
negatively correlated with the judgment scores. This relationship exists when the
keyed effectiveness values tend to be near the middle of the rating scale—as is the case with this
SJT. An examinee can get a fairly good score by simply rating every option a 4 (the middle of
the rating scale). This problem can be eliminated by either (a) designing a test with a uniform
distribution of keyed effectiveness values or (b) asking examinees to pick or rank rather than rate
responses. The problem can be reduced during the computation of the scores by moving values
near the top and bottom of the scale towards the center (for both the examinee’s ratings and the
effectiveness key). For example, rating values below 2.5 could be changed to 2.5.
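
A sketch (in Python) of this compression fix; the 2.5 lower bound follows the example above,
while the symmetric 5.5 upper bound on the 7-point scale is our assumption:

    # Pull ratings (and keyed values) near the scale extremes toward the
    # center before computing option scores.
    def compress(value, low=2.5, high=5.5):
        return min(max(value, low), high)

    print(compress(1.0))   # -> 2.5
    print(compress(6.8))   # -> 5.5
    print(compress(4.0))   # -> 4.0 (mid-scale values are unchanged)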

PILOT TEST RESULTS: TRAIT SCORES

The score on a trait for a specific option was computed as shown in Equation 2 below. As
shown, the keyed effectiveness value was subtracted from the Soldier's rating to set the scale's
metric so that a Soldier would receive a trait score of zero on an option if his/her rating equaled
the keyed effectiveness value. Trait scores can be positive or negative. The trait score for an
entire form is the mean of the trait scores among the options linked to the trait. Because Trait 8,
intellectance, was linked to very few options, it was dropped from the analyses.

optionTraitScore = (SoldiersRating – keyedEffectiveness) * traitedness (2)
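
A minimal sketch (in Python, with hypothetical values) of the Equation 2 scoring for one trait:

    # The signed deviation of the rating from the keyed effectiveness is
    # weighted by the option's traitedness (Equation 2); a form's trait
    # score averages over the options linked to the trait.
    ratings = [6, 2, 5]             # Soldier ratings for three linked options
    keyed = [4.5, 3.0, 4.0]         # keyed effectiveness for the same options
    traitedness = [0.6, -0.4, 0.3]  # mean psychologist ratings for one trait

    option_trait_scores = [(r - k) * t
                           for r, k, t in zip(ratings, keyed, traitedness)]
    trait_score = sum(option_trait_scores) / len(option_trait_scores)
    print(f"Trait score: {trait_score:.2f}")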


Table 3 shows the reliability estimates by form and trait. Part of the variability in the
reliability estimates is due to the wide range in the number of options per trait, which runs
from 3 to 35. Table 4 provides an easier way to compare traits by basing each reliability
estimate on a 20-option trait scale. There is considerable variability in the reliability
estimates across forms: the values range from .27 to .88. Thus, it appears possible to construct
an SJT that measures some traits reliably. For example, in Form D, the reliability estimate is
above .80 for five of the seven traits.
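
The 20-option projection appears consistent with the Spearman-Brown prophecy formula,
although the paper does not name the formula explicitly; a sketch in Python:

    # Project a scale's alpha to a common 20-option length (an assumed
    # reading of how Table 4 was computed).
    def alpha_at_length(alpha, n_options, target=20):
        m = target / n_options
        return (m * alpha) / (1 + (m - 1) * alpha)

    # Dependability, Form 1: 35 options, alpha = .77 in Table 3
    print(round(alpha_at_length(0.77, 35), 2))   # -> 0.66, matching Table 4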

Table 3. Reliability Estimates of Trait Scales

                                 Military Forms           Civilian Forms
  Trait Name                   1     2     3     4      A     B     C     D
  1 Achievement Orientation   .60   .63   .65   .78    .77   .60   .84   .88
  2 Self-Reliance             .42   .55   .77   .67    .63   .60   .74   .85
  3 Dependability             .77   .76   .75   .72    .74   .76   .87   .88
  4 Sociability               .74   .35   .74   .48    .81   .28   .75   .84
  5 Agreeableness             .47   .53   .24   .46    .73   .27   .64   .61
  6 Social Perceptiveness     .67   .30   .44   .45    .52   .34   .66   .35
  7 Team Orientation          .82   .38   .52   .74    .79   .54   .81   .81

Note. N = 79. Each Soldier completed only one form-pair. The number of options per scale ranges
from 3 (Agreeableness, Form 3) to 35 (Dependability, Form 1).

It appears that Agreeableness and Social Perceptiveness are not measured as reliably as
the other traits. This is due partly to a lack of options linked to these traits, but Table 4 shows
that these two traits are not measured quite as reliably even when their scales have 20 options.

The SJT was not administered with other instruments. Thus, its construct validity could
not be assessed by looking at its relationships with performance or personality measures. Its
construct validity was examined by looking at the relationships among the trait scales.

Table 4. Reliability Estimates for Hypothetical 20-Option Trait Scales

                                 Military Forms           Civilian Forms
  Trait Name                   1     2     3     4      A     B     C     D
  1 Achievement Orientation   .54   .60   .77   .75    .73   .60   .76   .82
  2 Self-Reliance             .41   .53   .83   .86    .59   .65   .65   .78
  3 Dependability             .66   .68   .71   .70    .75   .68   .80   .81
  4 Sociability               .83   .64   .85   .65    .83   .50   .73   .81
  5 Agreeableness             .64   .82   .68   .66    .67   .28   .54   .60
  6 Social Perceptiveness     .79   .68   .66   .70    .63   .60   .65   .36
  7 Team Orientation          .73   .48   .63   .76    .74   .59   .77   .71

Note. N = 79. Each Soldier completed only one form-pair.


The latent structure underlying the seven trait scales was examined using exploratory
factor analysis. Several solutions were examined using both oblique and orthogonal rotations.
Parallel analysis clearly indicated a three-factor solution. In the parallel analysis, the scree plot
of the real data (the correlation matrix of the seven traits with squared multiple correlations in the
diagonal) was compared with the scree plots of 100 random datasets. In the vast majority of
cases, the plots crossed between the third and fourth factors. The orthogonal model was more
interpretable than the oblique model. Table 5 shows the factor loadings for the orthogonal model.
The traits loading highly on the first factor are related to accomplishing tasks independently.
Factors 2 and 3 are related to interacting with people. Factor 2 appears to involve working with
other people to accomplish tasks. Factor 3 appears to be almost equivalent to the Agreeableness
trait. The major elements of the Agreeableness definition are likeability and pleasantness.
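
A rough sketch (in Python) of the parallel analysis logic described above. The data here are
simulated stand-ins, and the sketch compares principal component eigenvalues, whereas the study
factored the correlation matrix with squared multiple correlations in the diagonal:

    import numpy as np

    rng = np.random.default_rng(0)

    def eigenvalues(data):
        # Descending eigenvalues of the correlation matrix of the columns
        return np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    observed = rng.normal(size=(319, 7))      # stand-in for the trait scores
    obs_eigs = eigenvalues(observed)
    random_eigs = np.mean(
        [eigenvalues(rng.normal(size=(319, 7))) for _ in range(100)], axis=0)

    # Retain factors as long as the observed eigenvalue exceeds the random mean
    n_factors = 0
    for obs, rand in zip(obs_eigs, random_eigs):
        if obs <= rand:
            break
        n_factors += 1
    print(f"Retain {n_factors} factor(s)")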

Two traits have large loadings on two factors. Team Orientation loads almost equally on
Factors 2 and 3. The definition of Team Orientation has two facets: team member cohesion/bonding
and working together as a team. This definition is consistent with the factor loadings. Dependability
is a bit more difficult to explain. Its loading on Factor 2 was somewhat surprising. One could
argue, however, that a team can perform well only when its members can depend upon each other.

Finally, the relationship between judgment scores and trait scores was examined. Table 6
shows that the judgment scale correlates moderately with some trait scales in some forms. The
vast majority of correlations are negative. An examinee whose ratings stay within the middle of
the rating scale cannot achieve high trait scores. This is the reverse of what we saw with the
judgment scores. Thus, these negative correlations are likely related to the same phenomenon.
They would likely be eliminated or reduced dramatically if the examinees were to rank (or pick)
rather than rate the options. Even using the present scoring method, Table 6 shows that the trait
scores are measuring something different from the judgment score.


Table 5. Trait Score Factor Loadings for Three-Factor Model

  Trait Name                  Factor 1   Factor 2   Factor 3
  1 Achievement Orientation     0.93       0.15       0.26
  2 Self-Reliance               0.92      -0.16       0.08
  3 Dependability               0.53       0.67       0.35
  6 Social Perceptiveness      -0.19       0.70      -0.07
  4 Sociability                 0.06       0.86       0.39
  7 Team Orientation            0.33       0.66       0.63
  5 Agreeableness               0.18       0.12       0.97

Note. N = 319. To obtain an acceptable sample size, test form was ignored. Loadings above .49 are boldfaced.
Loadings between .30 and .49 are italicized. Loadings above .29 that are not consistent with the interpretation of
the factors are in red. The three factors were interpreted as follows:
Factor 1: Motivation/Skill to accomplish tasks while working independently.
Factor 2: Motivation/Skill to accomplish tasks while working with other people.
Factor 3: Agreeableness - pleasantness and likeability.

Table 6. Correlations between Judgment and Trait Scores

                                 Military Forms                Civilian Forms
  Trait Name                   1      2      3      4       A      B      C      D
  1 Achievement Orientation  -.29   -.14   -.07   -.27    -.21   -.29   -.14   -.24
  2 Self-Reliance            -.30   -.33   -.31   -.35    -.05   -.31   -.14   -.31
  3 Dependability            -.29   -.24   -.16   -.21     .03   -.25   -.05   -.14
  4 Sociability              -.41   -.20   -.25    .11    -.13   -.27   -.10   -.16
  5 Agreeableness            -.37    .16   -.21    .13     .01   -.03    .09   -.03
  6 Social Perceptiveness    -.36   -.30    .25    .07    -.09    .02    .14    .03
  7 Team Orientation         -.28   -.21   -.20   -.02    -.04   -.09    .03   -.09

Note. N = 79. Each Soldier completed only one form-pair. Boldfaced correlations (i.e., |r| > .22) are significant at
p < .05.

CONCLUSIONS

The results of this research show that a situational judgment test can be designed to
reliably measure personality traits. Although the factor analysis demonstrated some evidence of
construct validity, additional research has been planned to obtain stronger evidence. The SJT will
be administered with personality measures in the near future. Later it will be administered with
other personality measures as well as performance measures. The strength of using an SJT is
that, in theory, it is resistant to faking. Further research is needed to confirm this.


There are many ways to score a situational judgment test and a few ways for examinees
to respond to items. Many of these ways have been compared with respect to judgment scores
but not with respect to the types of trait scores developed during this research effort. In
particular, ranking and rating should be compared.

The high correlations between the civilian and military test forms are reassuring. On the
one hand, one could argue that civilian forms do not have to be developed because the military
forms measure essentially the same thing. On the other hand, a few potentially good Soldiers
might be screened out because they knew little about the military—things they would learn soon
after joining the military. In that case, one could argue that civilian forms should be used.

REFERENCES

Hough, L. M. (1996). Personality measurement and personnel selection: Implementation issues.
Paper presented at the 11th annual meeting of the Society of Industrial and Organizational
Psychology, San Diego, CA.

Hough, L. M. (1997). Issues and evidence: Use of personality variables for predicting job
performance. Paper presented at the 12th annual meeting of the Society of Industrial and
Organizational Psychology, St. Louis, MO.

Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation
of suggested palliatives. Human Performance, 11, 209-244.

Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-
related validities of personality constructs and the effect of response distortion on those
validities. Journal of Applied Psychology, 75, 581-595.

Knapp, D. J., Burnfield, J. L., Sager, C. E., Waugh, G. W., Campbell, J. P., Reeve, C. L.,
Campbell, R. C., White, L. A., & Heffner, T. S. (2002). Development of predictor and
criterion measures for the NCO21 research program (Technical Report 1128).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Knapp, D. J., Waters, B. K., & Heggestad, E. D. (Eds.) (2002). Investigations related to the
implementation of the Assessment of Individual Motivation (AIM) (Study Note 2002-02).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Mueller-Hanson, R., Heggestad, E. D., & Thornton, G. C., III (2003). Faking and selection:
Considering the use of personality from a select-in and a select-out perspective. Journal
of Applied Psychology, 88, 348-355.

Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality
testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660-
679.

Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. (1998). The impact of response distortion
on pre-employment personality testing and hiring decisions. Journal of Applied
Psychology, 83, 634-644.


Sager, C. E., & Russell, T. L. (2003, November). Future-oriented job analysis for first-tour
Soldiers. In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s
Select21 Project. Symposium conducted at the 45th Annual Conference of the International
Military Testing Association, Pensacola, FL.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job
performance: A meta-analytic review. Personnel Psychology, 44, 703–742.

Zickar, M. J. (2000). Modeling faking on personality tests. In D. Ilgen & C. L. Hulin (Eds.),
Computational modeling of behavioral processes in organizations (pp. 95-108).
Washington, DC: American Psychological Association.


ASSESSING PERSON-ENVIRONMENT (P-E) FIT WITH THE FUTURE ARMY46

Chad H. Van Iddekinge, Ph.D., Dan J. Putka, Ph.D., and Christopher E. Sager, Ph.D.
Human Resources Research Organization
Alexandria, VA, USA
cvaniddekinge@humrro.org

INTRODUCTION

Personnel selection measures are typically designed to assess the knowledges, skills, and
attributes (KSAs) critical to performance in the job of interest. Although important, job
performance is not the only criterion of concern to most organizations. For example,
organizations like the U.S. Army are interested in reducing personnel attrition through their
selection and classification systems. Traditional KSA-based measures, however, seldom predict
both performance and alternative criteria such as attrition.

In recent years, personnel researchers have turned to measures of person-environment
(P-E) fit to predict outcomes other than job performance. Recent studies indicate that scores on such
measures are related to various work-related intentions and attitudes (e.g., job satisfaction,
organizational commitment, turnover intentions), as well as to actual behaviors such as
absenteeism and turnover (e.g., Cable & DeRue, 2002; Saks & Ashforth, 1997; Verquer, Beehr,
& Wagner, 2001). Although there is widespread interest in P-E fit within the civilian selection
literature, the use of fit measures has yet to be extensively reported in the military literature.

This paper attempts to address this gap by describing the development of P-E fit measures
for Select21, a project sponsored by the U.S. Army Research Institute for the Behavioral and
Social Sciences. The objective of Select21 is to develop and validate selection measures that will
help the Army select, classify, and retain enlisted Soldiers with the characteristics needed to
succeed in the future Army. The Select21 P-E fit measures we developed are intended to assess the
match between the work-related values and interests of prospective Soldiers and the
values/interests the Army provides first-tour Soldiers now and in the future. In this paper, we
describe a novel approach to developing P-E fit measures to predict Soldiers’ attitudes and career
decisions. We begin by describing the constructs measured in these instruments.

46
In D. J. Knapp (Chair), Selecting Soldiers for the Future Force: The Army’s Select21 Project. Symposium
conducted at the 2003 International Military Testing Association (IMTA) Conference, Pensacola, FL. The views,
opinions, and/or findings contained in this paper are those of the authors and should not be construed as an official
U.S. Department of the Army position, policy, or decision.


CONSTRUCTS ASSESSED

As mentioned, selection measures are typically designed to assess KSAs critical to job
performance. When developing measures of P-E fit, however, the focus is on the needs, interests,
and/or expectations of job applicants. Given this, it is important to use a comprehensive
taxonomy to ensure these instruments measure the range of values and interests that the applicant
population might possess. The fit measures we are developing are designed to assess two sets of
constructs: work values and occupational interests. Work values are the most common constructs
assessed in P-E fit measures (Verquer et al., 2001). The values we assessed were derived from
the Theory of Work Adjustment (TWA; Dawis, England, & Lofquist, 1964). According to the
TWA, job satisfaction is a function of the correspondence between workers’ preferences for
certain work-related values (e.g., having a chance to work independently, being paid well, having
good relations with co-workers) and the degree to which the job or organization supports those
values.

The work interests we focused on came from Holland’s (1978, 1996) congruence theory.
As with the TWA, this theory suggests that job satisfaction is a function of the congruence
between an individual’s work interests and the interests supported by his or her job (or
organization). According to Holland, vocational interests are expressions of personality that can
be used to categorize individuals and work environments into six types: realistic, investigative,
artistic, social, enterprising, and conventional (RIASEC). Holland’s model has been widely
validated and is the prevailing taxonomy in vocational psychology (Barrick, Mount, & Gupta,
2003).

We developed two sets of P-E fit measures. The first set of measures assessed the
congruence between Soldiers’ needs for certain values/interests and the values/interests the
Army supplies first-tour Soldiers. In the P-E fit literature, this is referred to as needs-supplies fit
(Edwards, 1991; Kristof, 1996). The second set of measures assessed the congruence between
the values/interests Soldiers expect the Army to provide and the values/interests the Army
actually provides. We refer to this as expectations-reality fit. As discussed later, we believe there
is a subtle yet important difference between the values/interests Soldiers prefer and the
values/interests they expect the Army to support. Because we used a similar process to develop
the values and interests measures, we limit our discussion to the work values instruments.

ASSESSING NEEDS-SUPPLIES FIT

Two measures of work values were developed to assess needs-supplies fit. The Army
Description Inventory (ADI) was designed to determine the extent to which the Army
environment supports several work-related values. Thus, we refer to this as a “supplies-side”
measure. The Work Values Inventory (WVI), in contrast, assesses the extent to which Soldiers
(and eventually prospective recruits) desire each of these values. We refer to this as a “needs-
side” measure. The development of these measures is described in turn.


To develop the ADI, we first identified a set of values to assess. The initial measure
included 42 values. Of these, 21 values were from the Dawis et al. (1964) taxonomy. The
remaining 21 values were developed based on a review of several source materials, including
previous studies of the values of Army recruits (e.g., Ramsberger, Wetzel, Sipes, & Tiggle,
1999), research on the values of American youth (Sackett & Mavor, 2002), and results of the
Select21 job analysis. We then asked 70 Army non-commissioned officers (NCOs) to indicate
the degree to which the Army provides opportunities for first-tour Soldiers who possess each of
the 42 values. Based on these data, we identified 27 values to assess in the WVI. The decision to
use only 27 of the original 42 values was due to concerns about testing time and redundancy
among the initial set of values. The final 27 values were classified into three categories. The
“high” category included nine values that NCOs indicated the Army offers first-tour
Soldiers. The “low” category consisted of nine values that NCOs believed the Army does not
offer first-tour Soldiers. The “middle” category included the nine values that fell between the
high and low categories.

Given that most of the 27 work values are socially desirable, having applicants rate them
with a Likert-type scale would probably not produce enough variability in responses (i.e.,
applicants would indicate that all values are important to them). Indeed, applicant response
distortion is a concern whenever non-cognitive measures such as this are used in an operational
selection setting (Rosse, Stecher, Miller, & Levin, 1998). One way to help minimize the effects
of response distortion is to present test items in a forced-choice format (Jackson, Wrobleski, &
Ashton, 2000). We adopted this approach with the WVI. The instrument consists of 81 triads,
each of which includes one value from the three categories described above (i.e., high, low,
middle). Respondents are asked to identify the value that would be most and least important to
them in their ideal job. The WVI is constructed so that no two values are paired together more
than once, and values from the same category are never paired together (e.g., values within the
low category are paired only with high and middle category values). An example item from the
WVI is provided in the Appendix.

Using a forced-choice measure will not decrease response distortion unless items within
each triad are similarly attractive. For example, if one value in a triad sounds more like the Army
than the other two values, applicants may indicate that this value is most like them regardless of
whether they truly value it. We attempted to address this issue in the WVI by comparing values
that the Army provides with values that may appear to be something the Army could satisfy but,
in fact, are not characteristic of the Army environment. Nevertheless, even if prospective recruits
are able to correctly identify the values the Army provides (and distort their responses in a way
that is consistent with these values), this type of response pattern might not decrease criterion-
related validity. That is, this form of distortion would indicate that the respondent has realistic
expectations about what the Army is like. Met expectations can, in turn, lead to higher
satisfaction and performance once in the Army (Wanous, 1992).

The forced-choice format of the WVI also gives us several options for scoring the
measure. For example, we could create composite scores for the high and low categories by
summing the number of times respondents choose values in the high category over values in the
low and middle categories to form a high category composite. Likewise, we could sum the
number of times respondents chose values in the low category over values in the high and middle
categories to form a low category composite. Such composites would indicate the degree to
which applicants prefer values that the Army does and does not supply, respectively. A high
score on the high category composite would indicate high needs-supplies fit, whereas a high
score on the low category composite would indicate poor needs-supplies fit. That is, applicants
with high scores on the low category composite would tend to value things the Army does not
generally offer first-tour Soldiers.
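
One plausible reading of this composite scoring, sketched in Python with hypothetical triad
responses (each triad pairs one high-, one middle-, and one low-category value):

    # Count pairwise "wins" for a category across triads: the value marked
    # "most" beats both others, the value marked "least" beats neither, and
    # the remaining value beats only the "least" value.
    triads = [
        {"most": "high", "least": "low"},
        {"most": "middle", "least": "high"},
        {"most": "high", "least": "middle"},
    ]

    def wins(triad, category):
        if triad["most"] == category:
            return 2
        if triad["least"] == category:
            return 0
        return 1

    high_composite = sum(wins(t, "high") for t in triads)
    low_composite = sum(wins(t, "low") for t in triads)
    print(f"high: {high_composite}, low: {low_composite}")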

Despite the potential advantages of a forced-choice instrument, such measures can also
present challenges when used for selection. For example, forced-choice measures result in ipsative
or partially ipsative data, which can make it difficult to obtain the normative information that is
critical for making between person comparisons (Hicks, 1970). However, the ipsativity of a forced-
choice measure can be reduced by the way it is constructed and scored. For example, assessing
most or all of the constructs within the domain of interest (e.g., all work-related values of
American youth) can increase the degree to which the measure provides normative information.
This is because in a forced-choice instrument, an applicant’s score on a given construct (e.g., the
value of autonomy) depends on the constructs with which it is compared. For instance, autonomy
at work could be most important to an individual when compared to values A and B, but not when
compared to values C and D. Thus, comparing a construct to every other construct within the
domain of interest (rather than comparing it to limited number of constructs) can result in more
accurate approximations of normative trait standings. We attempt to do this in the WVI by
assessing a large number of work values that we think prospective Army recruits could possess.

Another way to reduce the ipsativity of a forced-choice measure is to not score all of the
constructs assessed in the instrument. This approach can minimize ipsativity because it allows
applicants to score high (or low) on all constructs of interest when they are paired only with non-
relevant constructs. In the WVI, values that the Army supports (high category values) are
compared only to values it does not support (low and middle category values), and not to other
values supported by the Army. Although this will reduce ipsativity, some will remain because
scores on the supported values depend, in part, on the restricted set of (unsupported) values with
which they are compared.

ASSESSING EXPECTATIONS-REALITY FIT

We also developed measures of expectations-reality fit. These measures are designed to


assess individuals’ knowledge about the work values and interests that the Army actually
supports. We developed these instruments because we believe that needs-supplies fit and
expectations-reality fit may interact to predict attrition and its attitudinal precursors (e.g., job
satisfaction). Based on expectancy theory (Vroom, 1964), we believe that misfit between the
applicant and Army for a given work value or interest depends on (a) how important the
value/interest is to the applicant, (b) how much the applicant expects the Army to provide
opportunities to satisfy the value/interest, and (c) the extent to which the Army actually offers the
value/interest. For example, consider two applicants – one who values autonomy and expects the
Army to provide it, and a second who values autonomy but does not expect the Army to provide
it. If the Army does not provide autonomy, it is likely that the second applicant will be more
satisfied in the Army than the first. That is, although both applicants value autonomy (which in
this case indicates a lack of needs-supplies fit), the fact that the first applicant expects autonomy
and does not receive it is likely to result in greater dissatisfaction.

The Army Beliefs Survey (ABS) was designed to assess expectations-reality fit with
regard to work values. The ABS includes the 27 values measured in the WVI, but uses a Likert-
type rating scale and a different set of instructions (see Appendix). Specifically, rather than
asking prospective recruits to indicate their preference for the values, the ABS assesses their
knowledge about what values the Army does (not) support. Thus, it is not in the best interest of
respondents to indicate that the Army offers all of the values. Because the ABS is essentially a
knowledge test, response distortion is less likely to be an issue and thus we did not develop a
forced-choice version of the measure. As with the WVI, the data we collected from NCOs (via
the ADI) will provide the supplies-side (i.e., “reality”) information against which to compare the
expectations-side data from the ABS. The greater the correspondence between the values applicants
think the Army does (not) provide first-tour Soldiers and the values the Army actually provides
(based on NCO ratings), the better the P-E fit should be.
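
A sketch (in Python) of one simple way to index this correspondence; the data and the 0-2
coding of the response categories are hypothetical assumptions:

    # Compare an applicant's ABS categories with the NCO-based "reality"
    # categories from the ADI, scaling total disagreement to a 0-1 fit index.
    coding = {"few": 0, "some": 1, "most": 2}

    applicant = ["most", "few", "some", "most", "few"]   # ABS responses
    reality = ["most", "some", "some", "few", "few"]     # ADI-derived key

    distance = sum(abs(coding[a] - coding[r])
                   for a, r in zip(applicant, reality))
    fit = 1 - distance / (2 * len(reality))   # 1 = perfect correspondence
    print(f"Expectations-reality fit: {fit:.2f}")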

SUMMARY

Recent research suggests that measures of P-E fit can predict valued criteria such as job
satisfaction, organizational commitment, and attrition. However, relatively few P-E fit studies
have been published in the military selection research literature. In this paper, we described a
unique approach to developing fit measures to help select, classify, and retain enlisted Soldiers
for the future Army. Although there are challenges with assessing P-E fit in an operational
context, we believe such measures have the potential to provide substantial utility to the U.S.
Army and to other military settings.

REFERENCES

Barrick, M. R., Mount, M. K., & Gupta, R. (2003). Meta-analysis of the relationship between the
five-factor model of personality and Holland’s occupational types. Personnel Psychology,
56, 45-74.

Cable, D. M., & DeRue, S. D. (2002). The convergent and discriminant validity of subjective fit
perceptions. Journal of Applied Psychology, 87, 875-884.

Dawis, R. V., England, G. W., & Lofquist, L. H. (1964). A theory of work adjustment. Minnesota
Studies in Vocational Rehabilitation, XV. Minneapolis: University of Minnesota.

Edwards, J. R. (1991). Person-job fit: A conceptual integration, literature review and
methodological critique. International review of industrial/organizational psychology
(pp. 283-357). London: Wiley.


Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative
measures. Psychological Bulletin, 74, 167-184.

Holland, J. L. (1978). Manual for the Vocational Preference Inventory. Palo Alto, CA:
Consulting Psychologists Press.

Holland, J. L. (1996). Exploring careers with a typology: What we have learned and some new
directions. American Psychologist, 51, 397-406.

Jackson, D. N., Wrobleski, V. R., & Ashton, M. C. (2000). The impact of faking on employment
tests: Does forced-choice offer a solution? Human Performance, 13, 371-388.

Kristof, A. L. (1996). Person-organization fit: An integrative review of its conceptualizations,
measurements, and implications. Personnel Psychology, 49, 1-49.

Ramsberger, P. F., Wetzel, E. S., Sipes, D. E., & Tiggle, R. B. (1999). An assessment of the
values of new recruits (FR-WATSD-99-16). Alexandria, VA: Human Resources
Research Organization.

Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. (1998). The impact of response distortion
on pre-employment personality testing and hiring decisions. Journal of Applied
Psychology, 83, 634-644.

Sackett, P. R., & Mavor, A. (Eds.) (2002). Attitudes, aptitudes, and aspirations of American youth:
Implications for military recruitment. Washington, DC: National Academies Press.

Saks, A. M., & Ashforth, B. E. (1997). A longitudinal investigation of the relationships between
job information sources, applicant perceptions of fit, and work outcomes. Personnel
Psychology, 50, 395-426.

Verquer, M. L., Beehr, T. A., & Wagner, S. H. (2001, April). A meta-analytic review of relations
between person-organization fit and work attitudes. Paper presented at the 16th Annual
Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.

Vroom, V. (1964). Work and motivation. New York: John Wiley.

Wanous, J. P. (1992). Organizational entry (2nd Ed.). Reading, MA: Addison-Wesley.


APPENDIX

Example Item from the Work Values Inventory (WVI)

Indicate which statement is most important to you in your ideal job, and which statement is least
important to you in your ideal job.

On my ideal job, I would...

A. have opportunities to lead others.


B. try out my own ideas.
C. have a flexible work schedule.

Example Items from the Army Beliefs Survey (ABS)

Few Will Experience: This statement describes an experience few Soldiers will have during
their first enlistment.
Some Will Experience: This statement describes an experience some, but not most, Soldiers
will have during their first enlistment.
Most Will Experience: This statement describes an experience most Soldiers will have during
their first enlistment.

Using the rating scale above, indicate which category you believe best describes each statement.

     Few    Some   Most   Soldiers will…
1.   ____   ____   ____   have opportunities to lead others.
2.   ____   ____   ____   try out their own ideas.
3.   ____   ____   ____   have a flexible work schedule.


Competency Testing for the U.S. Army Noncommissioned Officer (NCO) Corps

Tonia S. Heffner
U.S. Army Research Institute for the Behavioral and Social Sciences
Alexandria, VA USA
Roy Campbell
Human Resources Research Organization
Radcliff, KY USA
Deirdre J. Knapp
Human Resources Research Organization
Alexandria, VA USA
Peter Greenston
U.S. Army Research Institute for the Behavioral and Social Sciences
Alexandria, VA USA

In 1991, the U.S. Army discontinued its Soldier proficiency-testing program, the Skill
Qualification Test (SQT). The Army’s previous experience with job performance testing spanned
a period of 40 years but was a mixture of successes, frustrations, and adjustments. Although the
SQT program played an important role in the determination of Noncommissioned Officer (NCO)
promotions, the associated costs of preparing and administering over 200,000 tests, in Soldiers’
and administrators’ time as well as financial resources, made the program no longer viable
(Campbell, 1994). However, recent events have brought about an examination of the feasibility
of reinstituting competency testing for the NCO corps. First, in a survey of one-third of the NCO
corps, the Army Training and Learning Development Panel (ATLDP) found that NCOs
overwhelmingly want objective testing to demonstrate their accomplishments and provide
feedback on technical, tactical, and leadership skills. Second, the Sergeant Major of the Army
(SMA) has made reinstituting competency testing a priority. Finally, competency testing
reinforces self-development, one of the three pillars of the Army’s NCO educational philosophy.

The U.S. Army Research Institute for Behavioral and Social Sciences (ARI) has
embarked on a three-phase research effort to examine the feasibility of reinstituting competency
testing. This project began and continues as a research effort to develop criterion measures to
validate our selection and classification tests, but the nature of the project has expanded to reflect
the needs identified by the ATLDP. The first phase is a reinforcing dual-track approach. Track
1 is a detailed examination of the issues that influence testing feasibility. Track 2 is the initiation
of a demonstration competency assessment program (DCAP) that mimics the critical aspects of
the development, implementation, and administration processes. The DCAP is designed to
provide an experience-based sense of current issues that will impact feasibility. The second
phase of the research is an evaluation of the DCAP and the initiation of five additional prototype
competency assessments targeted towards specific Army jobs (military occupational specialties
[MOS]). The third phase is an evaluation of the five competency assessments and overall
recommendations. We are currently completing the first phase of this research effort.

FEASIBILITY ISSUES

To achieve wide-scale implementation, Phase I addresses four general categories of
questions to determine the feasibility of competency testing. First are utilization strategies – how
will the competency assessments be used for personnel management, career development,
training readiness, retention, and recruitment. Second are operational strategies – what are the
boundaries and requirements for an assessment program. Third is implementation – what are the
phases to implementing, maintaining, and growing the assessment program. Finally, are
external considerations – what other programs or initiatives affect or are affected by a
competency program. Although ARI can inform policymakers based on our research, these
questions must ultimately be answered by senior Army leadership.

Many factors contribute to the feasibility of competency assessment for the Army. Our
approach has been to identify these factors through a variety of perspectives. The issues arising
from the historical perspective include equity for promotion across the Army, the threat of test
compromise, equity of test content within each job type (i.e., MOS), intensive resource demands,
and multiple test uses. Reviews of other military and civilian assessment and certification
programs revealed a variety of different testing approaches including one-time certification (e.g.,
nursing boards, computer certification) as well as multi-level certifications (e.g., National
Institute for Automotive Service Excellence [ASE]). Testing approaches used by sister services
also offer variety in terms of test preparation approaches and resources, test administration
windows, test delivery options, and test development structure and support. Automation
technology (i.e., the internet and computerized test development software) offers new ways for
developing and delivering assessments, but also poses new challenges such as computer availability
and increased test security concerns. We are also examining alternative performance assessment
systems such as instituting testing at the completion of required NCO development courses.

In addition to examining the feasibility factors, two other activities were planned as part of Phase I. These activities - an intensive needs analysis and the DCAP - were clarified by the SMA, who recommended that the competency assessment:
• be used for promotion,
• be administered via the internet,
• include the active and reserve components of the Army,
• assess Specialists/Corporals (E4) through Sergeants First Class (E7),
• use multiple-choice items, including situation-based items, and
• assess the content areas of basic Soldier skills, leadership, conducting training, Army history, and Army values.

ARMY TESTING PROGRAM ADVISORY TEAM

The Army’s interest in a competency assessment program, specifically our research effort, and our need for direct experience and guidance prompted the researchers to form an
Army Testing Program Advisory Team (ATPAT). The team consists of 24 senior
Noncommissioned Officers (NCOs) representing 11 commands from the Active Army as well as
commands from the Army Reserves and Army National Guard. This group also acted as our
primary source for needs analysis information and as content subject matter experts. Originally
conceived as a one-time council, the ATPAT, largely through its own initiative, has developed
into a crucial feature and resource for the research project. It has provided guidance, assistance, review, and an up-to-date reflection of the operational implications of the current and future Army posture. The ATPAT now meets quarterly, and we anticipate that its role and importance will continue into the follow-on phases.

DCAP DEVELOPMENT

The DCAP was originally designed to be a small-scale trial of the assessment development and delivery systems for one MOS. The purpose was to identify currently available opportunities and potential current and future obstacles. Based on guidance from the SMA and the ATPAT, the DCAP has been reconceptualized as a prototype of an Army-wide test geared to Soldiers eligible for promotion into the NCO corps. The DCAP development is a multi-step process.

First, we identified the test topics based on job analysis and objectives identified by the
SMA and the ATPAT. Although the SMA identified general test topics, this stage presented
some challenges. The first question we had to address was whether the assessment should solely
address knowledge that Soldiers are expected to have acquired based on training and doctrine or
if it should also include potential to perform at the next higher grade. After extensive discussion
by the ATPAT members, it was decided that the majority (51%) of the prototype assessment
would cover basic Soldiering skills, Army history, and Army values that E4 Specialists and
Corporals are required by training and doctrine to know. The remainder of the prototype
assessment would cover Soldiering skills, leadership, and training required at the next grade
level, but be limited to the most essential tasks. This was a particularly important decision for the prototype assessment, because doctrinal training for leadership and training skills does not begin until the Soldier is promoted to Sergeant. We also had to decide the breadth of materials to be covered in the assessment. The ATPAT members decided to limit the resources to the key publications in each topic area, including the Soldier’s Manual of Common Tasks (STP 21-1 and
STP 21-24 SMCT), the Soldier’s Guide (FM 7-21.13), the Noncommissioned Officer’s Guide
(FM 7-22.7), Army Leadership Be, Know, Do (FM 22-100), Training the Force (FM 7.0), and
Battle Focused Training (FM 7-1).

In the second step, we developed the test blueprint. We began by generating a list of the
possible topic tasks and skills to be included based on the field manuals. We presented this list
to the ATPAT members who, through multiple iterations, determined the relative importance of
the topic tasks, skills, and knowledge areas. They also assigned percentages to each area; these
percentages reflect the number of items to be written for each topic.
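
To make this blueprint step concrete, the following is a minimal sketch (in Python) of one way percentage weights can be turned into per-topic item counts for a fixed-length test form; the topic names, weights, and test length are illustrative placeholders, not the actual ATPAT blueprint.

    # Minimal sketch: convert blueprint percentage weights into item counts
    # for a fixed-length test form. Topics, weights, and test length are
    # hypothetical, not the actual ATPAT blueprint.
    def blueprint_item_counts(weights, total_items):
        """Allocate total_items across topics in proportion to blueprint weights."""
        raw = {topic: total_items * w / 100.0 for topic, w in weights.items()}
        counts = {topic: int(r) for topic, r in raw.items()}
        leftover = total_items - sum(counts.values())
        # Largest-remainder rule: counts always sum exactly to total_items.
        for topic in sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)[:leftover]:
            counts[topic] += 1
        return counts

    weights = {"Basic Soldier skills": 30, "Army history": 11, "Army values": 10,
               "Leadership": 25, "Conducting training": 24}
    print(blueprint_item_counts(weights, total_items=150))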

Next, we identified the format and test content specific to the selected prototype. The
format was provided by the SMA’s guidance for a multiple-choice test. Although there was
discussion of using hands-on assessment, either through specifically developed tests or by capitalizing on the currently used Common Task Test, these approaches were deemed impractical because of financial or system constraints at this time.

To address the SMA’s requirement for situation-based items, we are including a situational judgment test (SJT). The 24-item SJT was developed to assess the knowledge, skills, and attributes of directing, monitoring, and supervising individual subordinates; training others;
team leadership; concern for Soldiers’ quality of life; cultural tolerance; motivating, leading, and
supporting individual subordinates; relating to and supporting peers; and problem
solving/decision making. The items consist of a brief problem scenario and four possible
responses. Soldiers are asked to identify the most and least effective responses to the scenario.
The SJT was validated against job performance for junior NCOs (Knapp et al., 2002; Knapp,
McCloy, & Heffner, 2003).
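
A common scoring rule for most/least-effective SJT items keys each choice against subject matter expert (SME) effectiveness ratings; the brief Python sketch below illustrates that rule with invented ratings and is not the scoring key used in the NCO21 research.

    # Minimal sketch of one common SJT scoring rule: the item score is the
    # difference between the SME effectiveness ratings of the options picked
    # as "most" and "least" effective. The ratings below are invented.
    def score_sjt_item(effectiveness, most, least):
        """Higher scores: a strong option endorsed as most, a weak one as least."""
        return effectiveness[most] - effectiveness[least]

    # Hypothetical SME mean effectiveness ratings for the four responses
    # to one scenario (1 = very ineffective, 5 = very effective).
    ratings = [4.6, 2.1, 3.3, 1.5]
    print(score_sjt_item(ratings, most=0, least=3))  # 3.1, the item maximum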

Fourth, we are developing the test prototype. The items are being prepared using a software program compatible with internet distribution, which allows graphics to be used in a multiple-choice format. The prototype will be reviewed by groups of NCOs who develop or teach primary leader development courses and basic Soldiering skills for NCOs. It will also be reviewed by the ATPAT members before completion at the end of the calendar year.

Finally, we will prepare for self-assessment. The ATPAT recommended that the
competency assessment not stand alone, but be supported by a self-assessment as well. This
self-assessment is not intended to be a pre-test, but is designed to provide an opportunity for learning and development. Although this portion of the project is still in the planning stages, we
expect to have assessment items similar to the DCAP, but each item will provide feedback on the
correct response, the logic behind that response, and resources for learning more about the topic
addressed by the item.
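
One lightweight way to represent such feedback-rich items is sketched below in Python; the field names and the sample item are hypothetical, intended only to show how each item can carry its keyed response, rationale, and study resources.

    # Minimal sketch of a self-assessment item that carries its own feedback.
    # Field names and the sample item are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class SelfAssessmentItem:
        stem: str
        options: list       # answer choices
        correct: int        # index of the keyed response
        rationale: str      # the logic behind the keyed response
        resources: list     # where to learn more about the topic

    item = SelfAssessmentItem(
        stem="Which field manual covers battle focused training?",
        options=["FM 7-0", "FM 7-1", "FM 22-100", "FM 7-21.13"],
        correct=1,
        rationale="FM 7-1, Battle Focused Training, addresses this topic.",
        resources=["FM 7-1, Battle Focused Training"],
    )

    def feedback(item, answer):
        """Return developmental feedback whether or not the answer is correct."""
        prefix = ("Correct. " if answer == item.correct
                  else "The keyed response is %s. " % item.options[item.correct])
        return prefix + item.rationale + " See: " + "; ".join(item.resources)

    print(feedback(item, answer=0))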

DCAP ADMINISTRATION

Phase II of the project will unfold over the next 15 months, starting in January 2004. The SMA has required that the DCAP administration be internet-based. The Army has an extensive network of distributed learning facilities that we plan to incorporate into our Phase II activities. Using the existing Army Training Support Center’s secure network, a portal will be established to allow Soldiers to register with the system and take the assessment. The assessment administration will be proctored to ensure test security. Our goal is to administer the DCAP to about 600 to 1,000 Soldiers worldwide, including a sizable sample of Reserve Component and Army National Guard Soldiers.

LESSONS LEARNED

• The ATPAT has been a highly successful venture and contributes significantly to the progress and success of the research effort. Concerted efforts should be made to include an advisory panel of varied personnel to advise and guide continued research efforts.
• Early and consistent involvement of the Reserve Components in decisions is deemed essential to the success of the program because almost 60% of the Army force structure is in the Reserve Components.
• Administrative and policy issues, ranging from study materials to test reporting, are as imperative to the program as test development issues, if not more so.
• There are many critical and central issues yet to be addressed. Examples include the designation of Army infrastructure and organizational testing entities, long-term needs, and the means for sustaining and maintaining a viable test development and administration program.
• Enthusiasm and support for a competency testing program remains high within the Army and within the NCO corps.

REFERENCES

Campbell, R.C. (1994). The Army skill qualification test (SQT) program: A synopsis (Interim Report IR-PRD-94-05). Alexandria, VA: Human Resources Research Organization.

Department of the Army (2003). Army Leadership Be, Know, Do (Field Manual 22-100).
Washington, DC: Author.

Department of the Army (2003). Battle Focused Training (Field Manual 7-1). Washington, DC:
Author.

Department of the Army (2003). Noncommissioned Officer’s Guide (Field Manual FM 7-22.7).
Washington, DC: Author.

Department of the Army (2003). The Soldier’s Guide (Draft) (Field Manual 7-21.13). FT Bliss,
TX: Author.

Department of the Army (2003). Soldier’s Manual of Common Tasks (Soldier Training Publication 21-1). Washington, DC: Author.

Department of the Army (2003). Soldier’s Manual of Common Tasks (Soldier Training Publication 21-24 SMCT). Washington, DC: Author.

Department of the Army (2003). Training the Force (Field Manual 7.0). Washington, DC:
Author.

Knapp, D. J., Burnfield, J. L., Sager, C. E., Waugh, G. W., Campbell, J. P., Reeve, C. L.,
Campbell, R. C., White, L. A., & Heffner, T. S. (2002). Development of Predictor and
Criterion Measures for the NCO21 Research Program (Technical Report 1128).
Alexandria, VA: U. S. Army Research Institute for the Behavioral and Social Sciences.

Knapp, D. J., McCloy, R., & Heffner, T. S. (2003). Validation of Measures Designed to Maximize 21st-Century Army NCO Performance (Contractor Report). Alexandria, VA: Human Resources Research Organization.


1ST WATCH: ASSESSMENT OF COPING STRATEGIES EMPLOYED BY NEW SAILORS

Marian E. Lane, M.S.
Jacqueline A. Mottern, Ph.D.
Michael A. White, Ph.D.
Marta E. Brown, M.S.
Erica M. Boyce

U.S. Navy Personnel Research, Studies, and Technology, PERS-13
Millington, TN 38055-1300
marian.lane@persnet.navy.mil

This paper examines the relationship between coping strategies and the success of new
Sailors during the first term of enlistment. Surveys were administered to Navy recruits (N =
47,708) upon entry into Recruit Training Command, graduation from Recruit Training
Command, graduation from “A”/Apprentice School, and/or exit from the Navy. These surveys
included demographic items; a 32-item stress coping scale adapted from the Ways of Coping
Checklist (WCCL; Vitaliano, Russo, Carr, Maiuro, & Becker, 1985) designed to assess coping
strategies employed by people when faced with stressful situations; and a 24-item reasons-for-
joining scale designed to assess the influence of various factors on the decision to join the Navy.
Data were analyzed to determine the reliability and factor structure of the stress coping scale, as
well as to examine possible relationships between type of coping strategy utilized and other
factors, including outcomes and demographic characteristics. Results indicate that recruits who successfully completed training utilized different types of coping strategies than recruits who exited the Navy prior to the completion of training. Coping strategies were
significantly related to type of education credential and weakly related to reasons for joining the
Navy. Results also indicate gender differences in the frequency of use of various types of coping
strategies, but these differences were not related to attrition rates within gender. The results
suggest that an assessment of coping strategies may be useful for recruiting and selection purposes, as well as in preparation for and during military training.

BACKGROUND

Overall, the 1st Watch project is based on the idea that increasing the Navy’s knowledge base regarding the factors that contribute to Sailors’ success during the first term of enlistment, as well as during subsequent terms for those who choose to make the Navy a career, will assist in retaining qualified Sailors for the Navy. The 1st Watch project began an investigation into factors related to outcomes such as performance, satisfaction, morale, and stress at various points during this first term. This paper focuses on the last of these outcomes – stress – and the ways new recruits handle the potentially stressful situations they inevitably encounter.


Coping Strategies
Lazarus and Folkman (1984) define coping as “constantly changing cognitive and behavioral efforts to manage specific external and/or internal demands that are appraised as taxing or exceeding the resources of the person” (p. 141). These specific demands often produce anxiety about a given situation; if not dealt with effectively and promptly through one of the various stress coping strategies, that anxiety builds into stress that may either enhance or diminish performance in the situation. For instance, Edwards and Trimble (1992) examined the relationship between anxiety and performance, as influenced by the coping strategy employed in the situation, and found that task-oriented coping responses (such as problem-focus) were positively related to performance on a task, while emotion-oriented coping responses (such as avoidance and distancing) were negatively related to performance on the task. Because coping has been related to such outcomes, an examination of this factor should be of great interest to the military as it competes with the civilian workplace in an attempt to attract, recruit, hire, and retain the most qualified individuals.
The beginning of a Navy career may be, and most likely is, a very stressful time for most
recruits. Many of them are leaving home for the first time to travel far from family and friends to
a new environment in which they know no one and have limited knowledge as to what is about
to happen in their lives. An assessment of the ways in which these recruits handle the stressful
situations they face in the beginning of their Navy careers may shed some light on how likely
these recruits are to make the Navy a life-long career. The methods used in this process, the
coping strategies for handling potentially stressful, anxiety-producing situations, may vary both
across and within individuals and/or groups of individuals, and knowledge of these differences
may provide valuable information regarding the choices that these recruits make concerning their
military careers.
As the demands of recruit training at Recruit Training Command (RTC) introduce novel
situations and tasks for new recruits to face each day, they must either figure out ways to adapt to
and deal with these situations or be faced with the prospect of ending their careers early and
returning to their civilian lives. It is likely that, more often than not, the latter of these choices is
not beneficial for either the Navy or the exiting recruits. Therefore, a priori information about
how recruits cope with the challenges they face may be helpful in designing and implementing
ways to prevent unwanted attrition from training due to situational stress and anxiety.

METHOD

Sample
The sample for the current study was composed of new recruits (N = 47,708) who had
recently joined the Navy and were embarking upon initial recruit training at the Great Lakes
Naval Training Center, Great Lakes, IL, from the beginning of data collection in April 2002 to
August 2003.

Survey
New Sailor Survey. The first of four questionnaires administered during the course of the first term of enlistment is the New Sailor Survey, which is composed of questions designed to assess individuals’ personal values, their experiences with recruiting and classification, their reasons for joining the Navy, their expectations of training, their stress coping skills, their fit with the Navy, and their demographic information.

Measure
Stress Coping Scale. In addition to finding themselves in a completely novel situation,
new recruits are faced with additional potentially stressful daily events within this unfamiliar
context, such as daily physical fitness exercises and rigorous inspections conducted by their
Recruit Division Commanders (RDCs). These events likely add to the stress already building
within these recruits. As mentioned previously, a portion of the New Sailor Survey is devoted to
the assessment of stress coping skills and techniques employed by the recruits. The scale consists
of 32 items adapted from the revised Ways of Coping Checklist (WCCL) developed by
Vitaliano, Russo, Carr, Maiuro, and Becker (1985). It was hypothesized that the use of certain
types of coping skills would be more closely related to successful completion of recruit training
than the use of different types of coping skills, and that identification of the skills most closely
related to training success could be used in future recruit training efforts.

Procedure
Data for this study were collected from recruits as they traveled by bus from O’Hare airport in Chicago to RTC in Great Lakes. These recruits were given the New Sailor Survey, which requires approximately 45 minutes to complete (the duration of the trip from O’Hare to RTC). Upon completion and arrival at Great Lakes, the questionnaires were collected by a Navy Petty Officer and periodically shipped to our data processing center, where they were electronically scanned in preparation for data analysis.

RESULTS

Factor structure
Data were analyzed to determine the factor structure of the stress coping strategies
measure. The revised measure administered by Vitaliano et al. (1985) consisted of 42 items and
factored into five distinct subscales, corresponding to five different styles for coping with
stressful situations: Problem-focused, Blamed self, Wishful thinking, Seeks social support, and
Avoidance. At pre-test, this scale factored into the same five factors as found by Vitaliano et al.,
but 10 of the 42 items were eliminated due to low factor loadings scattered across the five
primary factors. The remaining 32 items were included on the New Sailor Survey as the Sailor
Stress Coping Scale.
The principal components analysis resulted in five factors with eigenvalues (λ) greater than 1, which corresponded to the five factors indicated by Vitaliano et al. Factor loadings ranged from .24 to
.86. Two items from the Seeks social support subscale and one item from the Avoidance subscale
had slightly higher loadings on other factors, but had greater theoretical and practical
significance as interpreted on the original scales. Therefore, the subscales indicated by Vitaliano
et al. and observed in the pre-test phase of the current project were retained for subsequent
analyses.
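
The retention rule applied here (keep components with eigenvalues greater than 1, the Kaiser criterion) is easy to reproduce; the Python sketch below applies it to a simulated inter-item correlation matrix, since the survey data themselves are not distributed with this paper.

    # Minimal sketch of the eigenvalue > 1 (Kaiser) retention rule applied to
    # simulated Likert-type responses. With random data this only illustrates
    # the mechanics; the real responses showed the five-factor structure.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(1, 6, size=(1000, 32)).astype(float)  # 1000 respondents x 32 items

    R = np.corrcoef(X, rowvar=False)           # 32 x 32 inter-item correlations
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted in descending order

    n_factors = int(np.sum(eigenvalues > 1.0))
    print("Factors retained by the Kaiser rule:", n_factors)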


Reliability analysis
Internal consistency reliability analyses indicated moderate to high reliabilities for all five
subscales (Problem-focused, α = .88; Blamed self, α = .82; Wishful thinking, α = .86; Seeks
social support, α = .62; and Avoidance, α = .83).
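
These coefficients are Cronbach’s alpha values; for reference, the sketch below computes alpha for one subscale from a respondents-by-items array, using simulated data in place of the survey responses.

    # Minimal sketch of Cronbach's alpha for one subscale (respondents x items).
    # The simulated 8-item subscale below shares a common "coping" signal.
    import numpy as np

    def cronbach_alpha(items):
        """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
        k = items.shape[1]
        item_var = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1.0 - item_var / total_var)

    rng = np.random.default_rng(0)
    trait = rng.normal(size=(500, 1))             # shared signal across items
    subscale = trait + rng.normal(size=(500, 8))  # 8 correlated items
    print(round(cronbach_alpha(subscale), 2))     # about .89 here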

Graduates vs. attrites


A disposition report including the outcome (graduate or attrite prior to graduation) of the recruits included in the sample was obtained from RTC at Great Lakes. Results of a multivariate analysis of variance indicated that there was a significant difference between these groups on the set of variables [F(5, 23164) = 16.947, Wilks’ λ = .996, p < .001], which was the set of coping strategies assessed by the Ways of Coping Checklist modified for this study. Further examination revealed differences between these groups on the Avoidance and Wishful thinking subscales, such that recruits who attrited prior to graduation were significantly more likely to use these types of coping strategies than those who successfully completed recruit training.
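
An analysis of this form can be reproduced with standard MANOVA routines; the sketch below uses statsmodels on simulated data with hypothetical column names, building in the reported direction of the effect.

    # Minimal sketch of a MANOVA comparing graduates and attrites on the five
    # coping subscales; data and column names are simulated and hypothetical.
    import numpy as np
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame(rng.normal(size=(n, 5)),
                      columns=["problem", "blame", "wishful", "support", "avoid"])
    df["outcome"] = rng.choice(["graduate", "attrite"], size=n, p=[0.85, 0.15])
    # Build in the reported effect: attrites higher on Avoidance/Wishful thinking.
    df.loc[df.outcome == "attrite", ["avoid", "wishful"]] += 0.3

    fit = MANOVA.from_formula("problem + blame + wishful + support + avoid ~ outcome",
                              data=df)
    print(fit.mv_test())  # reports Wilks' lambda, F, and p for the outcome term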

Demographic characteristics
Gender. Results of a multivariate analysis of variance indicated that there was a significant difference between males and females on the set of variables comprising the stress coping scales used in this study [F(5, 40574) = 192.329, Wilks’ λ = .977, p < .001]. Further examination revealed differences between males and females on all coping subscales except Problem-focused, such that males were significantly more likely to use the coping strategies of Avoidance, Wishful thinking, and Blamed self than females, and females were significantly more likely to use the coping strategy of Seeks social support than males. However, these differences did not appear to be related to attrition rates between male and female recruits.
Reasons for joining. Few strong relationships were observed between reason for joining
the Navy and type of coping strategy employed by these recruits. The strongest relationships
were between the Problem-focused strategy subscale and such reasons for joining as ‘wanted to
test myself in a demanding situation’, ‘challenging or interesting work’, ‘desire to serve my
country’, and ‘personal growth’. Weaker relationships were observed between the Avoidance and
Wishful thinking strategy subscales and reasons such as ‘get away from family or personal
situations’, ‘get away from hometown’, and ‘time to figure out what I want to do’. The Seeks
social support strategy subscale was related, though not as strongly, to the same reasons for
joining as the Problem-focused strategy, as well as reasons relating to benefits and skills to be
acquired in specific occupations.
Type of education credential. Results of a multivariate analysis of variance indicated that
there was a significant difference among types of education credential earned on this set of
variables [F(45, 177264) = 7.292, Wilk’s λ = .992, p < .001]. Further examination revealed
differences on all coping subscales. All groups reported being most likely to use the Problem-
focused strategy and least likely to use the Avoidance strategy. Within each type of strategy,
though, different groups (by type of education credential earned) reported being more and less
likely to use the strategy. Within the Problem-focused strategy, those earning a diploma from an
adult school reported being most likely to use this strategy, while those receiving a diploma
issued by parents or tutors for home schooling reported being least likely to use this strategy.
Within the Avoidance strategy, those receiving a diploma from a vocational or technical school reported being most likely to use this strategy, while those receiving a diploma issued by parents or tutors for home schooling reported being least likely to use this strategy. Within the Wishful thinking strategy, those receiving a diploma from a correspondence school reported being most
likely to use this strategy, while those receiving a diploma issued by parents or tutors for home
schooling reported being least likely to use this strategy. Within the Blamed self strategy, those
earning college credit to turn a GED into a diploma reported being most likely to use this
strategy, while those receiving a diploma issued by an association or other school for home
schooling reported being least likely to use this strategy. Within the Seeks social support
strategy, those earning a diploma from a correspondence school reported being most likely to use
this strategy, while those receiving a diploma issued by parents or tutors for home schooling
reported being least likely to use this strategy.

DISCUSSION

The 1st Watch project has already provided much information regarding new recruits as they enter their first term of enlistment. The exploration of measures not incorporated in previous research has contributed significantly to the existing body of knowledge regarding the factors that influence success at the beginning of this critical first term. Examination of the strategies these recruits employ for coping with stressful situations reveals possible applications to future recruit training that could enhance the chances of success during this critical first term.
The results of the current study reveal that the coping measure and the factor structure reported in the original work are reproducible in this context, and reliability analyses indicate that the items composing the subscales hold together well enough to support measurement of the intended constructs. The difference found between graduates and those who attrite prior to
graduation on two of the coping strategy subscales, Avoidance and Wishful thinking, makes some
intuitive sense. Vitaliano et al. found these strategies to be related to negative outcomes for the
individual, including depression and anxiety; therefore, these strategies may be perceived as
being more ‘negative’ or ‘unproductive’ approaches to coping with stressful situations as
compared to other, more positive, productive strategies, such as Problem-focused and Seeks
social support. The nonsignificant findings between these groups on the other strategy subscales
may suggest that other factors related to coping strategies, and not merely the likelihood of using
a specific type of strategy across situations, affect the probability of success of new recruits.
Future research may explore possibilities for these related factors, such as coping adaptability, in
addition to coping strategies themselves.
Results also indicate that although males and females report differences in the frequency
of use of some types of coping strategies, such that males are more likely than females to use the
more ‘negative’ coping strategies of Avoidance and Wishful thinking, these differences did not
appear to be related to attrition rates in either group, and differences did not emerge on any other
strategies. Significant but weak relationships were observed between types of coping strategy
employed and reasons for joining the Navy, indicating that more positive, productive coping
strategies such as Problem-focused and Seeks social support are related to more constructive
reasons for joining such as ‘wanted to test myself in a demanding situation’, ‘challenging or
interesting work’, ‘desire to serve my country’, and ‘personal growth’; more ‘unproductive’
strategies such as Avoidance and Wishful thinking are related to more escapist reasons such as ‘get away from family or personal situations’, ‘get away from hometown’, and
‘time to figure out what I want to do’. Although significant, these relationships were relatively
weak, and further examination of the relationship between the use of coping strategies and
reasons for joining the Navy, and any potentially related factors, should be conducted prior to
basing recruiting and/or selection decisions on these results.
Recruits who earned different types of education credentials differed in the types of coping strategies they used most often, although all groups reported using the Problem-focused strategy most
often and the Avoidance strategy least often. The group of recruits who received a diploma from
parents or tutors for home schooling appeared least likely among the groups to use any one type
of coping strategy most often, which may indicate that these individuals are more likely to use a
variety of coping strategies at different times, depending on the situation at hand.
The results of the current study shed new light on the construct of coping strategies and
their relationship to the success of new recruits during initial recruit training and throughout the
first term of enlistment. These results lend support to the hypothesis that the employment of
different coping strategies may be important to this success and indicate that there may be
additional factors related to the use of coping strategies that will further contribute to knowledge
of the most salient issues related to the success of new recruits in the initial stages of training, as
well as throughout the first term of enlistment.

REFERENCES

Edwards, J.M., & Trimble, K. (1992). Anxiety, coping and academic performance. Anxiety,
Stress, and Coping, 5, 337-350.

Lazarus, R.S., & Folkman, S. (1984). Stress, appraisal, and coping. New York: Springer.

Mottern, J.A., White, M.A., & Alderton, D.L. (2002). 1st watch on the first term of enlistment.
Paper presented at the 44th Annual Conference of the International Military Testing
Association, Ottawa, Canada.

Vitaliano, P.P., Russo, J., Carr, J.E., Maiuro, R.D., & Becker, J. (1985). The ways of coping
checklist: Revision and psychometric properties. Multivariate Behavioral Research, 20,
3-26.


1st WATCH: THE NAVY FIT SCALE

Marta E. Brown, M.S.
Jacqueline A. Mottern, Ph.D.
Michael A. White, Ph.D.
Marian E. Lane, M.S.
Erica M. Boyce

U. S. Navy Personnel Research, Studies, and Technology, PERS-13
Millington, TN 38055-1300
marta.brown@navy.mil

The purpose of this study was to extend existing research by determining the relationship
between person-organization (P-O) fit and attitudinal and behavioral outcomes in the Navy
setting. This study utilized data from a longitudinal study designed to examine Navy recruits (N
= 47,708) during their first term of enlistment. Surveys were administered upon graduation from
Recruit Training Command, graduation from “A”/Apprentice School, and/or exit from the Navy.
The surveys included demographic characteristics; the Navy Fit Scale, an 8-item measure of
perceived P-O fit designed for use with Navy personnel; the Navy Commitment scale, a 14-item
scale of commitment to the Navy; and additional outcome variables including morale, Navy
expectations, level of stress, and attrition. P-O fit was found to have a moderate positive
relationship with morale and organizational commitment. In addition, P-O fit had a weak
positive relationship with Navy expectations and a weak negative relationship with stress and
attrition. The impact of setting on P-O fit and the implications for generalization are discussed.
The results support the importance of understanding the association between perceived fit with
the Navy and career outcomes.

BACKGROUND

This study utilized data from the “1st Watch on the First Term of Enlistment” study, which was designed to examine Navy recruits during their first term of enlistment (Mottern, White, & Alderton, 2002). The main goals of the 1st Watch project were to gather information pertaining to the career progress of Sailors during the first term of enlistment and to use that information to help develop highly qualified and well-prepared Sailors in the future. To accomplish these goals, the model developed for the project employed person-organization (P-O) fit theory modified for use in the Navy setting.
P-O fit is commonly defined as “the compatibility between people and organizations that
occurs when: (a) at least one entity provides what the other needs, or (b) they share similar
fundamental characteristics, or (c) both” (Kristof, 1996). In the current literature, P-O fit is frequently operationalized as the congruence between individual and organizational values, and research has focused on workers in civilian occupations. Chatman (1991) argued for value congruence as a measure of person-organization fit because values are fundamental and relatively enduring, and individual and organizational values can be directly compared. An indirect method to assess objective fit, which involves comparing individual ratings of personal characteristics with a profile of Navy values, was developed for use in this study. This research extends the P-O fit literature
by examining the controlled environment of military training and military life.
Individuals develop a sense of fit during their career within an organization, which
impacts their attitudes, decisions, and behavior. Cable and DeRue (2002) explain that when P-O fit exists, there is a match between the employees’ values and the organization’s values. This fosters
a sense of involvement and creates a strong bond, which results in greater identification with the
organization, a positive perception of organizational support, and the decision to stay in the
organization. This is congruent with Gade (2003), who described commitment in terms of
service members as “a person who is strongly attached to his or her military service as an
organization and to his or her unit as a part of that organization.” Although several studies have
found that P-O fit is related to job satisfaction (e.g., Kristof-Brown, Jansen, & Colbert, 2002) and
organizational commitment (e.g., O'Reilly, Chatman, & Caldwell, 1991), there is less evidence
for the relationship between P-O fit and morale and accuracy of expectations. Thus, in this study
the following hypothesis is proposed:

Hypothesis 1: Person-organization fit is positively related to morale, organizational


commitment, and expectations.

When people do not fit with their environment, they experience more negative affect, such as feelings of incompetence and anxiety (Chatman, 1991). Past research indicates that P-O fit is related to perceived stress (e.g., Lovelace & Rosen, 1996) and attrition (e.g., Saks & Ashforth, 1997). The following hypotheses are based on these findings and are intended to extend the research by improving generalizability:

Hypothesis 2: Person-organization fit is negatively related to stress and attrition.

Hypothesis 3: Recruits who graduate will have higher person-organization fit than those
who attrite.

METHOD

Upon entering the Navy, recruits complete 8 weeks of training with a training division at
Recruit Training Command (RTC), Great Lakes Naval Training Center. Once the training
requirements are completed, trainees graduate from RTC and transition to the next phase of
training. The graduates either continue with advanced training at an “A/Advanced School” or
attend a 2-3 week Apprentice School. Questionnaires were administered to trainees at these two
milestones (RTC Grad Survey and “A”/Apprentice School Survey) and, in the case of attrition,
upon separation from the Navy (Exit Survey). Data collection began in April 2002 and
concluded in August 2003.

Sample
Navy recruits in training at the Great Lakes Naval Training Center were tracked from the
beginning of Recruit Training Command to graduation from “A”/Apprentice School (N =
47,708).


Surveys
RTC Grad Survey. This questionnaire was administered by Petty Officers to trainees (N = 31,331) who were identified for graduation from Recruit Training Command. The questionnaire included the Navy Fit Scale, the Navy Commitment Scale, morale, Navy expectations, and level of stress. Eighty-three percent of respondents were men.
“A”/Apprentice School Survey. This questionnaire was distributed by a student class
leader to trainees who were identified for graduation from “A”/Apprentice School (N = 9,323).
The Navy Fit Scale, the Navy Commitment Scale, morale, Navy expectations, and level of stress
were included in the questionnaire. Seventy-nine percent of respondents were men.
Exit Survey. This questionnaire was administered at the separation barracks to all trainees
who attrited during training (N = 2,592). The questionnaire contained the Navy Fit Scale, morale,
Navy expectations, and level of stress. Eighty-three percent of respondents were men.

Measures
Navy Fit Scale. An 8-item measure of perceived person-organization fit designed for use in the Navy setting. Items were developed from the Navy’s Evaluation Report & Counseling Record (E1-E6) to represent each of the domains on which Sailors are rated annually. These domains are personal characteristics that together form a profile of Navy values. Respondents were instructed to indicate how their Recruit Division Commander would rate them compared to other Sailors on this set of personal characteristics, using a 5-point Likert scale (“Far better than the average recruit” to “Far worse than the average recruit”). This indirect method assesses objective fit by comparing individual ratings with the Navy profile. The reliability of the scale was α = .88 (RTC), α = .90 (“A”/Apprentice School), and α = .93 (Exit).
Navy Commitment Scale. Organizational commitment was measured using adapted items from Meyer and Allen’s (1987) scale and additional items specific to the Navy. The reliability of
the 14-item scale was α = .85 (RTC) and α = .88 (“A”/Apprentice School).
Outcome variables. Other variables, including morale, Navy expectations, and level of stress during the recent training period, were measured by specific questions on the surveys.
Participants rated these items on Likert-type scales. Attrition was collected from administrative
records.
Demographic variables. Demographic characteristics including gender, current paygrade,
and highest education level achieved were also collected.
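
Before turning to the results, a brief Python sketch of how a perceived-fit score on the Navy Fit Scale could be computed is given below; the summed-score form and the Likert coding direction (higher = better than the average recruit) are our assumptions for illustration, consistent with the scale means reported in Tables 1-3.

    # Minimal sketch of scoring the 8-item Navy Fit Scale: each item is a
    # 5-point Likert rating, assumed coded so that 5 = "Far better than the
    # average recruit"; the fit score is the item sum (range 8 to 40).
    # The ratings below are hypothetical.
    def navy_fit_score(ratings):
        assert len(ratings) == 8 and all(1 <= r <= 5 for r in ratings)
        return sum(ratings)

    print(navy_fit_score([4, 4, 5, 3, 4, 4, 3, 5]))  # 32, near the RTC mean of 31.70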

RESULTS

Tables 1, 2, and 3 present the means, standard deviations, and correlations for each of the study variables by respective survey. Hypothesis 1 predicted that person-organization fit would be positively related to morale, organizational commitment, and accuracy of expectations. The moderate positive correlations between P-O fit and commitment on the two questionnaires that included the commitment scale indicate that a high degree of similarity between individual characteristics and the Navy’s desired personal characteristics is positively related to a strong attachment to the military (r = .28 at RTC and r = .35 at “A”/Apprentice School). Also, the moderate positive correlations between P-O fit and morale point to the relationship between a high level of P-O fit and positive psychological well-being (r = .29 at RTC, .33 at “A”/Apprentice School, and .43 at Exit). Finally, the hypothesized relationship between P-O fit and accuracy of expectations was supported by weak to moderate positive correlations (r = .13 at RTC, .21 at “A”/Apprentice School, and .33 at Exit). This interesting relationship between reporting that expectations for training were correct and having a high degree of fit with the Navy warrants further investigation.
Hypothesis 2 predicted that person-organization fit is negatively related to stress and attrition. The small to moderate negative correlations between P-O fit and perceived stress on all three questionnaires support this hypothesis (r = -.15 at RTC, -.15 at “A”/Apprentice School, and -.38 at Exit). However, the strength of this relationship may have been diminished at RTC and “A”/Apprentice School by the general stress experienced by the majority of trainees, which is a standard component of the training design. P-O fit also had a weak negative relationship with attrition (r = -.19), where turnover decision was coded as 1 = graduate (RTC, “A”/Apprentice School, or both) and 2 = exit. This correlation may not be an accurate representation of the relationship due to the variety of reasons for which trainees attrite from the Navy (e.g., existing medical problems or injury that occurred during training).
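
With attrition coded as a dichotomy, the reported r is in effect a point-biserial correlation; a minimal version of that computation on simulated data is sketched below.

    # Minimal sketch of the fit-attrition correlation: with outcome coded
    # 1 = graduate and 2 = exit, the Pearson r is a point-biserial correlation.
    # Data are simulated, not the study records.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 5000
    outcome = rng.choice([1, 2], size=n, p=[0.9, 0.1])     # mostly graduates
    fit = rng.normal(32, 5, size=n) - 2.0 * (outcome - 1)  # exits score lower

    r, p = stats.pearsonr(fit, outcome)
    print("r = %.2f, p = %.3g" % (r, p))  # negative r: higher fit, less attrition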

Table 1
Means, Standard Deviations, and Correlations for RTC Grad Variables
Variable M SD 1 2 3 4 5
1. Person-organization fit 31.70 4.71 ⎯
2. Commitment 56.23 8.03 .28 ⎯
3. Morale 3.65 0.84 .29 .35 ⎯
4. Navy expectations 3.43 1.03 .13 .38 .30 ⎯
5. Perceived stress 3.11 1.03 -.15 -.16 -.16 -.20 ⎯
Note. All correlations are significant at p < .001.

Table 2
Means, Standard Deviations, and Correlations for “A”/Apprentice School Variables
Variable M SD 1 2 3 4 5
1. Person-organization fit 31.62 5.13 ⎯
2. Commitment 48.76 8.60 .35 ⎯
3. Morale 3.64 0.89 .33 .40 ⎯
4. Navy expectations 3.38 1.05 .21 .47 .40 ⎯
5. Perceived stress 2.96 1.09 -.15 -.18 -.20 -.24 ⎯
Note. All correlations are significant at p < .001.

Table 3
Means, Standard Deviations, and Correlations for Exit Variables
Variable M SD 1 2 3 4
1. Person-organization fit 27.92 6.80 ⎯
2. Morale 3.12 1.13 .43 ⎯
3. Navy expectations 2.74 1.22 .33 .44 ⎯
4. Perceived stress 3.77 1.15 -.38 -.37 -.40 ⎯
Note. All correlations are significant at p < .001.


Hypothesis 3 stated that recruits who graduate will have higher P-O fit than those who
attrite. To test this hypothesis, an independent samples t-test was conducted comparing graduates
of either RTC or “A”/Apprentice School to those who attrite. If an individual completed both
RTC and “A”/Apprentice School, the P-O fit score was an average of the two scores. The results
of this analysis indicate that graduates have significantly higher P-O fit than those who attrite, t(35889) = 35.88, p < .001.
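
A minimal version of this comparison, using an independent-samples t-test on simulated fit scores whose means echo Tables 1-3, is sketched below.

    # Minimal sketch of the graduate-vs-attrite comparison on P-O fit with an
    # independent-samples t-test; scores are simulated, not the study data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    graduates = rng.normal(loc=31.7, scale=4.8, size=34000)
    attrites = rng.normal(loc=27.9, scale=6.8, size=1900)

    t, p = stats.ttest_ind(graduates, attrites)
    df = len(graduates) + len(attrites) - 2
    print("t(%d) = %.2f, p = %.3g" % (df, t, p))
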
To further investigate the construct of P-O fit, variation in P-O fit by demographic information was analyzed. An analysis of gender differences on P-O fit scores indicated that males had significantly higher P-O fit scores than females for RTC graduates [t(28778) = 19.44, p < .001]; “A”/Apprentice School graduates [t(8618) = 9.56, p < .001]; and attrites [t(2294) = 3.48, p < .01]. Three one-way analyses of variance (ANOVA) with Bonferroni post hoc tests were used to investigate P-O fit at current levels of paygrade (E1, E2, and E3). The results indicated that trainees at higher paygrades had significantly higher P-O fit scores for RTC graduates [F(2, 28759) = 329.07, p < .001]; “A”/Apprentice School graduates [F(2, 8112) = 45.03, p < .001]; and attrites [F(2, 2288) = 9.90, p < .001].
Three additional ANOVAs with Bonferroni post hoc tests were used to investigate P-O
fit and highest level of education achieved (10th grade or less, 11th, 12th, one or more years of
college or technical school, and Bachelor’s degree). The results indicated that for RTC graduates,
those trainees with one or more years of college or technical school and those with a Bachelor’s
degree had significantly higher P-O fit than all lower levels of education, F (4, 20130) = 53.69, p
< .001. For “A”/Apprentice School graduates, those trainees with one or more years of college or
technical school and those with a Bachelor’s degree had significantly higher P-O fit than those
who had completed 10th grade and those who had completed 12th grade, F (4, 5100) = 17.46, p <
.001. Finally, there were no significant differences between levels of education on P-O fit for
attrites.
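
These follow-ups pair an omnibus one-way ANOVA with Bonferroni-corrected pairwise comparisons; the sketch below shows the pattern with scipy on simulated paygrade groups.

    # Minimal sketch of a one-way ANOVA across paygrades (E1-E3) followed by
    # Bonferroni-corrected pairwise t-tests; the fit scores are simulated.
    from itertools import combinations
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    groups = {"E1": rng.normal(30.5, 5, 800),
              "E2": rng.normal(31.5, 5, 900),
              "E3": rng.normal(32.5, 5, 700)}

    f, p = stats.f_oneway(*groups.values())
    print("Omnibus F = %.2f, p = %.3g" % (f, p))

    # Bonferroni: multiply each pairwise p by the number of comparisons (3).
    for a, b in combinations(groups, 2):
        t, p_pair = stats.ttest_ind(groups[a], groups[b])
        print("%s vs %s: adjusted p = %.3g" % (a, b, min(1.0, p_pair * 3)))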

DISCUSSION

This study investigated the relationship between the similarity of individuals’ self-perceptions of personal characteristics and the Navy’s desired personal characteristics (P-O fit) and attitudinal and behavioral outcomes in the Navy setting. The results suggest that a high degree of similarity between individual ratings of personal characteristics and a profile of Navy values was related to a strong attachment to the military, positive psychological well-being, correct training expectations, lower perceived stress, and retention. Research aimed at better understanding this relationship between perceived fit with the Navy and career outcomes can be applied in the development of highly qualified and well-prepared Sailors. The differences found pertaining to demographic characteristics highlight specific segments of the population that could benefit from further investigation.


REFERENCES

Bretz, R. D., Jr., & Judge, T. A. (1994). Person-organization fit and the theory of work adjustment: Implications for satisfaction, tenure, and career success. Journal of Vocational Behavior, 44(1), 32-54.
Cable, D. M., & DeRue, D. S. (2002). The convergent and discriminant validity of subjective fit perceptions. Journal of Applied Psychology, 87(5), 875-884.
Chatman, J. (1991). Matching people and organizations: Selection and socialization in public accounting firms. Administrative Science Quarterly, 36, 459-484.
Gade, P. A. (2003). Organizational commitment in the military: An overview. Military Psychology, 15(3), 163-166.
Kristof, A. L. (1996). Person-organization fit: An integrative review of its conceptualizations, measurement, and implications. Personnel Psychology, 49(1), 1-50.
Kristof-Brown, A. L., Jansen, K. J., & Colbert, A. E. (2002). A policy-capturing study of the simultaneous effects of fit with jobs, groups, and organizations. Journal of Applied Psychology, 87(5), 985-993.
Lovelace, K., & Rosen, B. (1996). Differences in achieving person-organization fit among diverse groups of managers. Journal of Management, 22(5), 703-723.
Meyer, J. P., & Allen, N. J. (1987). Organizational commitment: Toward a three-component model (Research Bulletin No. 660). London, Ontario: The University of Western Ontario, Department of Psychology.
Mottern, J. A., White, M. A., & Alderton, D. L. (2002, October). 1st Watch on the first term of enlistment. Paper presented at the International Military Testing Association 44th Annual Conference, Ottawa, Ontario.
O’Reilly, C. A., III, Chatman, J., & Caldwell, D. F. (1991). People and organizational culture: A profile comparison approach to assessing person-organization fit. Academy of Management Journal, 34(3), 487.
Saks, A. M., & Ashforth, B. E. (1997). A longitudinal investigation of the relationships between job information sources, applicant perceptions of fit, and work outcomes. Personnel Psychology, 50(2), 395-426.


USING RESULTS FROM ATTITUDE AND OPINION SURVEYS1


Dr. Alma G. Steinberg and Dr. Susann Nourizadeh
U.S. Army Research Institute for the Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333-5600
steinberg@ari.army.mil

This paper addresses the utilization of attitude and opinion surveys from three
perspectives: respondents to the survey, sponsor/proponents for the survey, and the researchers
who conduct the survey. It shows how understanding the perspectives of each can maximize
potential use of survey results.

Background

Survey results can be used in many different ways. Some examples are:

(a) Provision of scientifically sound and timely information to decision-makers
(b) Monitoring of Soldier issues
(c) Conducting program or policy assessments
(d) Determination of the validity of anecdotal information or opinions
(e) Tracking of trends on a wide number and variety of issues
(f) Identification of emerging issues
(g) Assessments of the impact of unexpected events (by comparing to baselines)

In spite of the many ways survey results can be used, it is common for people to be
skeptical about their use. The discussion below indicates some of the reasons for this.

Discussion

Respondents typically want to know the results of the surveys they take. Yet, the results
are usually not provided to them personally, often for very practical reasons. Results may be
available in public forums (e.g., reports, newspaper articles, the Web) at some later date, but
respondents may not be aware of them or may not associate them with the surveys they took.
Respondents also want to know how the results will be and have been used. Yet, when the
results are used, the user does not always specify the source. Further, the results are often used
in conjunction with other sources of input and thus they are not easily recognizable. As a result,
respondents often make false assumptions. The first assumption is that most people responded just the way they did. The second is that, if they cannot identify an impact from the
survey, the results have not been used.

The sponsors or proponents of the survey also want to know the results. In addition, they
typically want to know the context for interpreting the results, such as: (a) comparisons with
other populations/subpopulations of interest, (b) trends indicating changes over time, and (c) the reasons behind the responses. Finally, sponsors often want concrete recommendations to address
the issues highlighted by the results.

Sponsors often mistakenly assume that since the survey was conducted for them, there is
no compelling need to address respondent concerns about the results. Thus, they may severely
limit distribution of all or part of the results. Also, they may not be aware of the importance of indicating to respondents that the results have been used and of providing examples of their use.

Finally, researchers who conduct the survey want to know the results. Often, the
researcher’s interests include furthering the science. This may involve: (a) developing
scales/approaches to measure constructs, (b) testing hypotheses, and (c) building theories and
models. Researchers should not take it for granted that sponsors have that same interest in
furthering the science. Also, they should not assume that sponsors will want to know (or will
understand) results presented in the language and format of scientific journals.

Conclusions

Respondents, sponsors, and researchers are all stakeholders in surveys. All want to know
the survey results. However, they each may have unique, need-driven expectations for how
results are analyzed, reported, and utilized. To maximize utilization of results, researchers need
to recognize these differing needs by tailoring analyses and report formats accordingly.

1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the U.S. Army Research Institute or the Department of the Army.


Using Survey and Interview Data: An Example 1


Dr. Susann Nourizadeh and Dr. Alma G. Steinberg
U.S. Army Research Institute for the Behavioral and Social Sciences
5001 Eisenhower Avenue
Alexandria, VA 22333-5600
nourizadehs@ari.army.mil

Introduction

In general, surveys are useful in providing quantitative data from large samples across a wide geographical area. They allow the same questions to be asked of all respondents in the same way. Supplementing surveys with focus group interviews adds qualitative data that provide a context for interpreting survey results. Focus groups also facilitate an in-depth examination of issues that are difficult to examine with surveys (e.g., why individuals responded a certain way to the survey) and aid in finding solutions to problems. The purpose of this paper is to present an example that uses survey and interview data to address an applied concern related to mentoring.

Currently, the U.S. Army is examining issues related to mentoring, including its
definition, its role in leader development, and ways to increase its occurrence. However, there is
no shared understanding about the meaning of mentoring and how it should be implemented in
the Army. In the literature, the mentoring relationship is described as a “developmental
relationship between senior and junior individuals in organizations” (McManus & Russell, 1997,
p. 145). Thus, mentors are considered to be individuals who are superior in both rank and
experience (e.g., Bagnal, Pence, & Meriwether, 1985; McManus & Russell, 1997). In the U.S. Army, there is currently confusion over the differentiation between leadership and mentoring.
The problem arises because similar behaviors appear to apply to both (e.g., teaching job skills,
giving feedback on job performance, providing support and encouragement).

In addition to definitional concerns, the Army is looking at how mentoring can be implemented as part of leader development and whether the amount of mentoring can be
increased by tapping additional sources of mentors. Since Soldiers often learn much from the
first non-commissioned officers (NCOs) with whom they work (e.g., their platoon sergeant) and
also from their peers, the Army decided to examine whether these two groups might be
considered additional sources of mentors.

Approach

The survey data collection instrument was the Fall 2001 Sample Survey of Military
Personnel (SSMP). The SSMP is a semi-annual omnibus survey conducted by the U.S. Army
Research Institute. It is sent to an Army-wide random sample of Active component
commissioned officers and enlisted personnel. The survey addressed whether Soldiers felt they
ever had a mentor, who the mentor was (e.g., their rater, senior rater, someone else who was higher in rank other than the rater or senior rater, peer, subordinate), and the kind of behaviors
their mentor exhibited.

The focus group interviews addressed why Soldiers perceived certain people to be their
mentors and not others, the behaviors they were looking for from mentors, and barriers that
prevented individuals from being seen as mentors. In addition, the focus groups addressed
participants’ views on how to increase the amount of mentoring in the Army.

Results

The survey results from 2,802 Army officers and 4,022 enlisted Soldiers are shown in
Table 1. The percentages in this Table are for the set of respondents who said that they have a
mentor now or have had one in the past. Of those, most (92% of the officers and 86% of the enlisted) reported that the mentor was someone higher in rank than them. In the case of both officers and enlisted Soldiers, the mentor was most likely to be someone higher in rank, but not their rater or their senior rater. Also, as can be seen from the Table, only 12% of officers and 9% of enlisted Soldiers reported that their senior rater was their mentor.

Very few officers and enlisted Soldiers reported that their mentor was a peer at their same
rank (3% and 5%, respectively) and very few said that their mentor was a person lower in rank
(3% of officers and less than 1% of enlisted Soldiers). Few officers and enlisted Soldiers (2%
and 9%, respectively) said their mentor was not in the military at the time the mentoring was
provided.

Table 2 shows the percentage of individuals who said their mentors exhibited various
behaviors and that each of these mentoring behaviors were very/extremely helpful. The Table
also shows that when mentors who are higher in rank than the mentee (raters, senior raters, and
others at higher ranks) exhibit some behaviors, these behaviors are somewhat more likely to be
seen as helpful than when mentors who are lower in rank exhibit these same behaviors. Some
examples of this include teaching job skills, helping to develop skills for future assignments,
providing support and encouragement, assigning challenging tasks, providing
sponsorship/contacts to advance careers, and assisting in obtaining future assignments. In
addition, senior raters are seen as more helpful than raters when they exhibit some behaviors
such as advice on organizational politics, personal and social guidance, sponsorship/contacts to
advance careers, and assistance in obtaining future assignments.

Focus group participants strongly advocated that mentoring as part of the leader
development process remain voluntary and not be mandated or assigned. In addition, Soldiers
clarified issues surrounding the perceived overlap between leadership and mentoring. It appears that both share many behaviors in common; however, mentoring is seen as a more individualized, one-on-one relationship wherein the mentor exhibits a broad range of mentoring behaviors (as opposed to a few selected ones).


Conclusions and Recommendations

This paper shows how survey and focus group methodologies complement one another
and result in richer findings, thereby increasing the potential for utilization. Thus, from the
survey it was found that a surprisingly low number of senior raters were seen as mentors, and the interviews revealed the obstacles that stood in the way. A major obstacle is Soldier reluctance to
seek help from senior raters because of their role in determining the final performance rating. On
the other hand, superiors not in the rating chain were seen as a good source of mentors since they
are in a far less threatening role. The focus group interviews also helped to explain why so few
survey respondents saw peers and subordinates as mentors. Although Soldiers do learn from
peers and subordinates, they do not experience the full range of mentoring behaviors in their
relationships with them.

The above findings led to the following recommendations which were provided to the
Army:

• Encourage mentoring, but keep it voluntary.

• Encourage senior raters to exhibit a wider range of mentoring behaviors.

• Make senior raters aware of the barriers to their being seen as mentors (i.e., limited
contact, their role in providing the final performance appraisal rating). Highlight
possible ways of overcoming these barriers.

• Encourage more non-raters who are senior in rank to mentor and educate them on the
importance of exhibiting a wider range of mentoring behaviors.

• Do not rely on increasing the number of mentors by encouraging peers and
subordinates to mentor. This is not likely to increase mentoring, because few of these
individuals are typically viewed as mentors and they are less likely to exhibit the whole
range of mentoring behaviors.


Table 1

Percent of officers and enlisted soldiers who said their mentor is/was: a

Mentor                                                                     | Officers | Enlisted Soldiers
Their rater                                                                |   35%    |   23%
Their senior rater                                                         |   12%    |    9%
A person who is/was higher in rank than them, but not their rater or their
senior rater                                                               |   45%    |   54%
A person who is/was at their same rank                                     |    3%    |    5%
A person who is/was lower in rank than them                                |    3%    |   <1%
A person who is not or was not in the military at the time the mentoring
was provided                                                               |    2%    |    9%

a These percentages are for the set of respondents who said they have a mentor now or have had one in the past.

Table 2
Percent of officers and enlisted soldiers who said their mentors (who were senior, peer, lower ranked, or not in the Army)
exhibited these behaviors and these behaviors were very/extremely helpful: a

Behavior (columns show the mentor's relative position to the mentee)
                                                              | Rater | Senior rater | Higher rank, not rater/senior rater | Same rank | Lower rank | Not in military
Demonstrates trust                                            |  93%  |     91%      |                 94%                 |    93%    |    93%     |      87%
Gives feedback on your job performance                        |  90%  |     90%      |                 88%                 |    88%    |    81%     |      73%
Acts as a role model                                          |  89%  |     89%      |                 92%                 |    88%    |    90%     |      83%
Helps develop your skills/competencies for future assignments |  88%  |     90%      |                 88%                 |    90%    |    77%     |      81%
Assigns challenging tasks                                     |  88%  |     90%      |                 85%                 |    81%    |    55%     |      78%
Provides support and encouragement                            |  88%  |     87%      |                 91%                 |    93%    |    75%     |      87%
Instills Army values                                          |  87%  |     90%      |                 89%                 |    88%    |    84%     |      67%
Provides career guidance                                      |  86%  |     89%      |                 89%                 |    88%    |    77%     |      81%
Provides moral/ethical guidance                               |  86%  |     88%      |                 88%                 |    91%    |    87%     |      83%
Teaches job skills                                            |  84%  |     84%      |                 85%                 |    80%    |    76%     |      72%
Protects you                                                  |  82%  |     85%      |                 81%                 |    75%    |    79%     |      79%
Invites you to observe activities at his/her level            |  82%  |     78%      |                 81%                 |    87%    |    73%     |      79%
Teaches/advises on organizational politics                    |  81%  |     87%      |                 84%                 |    80%    |    73%     |      74%
Provides personal and social guidance                         |  77%  |     83%      |                 85%                 |    93%    |    74%     |      85%
Provides sponsorship/contacts to advance your career          |  75%  |     82%      |                 78%                 |    86%    |    65%     |      70%
Assists in obtaining future assignments                       |  71%  |     80%      |                 74%                 |    74%    |    62%     |      72%

a These percentages are for the set of respondents who said they have a mentor now or have had one in the past.



1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the U.S. Army
Research Institute or the Department of the Army.


UTILIZING SURVEY RESULTS OF THE NAVY EQUAL OPPORTUNITY/SEXUAL HARASSMENT SURVEY47

Paul Rosenfeld, Ph.D., and Carol E. Newell, M.A.
Navy Personnel Research, Studies, and Technology Department
5720 Integrity Drive, Millington, TN, USA 38055
paul.rosenfeld@navy.mil carol.newell@navy.mil

Commander Leanne Braddock
Navy Equal Opportunity Office
Navy Personnel Command
7736 Kittyhawk, Millington, TN, USA 38055
leanne.braddock@navy.mil

In 1988, the Chief of Naval Operations Study Group (CNO, 1988) conducted a wide-
ranging assessment of equal opportunity (EO) issues in the Navy. They found that no Navy-
wide instrument existed to accurately measure the EO climate of the Navy and called for a
“comprehensive Navy-wide biennial EO climate survey to indicate the extent and form of racial
discrimination within the Navy” (p. 2-18). The previous year, the Progress of Women in the
Navy Report (Secretary of the Navy, 1987) had similarly found that no instrument existed to
determine the extent of sexual harassment (SH) in the Navy and recommended that a Navy-wide
SH survey be conducted. These two recommendations for Navy-wide surveys were implemented
in 1989 when the first Navy Equal Opportunity/Sexual Harassment (NEOSH) Survey was
administered. Since 1989, the NEOSH Survey has been administered every other year with the
results being briefed to senior Navy policymakers. The results of the NEOSH Surveys have
provided Navy leaders with an accurate portrait of the state of EO, SH and related issues such as
racial/ethnic, religious, and gender discrimination.
Although the NEOSH Survey results are widely distributed to top Navy
policymakers, questions are periodically raised about how the data are used and what impact
they have. The present paper describes how the NEOSH Survey results have been utilized by the
Navy and offers recommendations for how they could be better utilized.

INTERNAL USES OF NEOSH SURVEY RESULTS

Once analyzed, the NEOSH results are typically briefed to top Navy policymakers
including the Chief of Naval Personnel. Afterwards, the results are released, usually
accompanied by a Navy-wide message that summarizes the main findings and recommends

47 The opinions expressed are those of the authors. They are not official and do not represent the views of the U.S.
Navy.


continued vigilance and efforts to reduce remaining racial and gender-related issues found on the
survey. Additionally, the NEOSH Survey results have been incorporated into standard Navy EO
and SH training, provided norms for command-level climate assessment surveys, led to
supplemental studies into selected EO/SH topics, and have been used to justify a recent Navy-
wide strategic diversity effort.

Following the Tailhook SH episode in 1991, the entire Navy was required to participate
in an eight-hour SH training stand-down. As part of the standard Navy training package for the
stand-down, SH results from the NEOSH Survey were included. More recently, some of the
NEOSH Survey results have been included within the Navy’s annual required General Military
Training (GMT). In the current GMT, Unit 3, Topic 1 deals with SH and EO. Within that
module, the SH results of the 1999-2000 NEOSH Survey are summarized.

Another internal Navy use of the NEOSH Survey results has been through Navy-wide
norms that are used in conjunction with command-level EO climate assessments. For the past
decade, the Navy has had a software program called CATSYS or CATWIN that allowed
commands to administer and analyze a command EO survey online without the requirement for
additional software or specialized training. The standard command EO survey was essentially a
“mini-NEOSH” of around 40-50 items that commands could administer and modify as needed.
Early users of the command-level system requested some comparison data so they could see how
their local performance compared to that of the Navy. To address this issue, Navy-wide norms
generated from the NEOSH Survey have been calculated and posted on a Navy website,
allowing local commands to compare their survey findings to Navy-wide averages.

Results from the NEOSH Survey have also led to follow-on research studies to better
explore issues raised in the survey. These special studies were conducted to gain a more in-depth
understanding of an issue. Topics that have been addressed include focus groups with African-
American women on their experience in the Navy (Bureau of Naval Personnel, 1990), and
reasons for survey non-response on the NEOSH Survey (Newell, Rosenfeld, Harris, &
Hindelang, 2003). The focus group study sought to determine reasons for African-American
women’s low EO climate scores on the NEOSH Survey. This study was conducted in 1990 and
again in 1994 (Moore & Webb, 1998), because the survey continued to find that African-
American women were the least satisfied of any group. The non-response study was
conducted as a result of the declining response rates evident on the NEOSH and other Navy-wide
surveys (Newell et al., 2003). As a result of this study, the 2002 NEOSH Survey was shortened
and steps were taken to provide better feedback as these were common complaints among those
completing the non-response survey (Newell et al., 2003).

More recently, the Navy has developed a strategic diversity initiative that seeks to move
the Navy from an older EO/compliance framework to a model that values differences and seeks
to leverage diversity to maximize performance and increase the Navy’s readiness. In justifying
the need for the Navy to move beyond its traditional EO programs, the project leaders used a
long-term NEOSH finding that although improvements in EO climate have occurred since the
first NEOSH Survey administration, racial and gender gaps still remained in many of the areas


assessed by the survey. This notion of “improvement but gaps remain” provided a compelling
argument that if improvement were to continue and the gaps were to be reduced, a new paradigm
would be required: in this case, the one offered by a strategic diversity effort.

EXTERNAL USE OF NEOSH SURVEY RESULTS

Externally, the NEOSH results have been used to respond to taskers from the Department
of Defense and have been included in Congressional testimony by top Navy leadership. Perhaps due
to the interest in SH issues in the Navy in the aftermath of Tailhook, the Congressional testimony
has typically focused on the SH side of the NEOSH Survey and documented the reduction in SH
rates that occurred in the Navy in the years following the 1991 Tailhook episode. On February
4, 1997, Secretary of the Navy John H. Dalton cited NEOSH Survey results in testimony before
the Senate Armed Services Committee. He said, “The Navy Equal Opportunity and Sexual
Harassment, and the Marine Corps Equal Opportunity Survey are key assessment tools,
providing us biennially with a comprehensive look at our equal opportunity climate. A
comparison of the NEOSH survey results conducted in 1989, 1991, 1993, and 1995 indicates that
the Navy has made steady progress in the communication, prevention, training, and handling of
sexual harassment complaints”. In Congressional testimony before the House National Security
Committee Subcommittee on Personnel on March 13, 1997, Vice Admiral Dan Oliver, Chief of
Naval Personnel noted that “both our 1995 Navy Equal Opportunity/Sexual Harassment
(NEOSH) Survey and a 1996 DoD wide survey revealed a generally improving trend in
elimination of sexual harassment in the Navy, and Service leadership in training, communication
and reporting sexual harassment”. In testimony before the Senate Armed Services Personnel
Subcommittee on March 24, 1999, the Honorable Carolyn H. Becraft, the Assistant Secretary of
the Navy (Manpower and Reserve Affairs) echoed similar thoughts: “…we monitor our progress
in preventing sexual harassment, fraternization, and related behaviors through a number of
assessment tools. The Navy and Marine Corps each conduct their own biennial surveys—The
Navy Equal Opportunity Sexual Harassment (NEOSH) and the Marine Corps Equal Opportunity
Survey (MCEOS) – that provide a climate assessment on various equal opportunity issues.
Current results indicate that we have made progress and are moving in the right direction, but
that we must not relax our resolve to rid our Services of sexual harassment and other
unacceptable behaviors”. At that same Senate hearing, VADM Dan Oliver, the Chief of Naval
Personnel, noted the downward trend in SH rates reported on the 1997 NEOSH Survey with the
largest decreases being in hostile environment forms of SH. Similarly, in testimony before the
Senate Armed Services Committee Subcommittee on Personnel on March 9, 2000, Vice Admiral
Norb Ryan, the Chief of Naval Personnel, noted that the NEOSH Survey was part of the Navy’s
program of SH prevention. In sum, the NEOSH Survey has been regularly used to inform


Congressional committees that the Navy is actively monitoring its EO climate and SH rates, and
also to demonstrate through these survey results that progress has been made.

ARE NEOSH RESULTS UNDERUTILIZED?

Although the NEOSH results have had both internal and external uses, it is probably
accurate to say that they have been underutilized. Indeed, in a study that contacted individuals
who had not responded to the 1999 NEOSH Survey, Newell et al. (2003) found that one of the
major reasons that individuals cited for not responding to surveys like the NEOSH was that they
felt that no changes would result or that their results didn’t matter. As noted in Alma Steinberg’s
and Susan Nourizadeh’s paper in this symposium, part of this dissatisfaction may be the result of
a divergence in expectations between survey respondents and survey sponsors and policymakers
about how survey data are to be utilized. Survey respondents tend to personalize the uses of
survey data and often see utilization in terms of tangible changes that may impact them. While
this may occur in small organizations following climate assessments, this sort of dramatic
survey-driven change is rare in large institutions such as the military. One notable exception is
the Marine Corps. In 1993, the Marine Corps conducted a service-wide Quality of Life Domain
Survey that indicated dissatisfaction with housing. Based on these results, the Marine Corps was
able to get additional funding for housing. A follow-up survey in 1998 indicated a significant
positive trend in perceptions of housing satisfaction, providing some evidence that the increase in
funding for housing was successful.

More commonly, military organizations use large-scale surveys such as the NEOSH as a
benchmark for how they are doing and to determine whether they have improved compared to
the past. It is rarer to make large-scale changes such as those that followed the Marine Corps
Quality of Life Survey and then use a future survey administration to evaluate the efficacy of the
change. Since respondents expect change but leaders utilize surveys for purposes other than
organizational change, we recommend that steps to reduce this divergence be taken.
Specifically, we recommend better feedback to survey respondents about survey results and uses,
and some limited but targeted actions based on survey results.

BETTER UTILIZATION OF NEOSH SURVEY RESULTS

Feedback to Respondents

Although the NEOSH Survey results have been used, one obvious limitation is that
individuals within the sample have not been given feedback on what the results were or how they
have been used. The Navy has recognized this limitation and is currently requiring that all
approved personnel surveys include a plan through which survey respondents would be provided
feedback about the results. This typically occurs through follow-up letters to all who were in the


original sample providing them with a summary of the results or directing them to a website
where the results are available. While the letters typically have provided a summary of the major
findings, there is no reason why in future efforts they could not also include a description of what
actions the Navy was planning based on the findings. As the Navy moves its personnel surveys
to the Internet, this feedback function can increasingly be done electronically, thus shortening the
lag time between administration and results. The feedback letter would also inform respondents
of how their responses have been used. While specific organizational changes may not always
occur, telling respondents that their results have better informed Navy, DoD or Congressional
leadership or had been used to update Navy-wide training, would certainly go a long way to
closing the gap between respondent expectations and the actual organizational uses of the data.

Targeted Actions

While the Navy has not typically used NEOSH Survey results to attempt large-scale
organizational changes, this has occasionally occurred in a more limited fashion. For example,
the results of the 1999/2000 NEOSH Survey indicated that just about half of respondents had
seen or heard of the Navy EO/SH adviceline. The Navy had instituted the adviceline following
Tailhook to provide impartial information to callers on EO and SH issues. After the results were
briefed, the NEOSH Survey sponsors made a concerted and coordinated effort to increase the
visibility of the adviceline through posters and other media efforts. The results of the 2002
NEOSH Survey demonstrated that these efforts had been successful. While in 1999 just over
half of officers and enlisted personnel had heard of the Navy EO/SH adviceline, the percentage
who said they had heard of it jumped to over two-thirds on the 2002 survey.

A more systematic attempt for targeted actions based on NEOSH results is currently
being proposed in conjunction with the communications plan for the release of the 2002 NEOSH
Survey results. That plan would target another long-held NEOSH finding relating to racial and
gender discrimination: that the most common occurrences of racial and gender discrimination
are in the areas of “offensive comments and jokes”, still reported by about one-third of enlisted
minorities and women. The proposed action would target offensive jokes and comments since
they are both the most common forms of reported discrimination and also because Sailors can
take simple actions to end these forms of discrimination. This message of “simple actions to end
offensive jokes and comments” will be conveyed through various Navy media including
websites, wire stories, and internal Navy television news stories and commercials. As currently
proposed, the success of the targeted efforts at reducing these forms of discrimination will be
assessed in future surveys; either through a Navy-wide survey administration or through a more
limited but scientific Internet quick poll that would focus on the specific behaviors being
targeted. This proposed, coordinated media and communications strategy followed by planned
follow-up assessments has rarely if ever been used to attempt to effect and assess change
following Navy surveys. Thus, the effort should be viewed as a pilot project that, if successful,
may serve as a model for future efforts to better utilize the results of Navy surveys.


REFERENCES

Bureau of Naval Personnel (1990, November). Black women in the Navy study group report.
Washington, DC: Author.

CNO Study Group (1988). CNO study group’s report on equal opportunity in the Navy.
Washington, DC: Department of the Navy.

Moore, B.L., & Webb, S.C. (1998). Equal opportunity in the U.S. Navy: Perceptions of African-
American women. Gender Issues, 16(3), 99-119.

Newell, C.N., Rosenfeld, P., Harris, R.L., & Hindelang, R.N. (2003). Reasons for non-
response on U.S. Navy surveys: A closer look. Manuscript submitted for publication,
Military Psychology.

Secretary of the Navy. (1987, December 5). Navy study group report on progress of women in
the Navy. Washington, DC: Department of the Navy.


THE U.S. ARMY'S PERSONNEL REPLACEMENT SYSTEM

Raymond O. Waldköetter, Ed.D.
Educational and Selection Programs
Greenwood, IN 46143 U.S.A.

Alex T. Arlington, M.P.A.
U.S. Army Soldier Support Institute
Fort Jackson, SC 29207 U.S.A.
Staff2@officeassistanceandsuuulies.com

The views expressed in this paper are those of the authors and do not necessarily reflect the views of the U.S.
Army Soldier Support Institute, Department of the Army, or Department of Defense.

Replacements and Manning

The principal goal of the Army's personnel replacement system is to fill needs or
requirements, putting trained and confident soldiers onto the battlefield as quickly as
possible. Replacements are crucial to our national security, and the stress is greater on
the active force of the U.S. Army, now numbering fewer than 500,000 personnel. During
war, personnel replacements are needed to fill combat, combat support, and combat
service support positions. Should a protracted war develop, it would not be long before
commanders asked for needed replacements. The replacement system's ability to fill
requirements on a timely basis will be essential (Arlington, 1998).

This paper examines personnel replacement operations and the automated
systems used to ensure that the right soldiers with the right skills get to the battlefield at
the right time. Knowing the number of soldiers needed for anticipated missions and
estimating casualties are critical factors in determining personnel replacement needs.

Personnel replacements fall into one of two categories. The first category is
called the filler requisition shelf and these personnel fill the gap between peacetime
authorized strength and wartime required strength. This shelf is updated at least once a
year and it reflects any changes to the Table of Organization and Equipment (TOE) and
the Table of Distribution and Allowances (TDA). The second category of replacements
is based on the anticipated number of casualties (AR 600-8-111, 1996).

Casualty estimation and casualty stratification are extremely important to the
Army and are two of the keys to successful replacement operations (Arlington &
Waldköetter, 1994). These procedures are also controversial, in that some experts
believe we should base casualty estimation and stratification on historical rates while
others believe we should use rates generated from computer simulation models. The
Army formerly used a combination of the two procedures to estimate casualties.
However, in 1997 Major Army Commands (MACOMs) were given a new method to
estimate casualties. The former approach used the following five levels to describe
combat intensity: intense, heavy, moderate, light, and none. The new methodology no
longer uses these static definitions, because research has shown that casualties often
occur in pulses. With the new approach, personnel planners choose rate patterns instead
of combat intensity levels (Kuhn, 1998); the chosen rate pattern is based on the type of
combat mission.


To plan properly for future missions, the commander needs to know the number of
soldiers he will have and the skills they possess. This concept is called manning, and it is one of
the most critical elements of war. The ultimate aim of a replacement system is to continue to
man the force. The procedures used to achieve this aim have varied throughout history (FM
12-6, 1994).

Differing Operational Approaches

The German divisions during World War II (WW II) were affiliated with a
military district, with the regiments within a division affiliated with a region within the
military district. Replacement battalions were located within each region so that
regimental replacements came from the same German region. This was very effective
during the early stages of the war, but as the number of German divisions grew it became
necessary to have replacement battalions provide replacements for divisions instead of
regiments. The replacement battalions received the draftees and provided about two
months of combat training; after this initial training the replacements were either sent on
to the division's replacement battalion or retained to receive additional training.

The German division and regimental commanders, rather than the Army command,
were in charge of their replacement system. This decentralized system fostered
esprit de corps and great unit devotion; however, when certain units suffered mass casualties
they had to be taken off the front lines, since not enough replacements were available
from the affiliated region or military district. When this occurred the unit was sent to a
recovery area behind the front lines, remaining there until enough sick and wounded
soldiers returned or region and district draftees were assigned to the unit. Another
approach used by the Germans was to take the remnants of a division and create new
battalions. These battalions would then become part of another division, with the regional
integrity of the battalions kept intact. Along with reducing the number of
divisional battalions from the original nine to seven, these approaches allowed
Germany to maintain nearly 300 divisions until the later stages of WW II.

The British replacement system during WW II was similar to the German system
in that it tried to keep regional ties whenever possible. The U.S. Army took a different, if
not opposite, approach to replacement operations, and the considerations as to how and
why another approach was more acceptable are worth noting. There was fear that if a
particular unit lost a large number of men, it would have a dramatically negative impact
on morale in the unit's home region. Another difference in replacement philosophy was
the decision to keep the number of divisions relatively low. During WW II Henry L.
Stimson, then Secretary of War, wanted the Army to have 200 divisions, whereas General
George C. Marshall, the Army Chief of Staff, insisted on keeping the number much lower
so that there would be an adequate replacement flow. Secretary Stimson ultimately gave
way to General Marshall, and they agreed that 90 divisions would be a manageable
number for the Army. It was also thought that a centralized system would be more
efficient, filling needs as quickly as possible as they arose, without concern for keeping
regional integrity.


This policy was so strictly adhered to that many of the replacements assigned to the U.S.
National Guard units were not even from the related state (Wray, 1987).

Centralized Replacement Operations

Today, the centralized philosophy of replacement operations remains intact.
Headquarters, Department of the Army (HQDA), Deputy Chief of Staff for Personnel
(DCSPER) is the Army's functional proponent for the replacement management system.
After the Office of the Deputy Chief of Staff for Operations and Plans (ODCSOPS)
determines that replacements will be necessary, it notifies the DCSPER. The DCSPER
then relays this information to the Training and Doctrine Command (TRADOC), the
executive agent for replacement centers, notifying it of the replacements to be processed
by the designated replacement centers. The Continental U.S. (CONUS) Replacement
Centers (CRCs) are located at predesignated Army installations. Operations begin 10
days before the first replacements are expected to arrive, when the respective CRCs
assume responsibility for receiving and in-processing replacement personnel and
subsequently coordinating the training, equipping, and transportation of replacement
soldiers (AR 600-8-111, 1996; FM 12-6, 1994).

The CRC consists of a Replacement Battalion and two to six Replacement
Companies. The Battalion Headquarters is responsible for providing command and
control for the battalion and the Replacement Companies. It is commanded by a
Lieutenant Colonel and has 38 personnel, excluding the companies. Each company,
commanded by a Captain, has 25 personnel, with up to four platoons per company and
up to 100 replacements per platoon. The goal is for each Replacement Company to have
100 replacements ready to depart each day, allowing five days of total processing time
per replacement at the CRC (FM 12-6, 1994).
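
(A rough consistency check, using our own arithmetic rather than a figure from the cited
doctrine: by Little's law, the population of a steady-state pipeline equals throughput
multiplied by dwell time, so 100 replacements departing per day × 5 days of processing
implies roughly 500 replacements in process at any one time, on the order of a company's
nominal holding capacity of four platoons of up to 100 replacements each.)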

The Replacement Operations Automated Management System (ROAMS) is the
computer program used by the U.S. Total Army Personnel Command (PERSCOM) to
track the flow of CRC replacements to the theater of operations; PERSCOM uses
ROAMS both to project and to manage replacements (AR 600-8-111, 1996).

Challenges or Issues for Personnel Replacement Systems

The replacement system will most probably face at least three specific
problems in the future, the first being our reliance on technology. One of the
reasons we have been able to reduce the size of our force is the superiority we possess
in military technology. Technology is an advantage when everything works properly, but
if a system fails and there is no backup system we are vulnerable. The enemy also
knows we rely on technology and will use whatever means are available to degrade our
systems. Immediate dangers that we face are computer viruses, computer hackers, and
terrorism. Additional threats we may face in the near future are long-range precision
bombs and smart munitions. Also in the future is the potential for destruction of our
computer systems and satellites through use of electro-magnetic


pulse (EMP). Our reliance on technology will not diminish; therefore, it is imperative
that we have hardened systems with backup modules, computer firewalls, and anti-
virus software.

The second problem facing the replacement system is the lack of reliable models for
casualty estimation and stratification. There is currently no validated computer simulation
model that generates casualties for Combat Support (CS) and Combat Service Support (CSS).
We are now a force-projection Army with more reliance on CS and CSS than ever before,
and our enemies are constantly striving to find ways to reduce our advantages by
disrupting CS and CSS. The emphasis the Center for Army Analysis (CAA) places on casualty
estimation and stratification for combat personnel must also be applied to CS and CSS personnel.

Another shortcoming of casualty estimation is the restrictive nature of the
models. The models currently run simulations to estimate casualties only at the division
level or higher, yet our force is tending to become smaller, more lethal, and more
maneuverable. As we experienced in the 2003 Iraqi War, many battles will be
fought at the maneuver brigade level and below, requiring that we be able to estimate
casualties at those levels. Further, current models do not have sufficient capability to
simulate weapons of mass destruction (WMD), operations other than war (OOTW), or
special operations; they concentrate primarily on conventional losses in conventional
operations. Since we live in an unpredictable world, casualty estimation models must be
flexible enough to allow planners to simulate casualties in many different environments.

The third problem now facing the Army's replacement system is the lack of
personnel, with fewer than 500,000 personnel on active duty. Congress has been
informed by the Department of Defense and senior Army leadership that everything is
fine, as this number is deemed acceptable in a qualified "peacetime." However, in a
protracted conflict or in two nearly simultaneous theaters of war there would currently
be insufficient personnel. It would then be necessary to activate the reserve
components, which contain over half of our combat forces. Fully activating the National
Guard and Army Reserve would require the President to convince Congress and the
American people that our way of life and security are drastically threatened. This
problem actually goes beyond current numbers, as presented below in Table 1. If
Congress directed the Army to set peacetime active duty strength at 700,000 personnel,
the Army would likely fall short, since present recruiting goals are barely met (Arlington,
1998).

Table 1
U.S. Army Total Force 2004

Active Component (AC)          480,000
Army National Guard (ARNG)     350,000
Army Reserve (AR)              205,000
Total Force Structure        1,035,000*

*Army Divisions: 18 (10 AC, 8 ARNG)


The Army has over 180,000 soldiers in more than 80 countries. Many are part of a
conventional force, with many others involved in special operations and fighting the
global war on terrorism. To maintain our current advantages the Army is constantly
searching for ways to modernize. According to the Army Modernization Plan (Annex
B, 2003), 98% of the $10.76 billion in Science and Technology (S&T) funding is
specifically targeted at future forces (DCS, 2003).

In conclusion, the Army cannot do much immediately about the lack of
personnel, but something can be done about the preceding two challenges: the
constraints on its technology and automated systems, and on its models for casualty
estimation and stratification. Improving, protecting, and safeguarding the automated systems
used to call up replacement personnel should be the number one priority. The second priority
should be to improve the casualty estimation and stratification models. There is little doubt that
the Army replacement system works effectively in a relative peacetime situation of limited
combat. Hopefully the options and suggestions proposed here for improving personnel
replacement operations and systems will never have to be tested in a real scenario of global
multi-front conflict.

References

Arlington, A.T., & Waldköetter, R.O. (1994). A method for estimating Army battle
casualties and predicting personnel replacements. Paper presented at the 36th Annual
Conference of the International Military Testing Association, Rotterdam, the
Netherlands.

Arlington, A.T. (1998, November). The Army's personnel replacement system.
Unpublished manuscript. Fort Belvoir, VA: U.S. Army Management Staff College.

Deputy Chief of Staff (G-8). (2003, February). Army modernization plan (Letter).
Washington, DC: Headquarters, Department of the Army.

Kuhn, G.W. (1998, January). Battle casualty rate patterns for conventional ground
forces, rate planners guide. Washington, DC: Logistics Management Institute.

Personnel Doctrine (Field Manual 12-6). (1994). Washington, DC: Headquarters,
Department of the Army.

The Army Modernization Plan, Annex B. (2003). On point for readiness today, transforming
for security tomorrow.
Retrieved from http://www.army.mil/features/MODPLAN/2003/default.htm.

Wartime Replacement Operations (Army Regulation 600-8-111). (1996). Washington,
DC: Headquarters, Department of the Army.

Wray, J.D. (1987, May). Replacements back on the road at last. Military Review.
Retrieved from http://leav-err.army.mil-cgi-bin/cgcqi.


Team Effectiveness and Boundary Management:
The Four Roles Preconized by Ancona Revisited

Prof. Dr. MSc Jacques Mylle
Psychology Department
Royal Military Academy
B1000 Brussels – Belgium
jacques.mylle@rma.ac.be

Background

Many tasks during peace support operations (PSO) have to be performed by small military
units (say, from 4 to 10 people). It is often contended that these tasks are characterized by the
necessity of teamwork and a high level of autonomy, due among other things to the large
distances between the (sub)unit executing a task and its superior on the one hand, and the
potentially quickly evolving situation on the other.

The issue addressed in this paper is situated within the framework of a study aimed at
measuring the team effectiveness of small executive teams, more specifically the contribution of
autonomy and boundary management in turbulent situations.

By teams we mean groups of people working together on a task, with high interdependency
among team members in fulfilling their jobs, who work together for a longer time period in their
real work environment. Thus we do not consider one-time groups in laboratory situations.

The perspective taken on the subject is a so-called external or ecological perspective: the team is
considered a living system that, on the one hand, adapts to the demands of its environment but,
on the other hand, causes changes in the environment too.

Until the mid-eighties researchers took an internal perspective and focused on what was
happening inside the group, e.g., how cohesion evolves.
The seminal work of Gladstein (1984) was the start of a paradigm shift: studying team
behaviors directed outward from the team, among others towards other parts of the organization
and other groups operating in the same setting. The core question relates to how the team
deals with external influences on its performance and how it (tries to) influence the
outer world itself. In other words, we are looking at what happens on the boundaries of
the team.

Gladstein observed in interviews that the subjects – members of sales teams – frequently
spoke about the importance of their interactions with other teams of the company, such as the
installation and repair teams.
Another important finding was that they did not distinguish between the classic task-related
and team-related processes, but instead between internally and externally oriented behaviors.
Externally oriented behaviors are, for example, seeking information or molding external
opinion.


Since 1990 a lot of work has been done by Deborah Ancona (often in collaboration with D.
Caldwell). Her initial longitudinal study led to four key findings:
1. teams develop a distinct set of externally oriented activities and strategies;
2. these activities are positively and significantly related to managerial performance ratings;
3. there exists a complex interaction between the internal and external processes, and it
changes over time;
4. there is a pattern in this external dynamic just as there is one in the internal dynamic.

An exploratory factor analysis of data she collected in consulting and production teams
revealed the existence of three major “styles”.
1. The Ambassador style includes both buffering and representing. Buffering
means protecting the team from external influences or absorbing them, while
representing refers to persuading others to support the team in its efforts. In later work
(for a review see Yan and Louis, 1999) buffering became a fourth separate role,
namely guarding.
Communication here is thus bottom-up oriented and deals with how to gain access
to power and how to manage the vertical dependence.
2. The Task co-ordination style deals with workflow and work structure, and refers to
how to handle technical or design issues by looking for feedback or negotiating.
In this case communication is lateral, managing the horizontal dependence.
3. The Scouting style refers to scanning the environment for information/ideas about
relevant aspects of the “environment”, e.g., available resources (and the competition
for them), technologies, etc.

Furthermore, Ancona defined four strategies, which rely on the styles described above.
1. The ambassadorial strategy relies on the ambassadorial style only, while the others
are neglected.
2. The technical scouting strategy encompasses the scouting style and task co-ordination,
but not the ambassadorial style.
3. The isolationist strategy refers to the absence of any style. The team lives more or
less on its own, as on an island.
4. The comprehensive strategy is a combination of the ambassadorial style and task co-
ordination, with a minimum of scouting.

She also showed that the comprehensive strategy is the only effective one in turbulent
situations.

It goes without saying that boundary management is an issue for leaders too, even if all or
some of its behaviors are shown by team members as boundary spanners.

Research question

Ancona tested her hypothesized structure of boundary management with data from consulting
teams and new production teams in a commercial setting.
It is known that boundary management in more or less quickly changing situations is positively
related to performance in civilian settings, under the condition that the team uses the right
“mix” of styles. Given that we arrived tentatively at the same conclusions in an operational


context (Mylle, 2001; Mylle, Sips, Callaert & Bouwen, 2002), we wanted to verify whether the
factor structure proposed by Ancona can be generalized to other types of teams and to other
settings. Specifically, do effective military teams in a peace support operations environment
indeed use the four “roles” described by Ancona?

Method

Subjects
Data have been collected in a civilian sample and a military sample. The latter consists of
about 200 soldiers out of 800 (in rounded figures) who were part of a Belgian Task Force in
Kosovo. Furthermore, the sample encompasses a subsample of “combat troops” (infantry and
armor troops, n=143) and a subsample of “support troops” (pioneer, logistics, medical
support, n=47).

Instrument
Based on a simple (causal) model, a questionnaire has been elaborated by the research team to
measure several facets of team functioning; among others boundary management.
For this aspect, the questionnaire of Ancona (1993) was taken as such and was translated into
Dutch and French. It is composed of four scales totaling 25 items: the Ambassador-,
Coordinator-, Scout- and Guard scale have respectively 12, five, four and four items. An
example of an item belonging to each scale is given below. The core of all statements can be
found in Table 1.

Ambassador:  We try to persuade others to support the team’s decisions
Coordinator: We try to resolve problems together with other teams
Scout:       We negotiate deadlines with external people
Guard:       We keep information in the team secret for others

To get better insight into what kinds of external activities teams get involved in and why they did
what they did, we asked a number of questions about “communication with people outside the
team” and “motives of external people for contacting the team”.
For example:

[We contact people outside the team] to take corrective actions such as changing work
procedures or processes
[People outside the team contact us] to discuss or to have an extended exchange of
ideas

The questionnaire was administered after an intense training period, 14 days before deployment
as part of the task force Belukos VI, which was deployed in Kosovo from the beginning of April
2001 until the beginning of August 2001.


Results

The internal consistency of the questionnaire as a whole is very good (.90) and at the scale
level ranges from good to poor: .86 for Ambassador, .50 for Coordinator, .65 for
Scout, and only .48 for Guard.
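
(Internal consistencies of this kind are conventionally computed as Cronbach's alpha. The
sketch below, in Python with NumPy, is our illustration of that computation, not the authors'
own analysis; the data and names are hypothetical.)

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for one scale; items is an
    (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale sums
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustration on random data; real input would be, e.g., the 12 Ambassador
# items answered by the roughly 190 military respondents.
rng = np.random.default_rng(0)
print(round(cronbach_alpha(rng.integers(1, 6, size=(190, 12)).astype(float)), 2))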

First we checked the appropriateness of our data for factor analysis. A Kaiser-Meyer-Olkin
index of .88 and Bartlett’s sphericity statistic of 1496 (p = .000) show that our data are suited for
factor analysis.
We used a principal component analysis, requested extraction of all factors with an eigenvalue
over 1, and applied a varimax rotation to obtain a structure which is easy to interpret.
The analysis based on the total sample revealed a four-factor structure whose factors could
easily be labeled in the same way as Ancona’s, although with a somewhat different distribution
of the items over the factors.
The analysis of the military subsample resulted in a seven-factor solution, which we will not
discuss for obvious reasons. A second analysis, with a forced extraction of four factors,
yielded the following results.
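
(Before turning to those results: for readers who wish to reproduce this kind of extraction, the
sketch below, in Python with NumPy, implements the steps just named: the KMO index,
Bartlett's sphericity test, principal component extraction with the eigenvalue-over-1 criterion,
and an iterative varimax rotation. It is our illustration, not the analysis actually run for this
study, which was presumably done in a statistics package; all function and variable names
are ours.)

import numpy as np

def kmo(corr):
    """Kaiser-Meyer-Olkin sampling adequacy from the item correlation matrix."""
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)        # anti-image (partial) correlations
    np.fill_diagonal(partial, 0.0)
    r2 = corr ** 2
    np.fill_diagonal(r2, 0.0)
    return r2.sum() / (r2.sum() + (partial ** 2).sum())

def bartlett_sphericity(corr, n):
    """Bartlett's test that the correlation matrix is the identity."""
    p = corr.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(corr))
    return chi2, p * (p - 1) // 2          # statistic and degrees of freedom

def varimax(loadings, max_iter=100, tol=1e-6):
    """Standard iterative varimax rotation of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):  # converged: no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation

def pca_kaiser_varimax(data):
    """Principal components of the item correlations, keeping factors with
    eigenvalues > 1 (Kaiser criterion), then varimax-rotating the loadings."""
    corr = np.corrcoef(data, rowvar=False)
    print("KMO:", round(kmo(corr), 2),
          " Bartlett (chi2, df):", bartlett_sphericity(corr, len(data)))
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # unrotated loadings
    return varimax(loadings)

# Illustration on random data; the real input would be the 25-item responses.
rng = np.random.default_rng(1)
print(np.round(pca_kaiser_varimax(rng.normal(size=(190, 25))), 2))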

Table 1. Factor structure and factor loadings > .40 based on own factor analysis

Factor I (Scout)
  Item 19  Scan the environment for threats              .722
  Item 17  Find out if others support or oppose          .716
  Item 16  Procure things from others                    .712
  Item 18  Collect technical info                        .673
  Item 21  Scan the environment for technology           .640
  Item 15  Report progress                               .580
  Item 20  Control release of info                       .569
  Item 23  Search info about company’s strategy          .526
  Item 22  Absorb outside pressure                       .490
  Item 11  Negotiate with others                         .460, .409

Factor II (Ambassador)
  Item 1   Persuade others to support decisions          .795
  Item 3   Acquire resources                             .715
  Item 2   Review product design                         .594
  Item 4   Keep news secret until appropriate time       .588
  Item 5   Avoid releasing info to protect image         .408, .569
  Item 12  Scan the environment for marketing            .500, .420
  Item 24  Talk up the team                              .403, .479, .401

Factor III (Coordinator)
  Item 7   Keep others informed                          .673
  Item 8   Co-ordinate activities with external groups   .667
  Item 10  Resolve design problems                       .605
  Item 6   Find out what others do in similar projects   .524
  Item 14  Prevent info overload                         .445

Factor IV (Guard)
  Item 5   Avoid releasing info to protect image         .733
  Item 25  Keep info secret for others                   .614
  Item 9   Protect team from interference                .444, .594

(Items listed with more than one figure loaded above .40 on more than one factor.)

The variance explained by these four factors totals 52% (19, 14, 11, and 9%, respectively).
Factor I totals 10 items, and a content analysis shows that the six highest-loading items can be
associated with scouting. Seven items load on Factor II, and five of them refer to
ambassadorial activities. Factor III, with five items, is more ambiguous: only two of them are
clearly associated with coordination, although with high loadings. Finally, the three items of
Factor IV refer clearly to the guard role.

The factor structure, together with the factor loadings above .40, is given in Table 1. Thus,
based on the content we can label these factors in the same way as Ancona did, i.e.,
Ambassador, Coordinator, Scout, and Guard, but the composition of the scales differs between
the two solutions.
The cross-tabulation in Table 2 shows which items are kept in the same scale and which items
moved to which scale.

Table 2. Comparison of the Ancona factor structure and the own structure

Ancona scale (items)           Own factor on which the item loads
F I, Ambassador
  Items 1, 3, 24               F II (Ambassador)
  Item 7                       F III (Coordinator)
  Item 9                       F IV (Guard)
  Item 13                      (loads on none of the four factors)
  Item 14                      F III (Coordinator)
  Items 15, 17, 19, 22, 23     F I (Scout)
F II, Coordinator
  Item 2                       F II (Ambassador)
  Items 8, 10                  F III (Coordinator)
  Items 11, 16                 F I (Scout)
F III, Scout
  Item 12                      F II (Ambassador)
  Item 6                       F III (Coordinator)
  Items 18, 21                 F I (Scout)
F IV, Guard
  Item 4                       F II (Ambassador)
  Item 5                       F II (Ambassador) and F IV (Guard)
  Item 20                      F I (Scout)
  Item 25                      F IV (Guard)

A factor analysis, based on the total sample, of the combined set of items from Ancona’s
questionnaire, the Communication scale, and the Motives scale shows a six-factor structure. The
items of the Communication scale and the Motives scale are not distributed over the four
Boundary Management scales. Thus, they are not instantiations of one of the four functions
but measure separate entities. There exists nevertheless a significant correlation
between them48: r(A,C) = .524, r(A,M) = .458, and r(M,C) = .411.

Conclusion

The concept of boundary management is a necessary and fruitful approach for understanding
team behaviors in turbulent situations and for explaining what makes the difference between
effective and ineffective teams.
These behaviors can be grouped into four categories, each serving a
particular purpose: searching for information/means (Scout), promoting the team
(Ambassador), protecting the team (Guard), and coordinating with other teams (Coordinator).
From our data analysis we can conclude that the concept of boundary management
and its basic structure is validated, but that the scales as elaborated by Ancona are not.
Finally, we need to refine our questionnaire to remove the ambiguities in three of the four
scales.

References

Ancona, D. (1990). Outward bound: Strategies for team survival in an organization. Academy
of Management Journal, 33 (2), 334-365.

Ancona, D. (1993). The classic and the contemporary: A new blend of small group theory.
In K. Murnighan (Ed.), Social psychology in organizations: Advances in theory and
research. Englewood Cliffs: Prentice Hall.

48 “A” stands for the complete boundary management scale elaborated by Ancona; “C” refers to our
Communication scale and “M” to our scale Motives for contacts by external people.

Gladstein, D. (1984). Groups in context: A model of task group effectiveness. Administrative
Science Quarterly, 29, 499-517.

Mylle, J. (2001). Perceived team effectiveness in peace support operations: A cross-sectional
analysis in a Belgian task force. Proceedings of the 43rd Annual Meeting of the
International Military Testing Association, Canberra.

Mylle, J., Sips, K., Callaert, J., & Bouwen, R. (2002). Perceived team effectiveness: What
makes the difference? Proceedings of the 38th Annual International Applied Military
Psychology Symposium, Amsterdam.

Yan, A., & Louis, M.R. (1999). The migration of organizational functions to the work unit level:
Buffering, spanning and bringing up boundaries. Human Relations, 52(1), 25-47.


Mental Health Literacy in the Australian Defence Force

Colonel A.J. Cotton
Director of Mental Health
Australian Defence Force, Canberra, Australia
Anthony.Cotton@defence.gov.au

Ms Emma Gorney
Directorate of Strategic Personnel Planning and Research
Department of Defence, Canberra, Australia
Emma.Gorney@defence.gov.au

Abstract

Cotton (2002)49 reported on the implementation of the Australian Defence Force (ADF)
Mental Health Strategy (MHS). One of the key principles underlying this was a focus on mental
health promotion. Mental health literacy is one of the key components of mental health
promotion and can be defined as the levels of understanding of mental health issues, including
treatment options, in a population. This paper reports on the first attempts of the ADF to
measure mental health literacy in its population. In particular, it will highlight what ADF
members perceive to be the mental health issues facing them and compare this with the
development of initiatives in the ADF MHS.

INTRODUCTION

Mental health is considered a major health issue in Australia, and is one of the country’s
top five National Health Priority Areas50. Quality-of-life surveys, in Australia as well as in most
other western societies, routinely show that mental health rates highly, if not highest, among the
issues of greatest concern to people. The Australian Defence Force (ADF) is not immune to the
pressures that face the general community, so it is reasonable to assume that mental health is an
issue for the ADF. Add to this the unique demands of service life and a level of operational
tempo that has been steadily increasing over the past decade, and mental health becomes a major
issue for the ADF.

The cost of the defined burden of mental health problems on the ADF is estimated to be
around $20m per annum (ADF, 2002)51. The undefined burden, being the impact on people other
than those directly affected, is more difficult to measure, but is characterised by ongoing family

49 Cotton, A.J. (2002). The Australian Defence Force Mental Health Strategy. Paper presented at the 44th
Annual Meeting of the International Military Testing Association.
50 Australian Institute of Health and Welfare, Australia’s Health 1998, pp. 103-108.
51 Australian Defence Force (2002). Health Status Report, unpublished.


problems for the member, reduced job performance (including discipline and morale problems in
the member's unit), and possible separation from the ADF. The hidden cost of mental health
problems is defined as the increase in mental health problems that occurs as a result of
individuals not seeking adequate or early support for their problems due to the stigma attached to
mental health problems; again, calculating this for the ADF is very difficult.

Cotton (2002)52 reported on the implementation of the Australian Defence Force (ADF)
Mental Health Strategy (MHS). The development of this comprehensive strategy for the
delivery of mental health services in the ADF was a response to the ADF Health Status Report
2000. Among the requirements identified in this report was the need for appropriate mental
health indicators to help guide the provision of appropriate mental health promotion activities
and service delivery programs.

Mental health service delivery has typically focussed on risk factors, those things that
predispose an individual to experience a mental health problem. Protective factors, however, are
in many cases more important for prevention interventions. Protective factors derive from all
domains of life: from the individual, family, community and wider environment. Some are
internal, such as a person’s temperament or intelligence, while others are related to social,
economic and environmental supports. Protective factors enable individuals to maintain their
emotional and social wellbeing and cope with life experiences and adversity. They can provide a
buffer against stress as well as a set of resources to draw upon in dealing with stress.

The National Mental Health Strategy (2000) presents protective factors that can reduce
the likelihood of mental health problems and mental disorders and mitigate the potentially
negative effects of risk factors. These are categorized as individual, family/social,
school/education, life events and situations, and community and cultural factors. Protective
factors improve a person’s response to an environmental hazard, resulting in an adaptive
outcome. One of the major protective factors consistently identified in the literature is the
building of resilience in individuals (Rutter, 1987)53.

The concept of resilience is central to most empirically based prevention programs.
Resilience describes the capacities within a person that promote positive outcomes, such as
mental health and wellbeing, and provide protection from factors that might otherwise place that
person at risk of adverse health outcomes. Factors that contribute to resilience include personal
coping skills and strategies for dealing with adversity, such as problem-solving, good
communication and social skills, optimistic thinking, and help-seeking.

A key element in developing resilience is to enhance the mental health literacy of
individuals. Here, mental health literacy is defined as: ‘the ability to recognise specific
disorders; knowing how to seek mental health information; knowledge of risk factors and causes,
52 Op. cit.
53 Rutter, M. (1987). Psychosocial resilience and protective mechanisms. American Journal of
Orthopsychiatry, vol. 57, pp. 316-331.


of self-treatments and of professional help available; and attitudes that promote recognition and
appropriate help-seeking’ (Jorm et al, 1997)54. Not only is this of use in a preventative sense, it
also has a key role to play in early intervention with mental health problems. Robinson (1994)55
notes the role of education as a key strategy in assisting people to recognise stress and trauma in
themselves as well as others.

While the effectiveness of many mental health promotion and prevention strategies has
not been comprehensively demonstrated, interventions that improve mental health literacy,
coping skills, and social support appear to be helpful (Graham et al 2000)56. The evaluation of
mental health promotion programs must continue, and a key element of this is the development
of appropriate indicators of wellbeing and mental health promotion benchmarks (National
Mental Health Strategy, 2000)57. To establish an evidence-based mental health promotion
program in the ADF, and in particular to develop an effective mental health literacy program,
the ADF needs to develop indicators of mental health literacy in the ADF community. Many
such measures are implemented through broad surveys that access a range of members of the
community; the ADF has such a vehicle in the Defence Attitude Survey.

AIM

The aim of this paper is to report the initial development and administration results of a
set of measures of mental health literacy in the ADF through the Defence Attitude Survey.

THE DEFENCE ATTITUDE SURVEY

The Defence Attitude Survey was first administered in 1999. It replaced the existing
single Service attitude surveys, drawing on content from each of the RAN Employee Attitude
Survey (RANEAS), the RAAF General Attitude Survey (RGAS), the Soldier Attitude and
Opinion Survey (SAOS) and the Officer Attitude and Opinion Survey (OAOS). The
amalgamation of these surveys has facilitated comparison and benchmarking of attitudes across
the three Services whilst maintaining a measure of single Service attitudes.

The Directorate of Strategic Personnel Planning and Research (DSPPR) re-administered
the survey to 30% of Defence personnel in April 2001. The results were widely used throughout

54 Jorm AF, Korten AE, Jacomb PA, Christensen H, Rogers B & Pollitt P (1997). Mental health literacy: A survey of the public’s ability to recognise mental disorders and their beliefs about the effectiveness of treatment. Medical Journal of Australia, vol. 166, pp. 182-186.
55 Robinson R. (1994). Developing Psychological Support Programs in Emergency Service Agencies. In Watts R & de L Horne D. (Eds). Coping with Trauma: The Victim and the Helper. Academic. Brisbane.
56 Graham A, Reser J, Scuderi C, Zubrick S, Smith M & Turley B (2000). Suicide: An Australian Psychological Society Discussion Paper. Australian Psychologist, 35(1), pp. 1-28.
57 National Mental Health Strategy, (2000). Promotion, Prevention and Early Intervention for Mental Health. Commonwealth Department of Health and Aged Care, Canberra.


the organisation. Consequently, to maintain the provision of this information, the Your Say
Survey was developed, taking a number of key items from the Attitude Survey to be
administered more regularly to gather trend data on the organisation. The Your Say Survey is
administered to a 10% sample of Defence members twice a year; while it provides useful
information, the sample size is not extensive enough to allow detailed breakdowns of the data.

It was determined by the Defence Committee58 in May 2002 that the Defence Attitude
Survey be administered annually to a 30% sample of Defence personnel, allowing for more
comprehensive data analysis. The Committee also directed that an Attitude Survey Review
Panel (ASRP) be established, with representatives from all Defence Groups, to review and refine
the content of the Attitude Survey.

The final survey was the result of thorough consultation through the ASRP. The item
selection both maintained questions from previous surveys to gather trend data and incorporated
new questions to feed into Balanced Scorecard and other Group requirements. Service forms are
identical, with the only variation being Service-specific terminology. The Civilian form
excludes ADF-specific items and includes a number of items relevant to APS personnel only.

The aims of the Defence Attitude Survey are to:

• inform personnel policy and planning, both centrally and for the single Services/APS;
• provide Defence Groups with a picture of organisational climate; and
• provide ongoing measurement in relation to the Defence Matters scorecard.

Questionnaire

The survey consisted of four parallel questionnaires, one for each Service and one for
Civilians, differing only in Service-specific terminology and, in the Civilian form, the
substitution of items relevant only to APS personnel for ADF-specific ones.
Each survey contained a range of personal details/demographic items, including gender, age,
rank, information on deployments, specialisation, branch, Group, years of Service, education
level, postings/promotion, and family status (44 such items for Navy, 40 for Army and Air
Force, 35 for Civilians). Navy personnel received additional questions regarding sea service.
The survey forms contained 133 attitudinal items (some broken into parts) for Service personnel
and 122 for Civilians. As in previous iterations, respondents were given the opportunity to
provide written comments at the end of the survey.

As directed by the Defence Committee, a number of changes were made to the survey
items through discussion by the Attitude Survey Review Panel. This refinement process
attempted to balance the retention of sufficient items for gathering trend data against reducing

58 The Defence Committee is the senior decision-making committee in the ADF; its membership includes the Chief of the Defence Force and all three Service Chiefs.


the number of items to decrease the length of the survey. While a number of items were
excluded because they were no longer relevant or appeared to duplicate other questions, further
items were added to address issues previously excluded. The new additions included items on
Wellbeing, Internal Communication, Security, Occupational Health and Safety, and Equity and
Diversity (which had been included in the 1999 iteration of the survey). Further demographic
items were also added regarding work hours (predictability of, and requirement to be, on-call) as
well as awareness of Organisational Renewal and the Defence Strategy Map. The additions
resulted in more items being included in the 2002 survey than the 2001 version; however, total
numbers were still lower than the original 1999 questionnaire.

Attitudinal items were provided response options on a five-point scale where one
equalled ‘Strongly Disagree’ and five equalled ‘Strongly Agree’ (a number of the items were
rated on satisfaction or importance scales, rather than the more common agreement scale).

Mental Health Literacy Items

A pool of potential mental health literacy items was identified from a variety of sources
in the literature. These were then workshopped among subject matter experts and the final form
of the items was agreed by the ASRP. The final set of items was:

• How would you rate your knowledge of mental health issues?
• Do you think mental health is an issue Defence should address?
• How would you rate your own mental health?
• If you thought or felt you were mentally unwell, where would you seek help?
• Alcohol abuse is a problem within Defence.
• Drug abuse (including steroids) is a problem within Defence.
• My social support network is satisfactory should I need to ask for help or talk about personal
problems.

Sampling

The sample for the Defence Attitude Survey is typically stratified by rank; however,
concerns had been raised by Group Heads that Groups were not being representatively sampled.
Thus, for the 2002 sample, the thirty percent representation of the Organisation was stratified by
both rank and Group. Recruits and Officer-Cadets were not included in the sample, as per the
2001 administration. Upon request, the whole of the Inspector General’s Department was
surveyed to provide sufficient numbers for reporting on this small Group.

Administration

The survey was administered as a ‘paper and pencil’ scannable form and employed a
‘mail-out, mail-back’ methodology. For a selection of personnel in the Canberra region, where


correct e-mail addresses could be identified, the survey was sent out electronically. This
methodology allowed the survey to be completed and submitted on-line or printed out and
mailed back in the ‘paper and pencil’ format.

Due to declining response rates for surveys and inaccuracies encountered in address
information, additional attempts were made to ensure that survey respondents received their
surveys and were encouraged to complete them. Surveys were grouped into batches to be
delivered to individual units. In coordination with representatives from each of the Service
personnel areas, units were identified and surveys were sent to unit COs/OCs for distribution to
sampled personnel, accompanied by a covering letter from Service Chiefs. A number of issues
were encountered in this process, including the fact that some COs are responsible for vast
numbers of personnel (for example, HMAS Cerberus), and the process entailed double-handling
of the survey forms.

Surveys were sent out directly from DSPPR in Canberra, with Civilian forms delivered
via regional shopfronts as specified by pay locations. Completed questionnaires were returned
via pre-addressed return envelopes directly to DSPPR.

Table 1 below outlines the response rate59 by Service/APS. The response rate from 2001
is also included; the decline indicates that delivery via unit COs/OCs was not an improved
methodology, and also reflects the operational commitments of personnel, particularly those in
Navy.

Table 1
                       Navy     Army     Air Force   APS      Total
Sent                   4640     6841     3461        5625     20567
Return to Sender       489      265      179         312      1245
Useable Returns        1532     2669     1808        3504     9513
Response Rate          36.9%    40.6%    55.1%       66.0%    49.2%
2001 Response Rate     52.0%    50.7%    60.9%       56.2%    54.5%
2001-2002 Difference   -15.1%   -10.1%   -5.8%       +9.8%    -5.3%

Demographics

A randomly selected sample of 1,525 cases was taken from this data set to provide the
analysis for this paper. Because the focus of this paper is on the delivery of service to the ADF
(i.e., uniformed members), the civilian component of the sample was removed for clarity,
leaving a sample of 990 respondents. Demographic data for the sample are:

59 The response rate is calculated as the number of useable returns divided by the number of surveys mailed out minus the number of surveys returned to sender.


• Gender – 87.6% males, 12.4% females.
• Age – mean 33 years, median 32 years.
• Service – RAN 24.6%, Army 45.3%, RAAF 30.1%.
• Length of Service – mean 11.96 years, median 12 years.
• Time in Current Posting – 76.6% two years or less.
• Proportion having served on operations – 48.3%.
• Proportion in a recognised relationship – 64.3%.
• Proportion with at least Year 12 education – 71.1%.

RESULTS

Examination of the initial basic frequencies yielded the following observations:

• Less than 45% of the ADF believe that they have an understanding of mental health issues
that is better than fair.
• More than ninety percent of the sample agree that mental health is an issue that the ADF
should address.
• Over one third of the ADF rate their mental health as Fair or worse. One in ten rate their
mental health as poor or very poor.
• One in ten respondents said that they would not seek help if they felt that they were mentally
unwell. One in six said that they would seek support from outside the ADF if they felt they
were mentally unwell.
• Nearly half of the ADF were either uncertain or disagreed with the statement that alcohol
abuse was a problem within Defence.
• Two thirds of the ADF were either uncertain or disagreed that drug abuse (including steroids)
was a problem for Defence.
• In terms of general protective factors, nearly 40% were uncertain or felt that their social
support systems were inadequate should they need to talk to someone or seek support.

HOW ATTITUDES VARY BY RANK

To gain a broader picture of levels of understanding of mental health issues within the
ADF, responses to the mental health items were compared across rank levels (Other Ranks,
Non-Commissioned Officers, Officers). Only five items showed significant differences across
rank, and an analysis of the residuals revealed some identifiable trends (a computational sketch
of this kind of residual analysis follows the list below):

• More NCOs than expected felt that their knowledge of mental health was poor, whereas more
officers than expected felt that their knowledge of mental health was good.
• More officers than expected felt that their mental health was good, while fewer NCOs than
expected felt that their mental health was good.


• There were few clear results from the analysis of residuals on attitudes towards alcohol as a
problem; however, junior members tended to feel more strongly than more senior ranks about
it being a problem for Defence.
• Attitudes on other drugs (including steroids) varied across rank, with Other Ranks being more
extreme: more than expected responded in both of the extreme categories. NCOs generally
felt that drugs were a problem, while officers were less inclined to describe drugs as a
problem.
• Attitudes towards the adequacy of social support networks also varied across ranks, but with
no clearly discernible pattern to the residuals, with the exception that more officers were
positive about their ability to access these networks.
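
The residual analysis referred to above can be sketched as a chi-square test of independence on a rank-by-response contingency table, with standardized residuals flagging the cells that depart most from expectation. The SciPy call below is real; the counts are invented for illustration and are not the survey data:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: Other Ranks, NCOs, Officers; columns: poor / fair / good
    # self-rated knowledge of mental health (counts invented).
    table = np.array([[40, 120, 90],
                      [80, 150, 70],
                      [20,  90, 110]])

    chi2, p, dof, expected = chi2_contingency(table)
    residuals = (table - expected) / np.sqrt(expected)  # standardized residuals
    print(f"chi2 = {chi2:.1f}, p = {p:.4f}")
    print(np.round(residuals, 2))  # cells with |residual| > 2 drive the result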

HOW ATTITUDES VARY BY SERVICE

It is widely recognised in the ADF that each of the individual Services possesses a distinct
and unique culture that significantly influences the attitudes and values of its members. Given
this, it is reasonable to expect that attitudes to mental health will vary across the Services in the
ADF. Comparison of the items across Service showed the following differences:

• There were no differences in perceived levels of understanding of mental health problems,
attitudes towards whether mental health was an issue that the ADF should address, or levels
of mental health.
• Alcohol was perceived as much more of an issue by the Navy, whereas the Air Force felt that
it was less of an issue requiring attention.
• There were clear differences in attitudes about drug use as a problem, with the Air Force less
likely to see drugs as a problem than either Army or Navy.
• There were no differences in perceptions of adequacy of social support networks.

DISCUSSION

The results presented above provide only basic descriptive and some very basic
inferential statistics from the DAS data set. However, they provide good guidance for the
development of a mental health literacy program, which has been identified as a key component
of the ADF Mental Health Strategy.

The group statistics indicate clearly that ADF members feel that mental health is an
important issue, and a small, but significant, proportion feel that they have poor mental health.
ADF members feel that they have an inadequate understanding of mental health issues, and a
small but also significant proportion claim that they would not seek support if they felt that they
were mentally unwell. This is a significant concern for commanders, particularly given the
implications it has for the health of the deployable force.

On specific issues, the majority of members do not feel that either alcohol or drugs
(including steroids) are a problem for the ADF. Given the extant data on alcohol use rates in the


ADF60, and community drug use rates61, this is of concern for the provision of drug and alcohol
education to the ADF, and a clear indicator of a need for enhanced education in this area.

Comparison across ranks indicates that there are clear differences in attitudes depending
on the rank of the member. In particular, there would appear to be a need to improve education
and mental health services targeting NCOs. Attitudes towards alcohol and drugs vary across
ranks and support anecdotal evidence about differences between these groups that reflect age (or
generational) effects. The implication is that there is a clear need to vary the mental health
literacy programs provided to these groups, perhaps with specific effort targeted towards NCOs.

Given the widely held view of significant cultural differences between the three Services,
the comparison across Services is surprising in that there are no differences in any of the items
with the exception of attitudes towards alcohol and other drugs. The differences in attitudes
towards alcohol might be explained by the fact that the Navy has had an alcohol management
program for some time, whereas neither the Army nor the Air Force has. The differences in
attitudes towards illicit drugs are more difficult to explain and require more investigation,
particularly given the group data showing a relative lack of concern about illicit drugs as a
problem for the ADF. These data would support the notion of a tri-service mental health literacy
program with enhanced education for Army and Air Force members.

The data reported here address only a small portion of the DAS, which has 330 variables
measuring a very wide range of organisational behaviour markers. But even this initial analysis
indicates that this tool provides an excellent opportunity to measure mental health literacy on a
continuous basis, and hence a ready means of evaluating the mental health literacy programs
that will be guided by the results of this instrument. Future analyses need to explore more fully
the descriptive information in the data set, as well as some of the inferences that might be drawn
from these data.

CONCLUSION

The provision of mental health services is a key part of the management of the health of
the ADF. The ADF Mental Health Strategy has identified a need to develop indicators of mental
health literacy as a key component in building health promotion and prevention programs in
order to build resilience in ADF members.

The DAS provides the ADF with an excellent tool for developing such indicators because
it is comprehensive and enduring in nature. It allows the monitoring of mental health literacy
over time and the tailoring of mental health literacy programs to meet the evolving mental health
literacy of the ADF.
60 Current data suggest that around 30% of the ADF drink at hazardous levels.
61 There is no ADF data on drug use due to lack of reporting, but Australian population data indicate that more than half of Australian school leavers have tried cannabis, and 40.4% of 20-29 year old men in Australia have recently used an illicit substance.


DEFENCE ETHICS SURVEY: THE IMPACT OF SITUATIONAL MORAL
INTENSITY ON ETHICAL DECISION MAKING

Sanela Dursun, MA and Major Rob Morrow, MA
Director Human Resources Research and Evaluation
Ottawa, Canada
Dursun.S@forces.gc.ca & Morrow.RO@forces.gc.ca

BACKGROUND

In 1998, the Directorate for Human Resources Research and Evaluation (DHRRE) was
approached by the Defence Ethics Program (DEP) to conduct a comprehensive assessment of the
ethical climate of the Canadian Forces and Department of National Defence (CF/DND) and the
values used by members to make ethical decisions. A model (Figure 1) of ethical decision-
making applicable to the Department of National Defence was developed (Catano, Kelloway &
Adams-Roy, 1999).
Figure 1. A Model of Ethical Decision Making Behaviour (diagram of predictors feeding into
the ethical decision; not reproduced here).

An instrument, based upon the model, was constructed to describe the ethical climate in
the organization, to derive an understanding of the ethical values of respondents, and to gauge
individual levels of moral reasoning and systematic approaches to ethical decision-making. The
results indicated that the components of the model were successful in accounting for ethical
decision making, except for situational moral development and situational moral intensity
(Catano et al, 1999).


The survey was re-administered in the summer of 2003. A review of the original ethics
instrument was conducted to further refine the DND/CF ethical decision-making model (Dursun
and Morrow, 2003). A new approach to measuring moral intensity represented the most
significant change to the model and the measurement instrument. This paper will illustrate how
moral intensity was measured and will present preliminary results of the moral intensity
component of the 2003 survey re-administration.

MEASURING MORAL INTENSITY

Perceived moral intensity deals with the individual’s perception of the specific
characteristics of the moral/ethical issue and directly influences whether the individual believes
that the issue contains a moral or ethical dilemma. If the moral intensity of a situation is
perceived to be weak, individuals will not perceive an ethical problem in the issue.

While ethical perception is concerned with the individual’s recognition of a moral issue
(Jones, 1991) and drives the entire ethical decision making process (Hunt & Vitell, 1993), ethical
intention is making a decision to act on the basis of moral judgments (Jones, 1991). The moral
intensity dimensions should influence all stages of the ethical decision making process, from
recognition that an issue represents an ethical dilemma to deciding whether to engage in a
particular action.

Moral Intensity

Jones (1991) describes six dimensions of moral intensity: magnitude of consequences
(MC), social consensus (SC), probability of effect (PE), temporal immediacy (TI), proximity
(PX), and concentration of effect (CE).

Magnitude of consequences refers to the sum of harms (or benefits) resulting from the
moral act in question. Jones illustrates this construct as follows: an act that causes 1000 people to
suffer an injury is of greater magnitude of consequences than an act that causes 10 people to
suffer the same injury (Jones, 1991).

Social consensus refers to the degree of social agreement that a proposed act is ethical
or unethical. Individuals in a social group may share values and standards, which influence
their perception of ethical behaviour. A high degree of social consensus reduces the level
of ambiguity one faces in ethical dilemmas. An act that most people feel is wrong has greater
moral intensity than an act about which people’s opinions vary.

Probability of effect refers to both the probability that the act in question will happen,
and the probability that the act will actually cause the harm predicted. The more likely an act
will cause harm, the greater the propensity of an individual to view the act as unethical. For
example, Jones (1991) suggested that selling a gun to a known criminal has a greater
probability of harm than selling a gun to a law-abiding citizen.

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
610

Temporal immediacy refers to the length of time between an act and the consequences
resulting from the act. In other words, an act that will have negative consequences tomorrow is
more morally intense than an act that will have negative consequences in ten years.

Proximity refers to the feelings of nearness that the moral agent holds for the target of
the moral act. There are four aspects of proximity: social, cultural, psychological and physical.
As an example, Jones states that the sale of a dangerous pesticide in the U.S. has greater moral
intensity for U.S. citizens than the sale of the same pesticide in another country would have
on them.

Concentration of effect refers to the impact of a given magnitude of harm in relation to
the number of people affected. Jones provides as an example that cheating an individual or a
small group of individuals of a given sum has a more concentrated effect than cheating a large
corporation of the same sum.

Instead of assessing moral intensity through manipulating the severity of ethical scenarios
(Catano et al., 1999), this CF/DND study examined the relationship between perceived moral
intensity dimensions and three stages of the ethical decision making process. Scenarios of an
arguably ethical nature were used to stimulate participants’ perception of the moral intensity of
the vignette, and to examine participants’ ethical perception, moral intention and judgement of
the decision made in each vignette. The utilization of scenarios is considered a “positive solution
in improving the quality of data from questionnaire” (Paolillo & Vitell, 2002). Singer (1998)
emphasized that, when compared with other approaches to ethics research, scenarios are less
susceptible to social desirability bias. It is also important that all respondents receive a
standardized stimulus, which makes the decision-making process more realistic.

METHODOLOGY

Perceived moral intensity, recognition of an ethical issue, ethical intention and ethical
judgement were measured using an instrument consisting of four scenarios for civilian and five
for military personnel, each involving ethical situations. The military version of the questionnaire
contained one additional scenario in order to assess the effect of moral intensity on ethical
decision-making in an operational environment. All scenarios were adapted from a compilation
of focus group findings (a study conducted in 2001) in which CF members and DND
employees identified the ethical issues to which they were exposed. An initial selection of ten
scenarios was pilot tested to ensure the salience of the stimulus for both civilians and military. In
an effort to reduce the potential for a social desirability response bias, scenarios were written in
the third person, rather than having the participant be the decision maker (Butterfield et al.,
2000). To reduce the potential for gender bias, the gender of the actors was not specified.

Perceived moral intensity


The perceived moral intensity scale developed by Singhapakdi et al. (1996) was adapted
for the purpose of the CF/DND study. A single statement was used for each component of
perceived moral intensity. A seven-point Likert-type scale was used in the measurement. As
moral intensity is a situation-specific construct, it was measured separately for each of the five
scenarios.

This study examined the effect of all dimensions of moral intensity except concentration
of effect. Most studies have not found support for this dimension of moral intensity. Chia &
Mee (2000) suggested that this dimension should be deleted from the moral intensity construct.
Jones (1991) admitted that he included concentration of effect in the moral intensity construct
“for the sake of completeness.”

The interpretation of scores differs for one of the five remaining dimensions of moral
intensity. For magnitude of consequences, temporal immediacy, social consensus and probability
of effect, a high score indicates a high level of perceived moral intensity, while for proximity a
high score indicates a low level of moral intensity.
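
Because proximity runs in the opposite direction, it is typically reverse-scored before the dimensions are compared or combined. A minimal sketch of that recoding on the 7-point scale used here (the function name is hypothetical):

    def reverse_score(rating, scale_min=1, scale_max=7):
        """Flip a Likert-type rating so that a high value means high intensity."""
        return scale_max + scale_min - rating

    # A proximity rating of 6 (low perceived intensity) becomes 2.
    print(reverse_score(6))  # -> 2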

Recognition of moral issue


Respondents started by reading each scenario, and their ethical perception was measured
by asking them to respond to a single item, “Do you believe that there is a moral or ethical issue
involved in the above action/decision?” (Barnett, 2001), on a 7-point scale ranging from 1
(completely agree) to 7 (completely disagree). Lower scores indicated that participants agreed
that the action/decision had a moral or ethical component.

Ethical intention

Respondents’ ethical intentions were measured by asking them to indicate the likelihood
“that you would make the same decision described in the scenario” on a 7-point Likert scale,
with 1 representing “Definitely would” and 7 representing “Definitely would not”.

Ethical judgement

Respondents’ judgements about the morality of the actions in each scenario were
assessed with a 7-point, eight-item semantic-differential measure developed by Reidenbach and
Robin (1988, 1990). The ethical judgment scale has been used in several empirical studies and
has demonstrated acceptable psychometric properties, with reliability coefficients in the .70 to
.90 range (Barnett et al., 1998; Robin et al., 1996).
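
The reliability coefficients quoted are internal-consistency estimates of the kind usually computed as Cronbach's alpha. A compact sketch of that computation for a respondents-by-items score matrix (an illustration under assumed data shapes, not the authors' code):

    import numpy as np

    def cronbach_alpha(scores):
        """scores: 2-D array, rows = respondents, columns = the 8 scale items."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1).sum()
        total_variance = scores.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)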

To assess the effectiveness of the moral intensity constructs on ethical decision-making,
regression analyses were conducted. Specifically, within each scenario, assessments of each of
the five components of moral intensity were used to predict ethical decision-making. A similar
set of regression analyses was conducted to assess the impact of moral intensity on moral intent.
Within each scenario, assessments of the moral intensity were used to predict moral intent.
Finally, moral intensity components were used to predict moral awareness, or the recognition of
a moral issue in the scenarios.
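
Schematically, each of these analyses is an ordinary least squares regression of one outcome on the five intensity ratings, run scenario by scenario. A minimal sketch (NumPy's lstsq is real; the variable names and data shapes are assumptions):

    import numpy as np

    def fit_scenario(intensity, outcome):
        """Regress one outcome (e.g., ethical judgement) on the five moral
        intensity ratings (MC, SC, PE, TI, PX) for a single scenario.

        intensity: respondents x 5 matrix; outcome: vector of the same length.
        Returns the intercept followed by the five slope coefficients.
        """
        X = np.column_stack([np.ones(len(outcome)), intensity])
        coefficients, *_ = np.linalg.lstsq(X, outcome, rcond=None)
        return coefficients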


RESULTS

The results seem to demonstrate strong support for three of the five components as
predictors of ethical decision making. In all five scenarios, social consensus, magnitude of
consequences and probability of effect significantly predicted ethical decision-making. There
was only partial support for temporal immediacy in scenarios one and three, and for proximity
in scenario one.

The results for predicting moral intent were similar to the ethical decision making results.
Strong support seems to be evident for the same three components of moral intensity (social
consensus, magnitude of consequences and probability of effect) as predictors of moral intent.
Weaker support was shown for temporal immediacy in scenarios one and three, and for
proximity in scenarios one and four.

While some components of moral intensity predicted moral awareness, none of them
were consistent predictors and the overall regressions accounted for small amounts of variance
(< 5% except in scenario 3).

DISCUSSION

These results mirror those produced by other authors. Similar to Singer (1996, 1998)
and colleagues (Singer et al., 1998; Singer & Singer, 1997), who found support for social
consensus and magnitude of consequences, this study also found that these components were
strongly associated with ethical decision making. The only difference is that this study also
found consistent support for probability of effect, which was not found in previous studies.
When predicting moral intent, these results mirrored those of Barnett (2001), Butterfield et al
(2000), Chia & Mee (2000), and Frey (2000), who also found that magnitude of consequences,
social consensus and probability of effect were relatively strong predictors.

Quite clearly, the results for recognizing a moral issue did not identify strong predictors.
However, a closer look at the scenarios reveals that they were all rather complex, in the sense
that there were potential dilemmas in each of them. Without any scenarios containing no ethical
dilemma, or any containing an unambiguous one, it was not surprising that there was very little
variability in participants’ assessments. This restriction of range likely contributed to the results.

IMPLICATIONS

Additional benefits of these results, over and above their substantiation of other
researchers’ findings, are the implications they have for policy makers in DND. Social consensus,
magnitude of consequences and probability of effect were all strong predictors of ethical
decision making and moral intent. These components are also factors over which the


organization has significant control. Social consensus refers to the degree to which people agree
that a particular act is ethical or not. Magnitude of consequences refers to the total harm resulting
from the moral act in question. Policy formulations which clearly outline unacceptable
behaviour and the consequences of those behaviours help to develop consensus among CF
personnel and DND employees about what is ethical and what is unethical. That same emphasis
can shape the extent to which the magnitude or the seriousness of unethical behaviour is viewed
by personnel. In other words, the more that people agree that an act or a behaviour is unethical,
the more likely it will be generally viewed as unethical. At the same time, the more likely that
people perceive that severe harm will result from the act in question, the more likely people will
view it as an unethical act. The results of the research demonstrate that addressing social
consensus and magnitude of consequences should help people to understand more clearly what
constitutes unethical behaviour.

References

Barnett, T. (2001). Dimensions of moral intensity and ethical decision making: An
empirical study. Journal of Applied Social Psychology, 31, 1038-1057.

Butterfield, K.D., Trevino, L.K., & Weaver, G.R. (2000). Moral awareness in business
organizations: Influences of issue-related and social context factors. Human Relations,
53, 981-1018.

Catano, V.M., Kelloway, E.K. & Adams-Roy, J.E. (1999). Measuring Ethical Values in the
Department of National Defence: Results of the 1999 Research, Director Human
Resources Research and Evaluation, Sponsor Research Report 00-1.

Chia, A., & Mee, L.S. (2000). The effects of issue characteristics on the recognition of
moral issues. Journal of Business Ethics, 27, 255-269.

Dursun, S., & Morrow, R.O. (2003) Ethical Decision Making in the Canadian Forces:
Revision of the Defence Ethics Questionnaire. Paper presented at the International
Conference on Social Sciences, Honolulu, Hawaii, USA, 12th – 15th June 2003.

Frey, B.F. (2000a). The impact of moral intensity on decision making in a business
context. Journal of Business Ethics, 26, 181-195.

Frey, B.F. (2000b). Investigating moral intensity with the world-wide web: A look at
participant reactions and a comparison of methods. Behavior Research Methods,
Instruments, & Computers, 32, 423-431.

Hunt, S.D., & Vitell, S. (1986). A general theory of marketing ethics. Journal of
Macromarketing, 6, 5-16.


Jones, T.M. (1991). Ethical decision making by individuals in organizations: An issue-
contingent model. Academy of Management Review, 16, 366-395.

Kelloway, E.K., Barling, J., Harvey, S., & Adams-Roy, J.E. (1999). Ethical Decision-
Making in DND: The Development of a Measuring Instrument. Sponsor Research Report
99-14. Ottawa: Canadian Forces Director Human Resources Research and Evaluation.

Paolillo, J.G.P., & Vitell, S.J. (2002). An empirical investigation of the influence of
selected personal, organizational and moral intensity factors on ethical decision making.
Journal of Business Ethics, 35, 65-74.

Reidenbach, R.E., & Robin, D.P. (1988). Some initial steps toward improving the
measurement of ethical evaluations of marketing activities. Journal of Business Ethics, 7,
871-879.

Reidenbach, R.E., & Robin, D.P. (1990). Toward the development of a multidimensional
scale for improving evaluations of business ethics. Journal of Business Ethics, 9,
639-653.

Rest, J.R. (1986). Moral development: Advances in research and theory. New York:
Praeger.

Singer, M.S. (1996). The role of moral intensity and fairness perception in judgments of
ethicality: A comparison of managerial professionals and the general public.
Journal of Business Ethics, 15, 469-474.

Singer, M.S. (1998). The role of subjective concerns and characteristics of the moral
issue in moral considerations. British Journal of Psychology, 89, 663-679.

Singer, M., Mitchell, S., & Turner, J. (1998). Consideration of moral intensity in
ethicality judgements: Its relationship with whistle-blowing and need-for-cognition.
Journal of Business Ethics, 17, 527-541.

Singer, M.S. & Singer, A.E. (1997). Observer judgements about moral agents' ethical
decisions: The role of scope of justice and moral intensity. Journal of Business Ethics,
16, 473-484.

Singhapakdi, A., Vitell, S.J. & Kraft, K.L. (1996). Moral intensity and ethical decision-
making of marketing professionals. Journal of Business Research, 36, 245-255.

Singhapakdi, A., Vitell, S.J. & Franke G.R. (1999). Antecedents, consequences,
and mediating effects of perceived moral intensity and personal moral philosophies.
Journal of the Academy of Marketing Science, 27, 19-36.


ADAPTING OCCUPATIONAL ANALYSIS METHODOLOGIES TO
ACHIEVE OPTIMAL OCCUPATIONAL STRUCTURES

Brian R. Thompson
MOSART Project, Chief of Analysis
Ottawa, Ontario, Canada
Thompson.BR@forces.gc.ca

INTRODUCTION

In the Canadian Forces (CF), job requirements are fundamentally obtained through an
occupational analysis (OA) in which the structure and content of one or a series of CF occupations
are evaluated and military specifications are drafted. The CF, through the Military Occupational
Structure Analysis, Redesign and Tailoring (MOSART) project, is currently engaged in a
strategic initiative to reorganize its occupational structure - the way in which CF members are
grouped and managed from recruitment to training to career progression to release/separation.
To do so, the normal CF OA process has been modified to apply to a broader career field
concept, based largely on relating separate, or “stovepiped”, occupations within functional areas
of employment62. This paper briefly describes how CF OA methodology was adapted to analyze
job requirements and shares lessons learned through the conduct of different analysis projects.

BACKGROUND

In 1968, several hundred occupations in the Royal Canadian Navy, Army and Air Force
were unified into one common Military Occupational Structure (MOS) for the CF. Since that
time, technology, downsizing, operational effectiveness, and the need to attract and retain
qualified military personnel have led to the need to revise the MOS. In a great many cases, CF
members have had to change their primary skill-sets to keep abreast of changing job
requirements. In the CF, job performance requirements are normally obtained through an OA
in which the structure of one or a series of military occupations is evaluated and occupational
specifications are in turn drafted. These specifications provide the basis upon which training is
developed and career progression is defined. At present, under the MOSART Project, the CF is
involved in a review of the number and type of occupations that currently exist, with the ultimate
aim of increasing operational effectiveness. In this respect, MOSART is also making a
concerted effort to include the roles of the Reserve Force in any MOS modernization plans. For
the purposes of this paper, the Information Management (IM) and Human Resources (HR) career
field analysis projects will be discussed. Both projects identified a need for structures that will
effectively enable succession planning and the development of future military leaders in

62 Career Fields are formally described as a grouping of Military Occupations and/or generic jobs, which are used for the purpose of both enhancing operational effectiveness and broadening individual career development to meet the Environmental and CF requirements. Institutional Career Fields (not yet decided on as desirable entities) refer to potential Career Fields serving mainly corporate, or HQ-level, required functional areas, such as strategic HRM.


their particular domain of work. In order to effectively develop these structures, both of these
projects required new approaches to OA.

OCCUPATIONAL STRUCTURE DEVELOPMENT

It is our unique ability to discriminate between jobs using statistics and scientific process
instead of intuition and personal experience that defines the work of the occupational analyst. In
the CF OA process, once job typing has been performed, the focus usually shifts to occupational
structure. Based on similarity of job performance requirements, the analyst considers whether
the jobs in question are best grouped into a single occupation, two or more occupations,
performed by specialty-trained members of another occupation(s), or structured in some other
way. It is important to note that the analyst’s mandate is to determine what work is done, not to
question the necessity of the work. In addition to describing work, career/employment patterns
for the occupation(s) or function(s) and their related training requirements are proposed. Three
types of occupational structure are usually developed for the occupation(s) under study. First,
Mobilization structure is developed based on results of the job typing process and provides a
framework of occupations with a narrow scope of work to facilitate rapid recruitment/manning in
times of emergency with minimum training. Then, a Primary Reserve occupational structure is
developed to augment the Regular Force, which is constructed last. These three occupational
structures, taken together, constitute the MOS that provides the basis by which members are
recruited, selected, trained, paid, etc. Beyond meeting Canadian needs in peace and war, an
integrated MOS must harmonize Regular Force personnel working in close cooperation with
their counterparts in the Primary Reserve.

MOS PRINCIPLES

The design of the MOS is largely dependent upon occupation analysis and is guided by
the application of the principles of operational effectiveness, economy of training/professional
development, career field management, and rationalized assignment. Since maximizing one of
these principles may diminish another, several criteria guide the development and evaluation of
occupational structure options. For example, in a career employment model, key developmental
jobs are represented at several periods and may require occupational training that all occupation
members receive or specialty training for those posted to specific jobs. While this career model
specifies at what points training is to be provided, and for which job or set of jobs the trainee is
to be prepared, it does not prescribe the methods of training delivery. Nonetheless, proposals
may create whole new course(s) and/or delete existing ones, or may only fine-tune existing
courses by adding, deleting and/or shifting specific content areas based on the job requirement
identified during the job typing activity.

Since the goal of the MOS is to ensure and enhance operational effectiveness, projected
personnel shortages must be addressed prior to implementing a new MOS. This highlights the
need to create occupational structures that enable the necessary rotation of personnel between
“ship/shore” and/or “field/garrison” positions. Arguably, recruiting, retention and training are
the key HR activities having a bearing on operational effectiveness. If current rates of retention


are not improved for the long term, the future of the CF is in jeopardy. It is posited that Career
Fields will permit more efficient management of military personnel within a career progression
framework. The MOS can play an important role by designing occupations and career fields that
offer excellent training and career opportunities. Attractive employment, fair and competitive
pay, employee benefits and clearly defined opportunity for career advancement are key
objectives for many MOS renewal activities.

DATA ANALYSIS

A CF OA study typically uses an occupational survey to gather empirical data. Some of
the unique characteristics of the CF, such as two official languages, equal opportunity for women
in all occupations/roles, and a unified military, create certain challenges when stratified random
sampling techniques are considered. Gatewood and Feild’s (1998) finding that questionnaires
are resource efficient for large samples provides support for the MOSART project’s use of surveys
tailored to functional domains of work that cut across several occupations. From a purely
theoretical perspective, the CF Job Analysis model recognizes E.J. McCormick’s (1979) “World
of Work” reference in job/occupational analysis. Similarly, OA organizes work in hierarchical
descending order of career field, occupation, sub-occupation, jobs, duties, and finally, tasks,
skills, knowledge and other abilities. Although occupational structuring has often come under
attack in the CF (McCutcheon, 1997), the empirical rationale of task-based requirements is still
considered the best model to accurately describe work.
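
One branch of this descending order can be pictured as a simple nested structure (labels are invented examples, not CF data):

    # Career field -> occupation -> sub-occupation -> job -> duty -> tasks.
    world_of_work = {
        "Human Resources": {
            "HR Administrator": {
                "Pay Services": {
                    "Pay Clerk": {
                        "Maintain pay accounts": [
                            "Verify entitlements",
                            "Process allowance changes",
                        ],
                    },
                },
            },
        },
    }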

In the course of a normal OA study, case-, task-, and knowledge-focused job data are
analyzed via the Comprehensive Occupational Data Analysis Programs (CODAP). These programs,
also used by the United States Air Force and the Australian Defence Organization (Mitchell and
Driskall, 1996), cluster personnel based upon the similarity of time spent on similar tasks. One
challenge the analyst faces with multiple-occupation analyses is that the length of the survey and
the CODAP inventory limits (a maximum of 7000 cases/individuals which can be clustered and a
maximum of 3000 tasks) dictate the level of discrimination used when formulating task and
knowledge statements. Furthermore, in a technical modification of the standard process, when
clustering knowledge data via CODAP, we must use the level of knowledge required by the job
incumbents vice percent time spent. Nonetheless, as with case and task clustering, the
knowledge sequence is graphically presented in a hierarchical “tree-like” diagram. In order to
fully utilize knowledge clustering, a requirement exists to develop the knowledge inventories to
the same specificity level as afforded the task inventory. This proves to be problematic in multi-
occupational surveys given the already lengthy period of time required by the sample population
to answer the survey instruments. Despite this, knowledge clustering is a tool that the analyst
will rely upon when structuring career fields that no longer fit within existing occupational
boundaries.
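
CODAP itself is a dedicated suite, but the core operation described here, grouping respondents whose percent-time-spent profiles are most alike into a hierarchical tree, can be sketched with standard tools. The SciPy call below is real; the profiles, linkage method and distance metric are assumptions for illustration:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Rows: respondents; columns: percent time spent on each task.
    profiles = np.array([[60, 30, 10,  0],
                         [55, 35, 10,  0],
                         [ 5, 10, 45, 40],
                         [ 0, 15, 40, 45]], dtype=float)

    # Agglomerative clustering on the distance between profiles; the result
    # is the kind of hierarchy CODAP renders as a "tree-like" diagram.
    tree = linkage(profiles, method="average", metric="euclidean")
    print(tree)  # each row records a merge of two clusters and its distance

For knowledge data, the same clustering would be run on required-knowledge-level ratings rather than percent time spent, as noted above.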

Information Management Functional Analysis (IMFA). Events such as the September 11th
terrorist attacks and CF involvement in Afghanistan and Iraq have underscored the essential need
for a highly organized, interoperable system of managing information. As a result, the CF has
commenced transition to an Enterprise Model for IM including the professional development and


career planning of individual CF members and civilian Department of National Defence (DND)
employees. The aim of the IMFA is to describe the IM work done by 10 military occupations
(officer and non-commissioned) and two civilian classifications, as well as to identify the way
ahead for the training, development and career paths required to produce and maintain the DND
IM workforce. The survey will be administered to approximately 7000 CF members/civilians
who occupy IM positions in the DND. While this survey will likely be presented in a paper and
pencil format, efforts are being made to develop a web-based survey system, since the task count
for a survey of this magnitude is considerably greater than in a normal paper and pencil survey.
Concepts of inter-operability and the ability to function effectively in joint operations are key to
the development of an effective IM occupational structure. One challenge in implementing an
IM career field will lie in defining the functional authority for IM and establishing the roles and
relationships with each particular environment.

Human Resource Functional Analysis (HR FA). Although a unified CF MOS has been
maintained, there have been philosophical differences in the approach to HR management
between the Sea, Land and Air environments. Despite this, there is general agreement that an
overall system of control and HR management is required in the CF. A CODAP data analysis
technique was used to interpret the HR data by grouping tasks into modules without the
restrictions normally associated with non-overlapping duty areas (Thew and Weissmuller, 1978).
This technique was used to better understand HR jobs and the inter-relationships existing within
this domain of work. Once HR jobs and their associated tasks were defined, task co-performance
modules were examined to determine the overlap of competencies across the survey population.
Finally, those positions that can be filled by persons from one or more career field(s) and/or
stand-alone occupations are assigned for succession planning purposes.

CONCLUSIONS

While the current phase of the project is not complete, our experience suggests that these
types of analyses are feasible for multiple occupational studies. The normal CF OA
methodology has been adapted to deal with career fields by expanding surveys to include several
occupations (some including both military and civilian respondents). In addition, the concept of
knowledge clustering has been incorporated into normal OA practice. As survey methodology
only defines present-day work requirements, it is incumbent that sponsors and subject-matter
experts (SMEs) are engaged throughout the OA process. In fact, it cannot be overstressed that a
thorough process of consultation with stakeholders must occur prior to effecting any MOS
change. However, respondents have generally tended to be positive about these large-scale OA
surveys. Many have commented that the survey effectively captured their collective
competencies and should assist in building a MOS that will provide professional and technically
competent personnel to perform all future CF roles and missions. As the CF enters a phase
where it must effectively structure its workforce within a work/job-oriented focus, it is hoped
that CF OA methodology will be further adapted to embrace personal competencies.

REFERENCES


Gatewood, R.D., & Feild, H.S. (1998). Human resource selection (4th ed.). Fort Worth: Harcourt
Brace.

McCormick, E.J. (1979). Job Analysis: Methods and Applications. New York: Amacom.

McCutcheon, J.M. (1997). “Competencies” and “Task Inventory” Occupational Analysis – Can
they both sing from the same music sheet? 10th International Occupational Analyst Workshop,
San Antonio, TX.

Mitchell, J.L., & Driskall, W.E. (1996). Military Job Analysis: A historical perspective.
Military Psychology, 8(3), 119-142.

Thew, M.C. & Weissmuller, J.J. (1978). CODAP: A new modular approach to occupational
analysis. Proceedings of the 20th annual conference of the Military Testing Association
(pp.362-372), Oklahoma City, OK.


Whom Among Us? Preliminary Research on Position and
Personnel Selection Criteria for MALE UAV Sensor Operators
Captain Glen A. Smith
Canadian Forces Experimentation Centre
Ottawa, Ontario, Canada K1A 0K2
gsmith42@uwo.ca

Abstract

Net-centric warfare and interoperability are fast becoming basic tenets of modern military
strategic thought. The Canadian Forces and its NATO allies are currently conducting
research into the effective use of current and emerging technologies, such as airborne
sensors and uninhabited aerospace vehicles (UAVs), to enhance their intelligence,
surveillance, and reconnaissance (ISR) capabilities. Effective sensor operation is critical
to the successful contribution of UAVs to Canada’s joint and combined net-centric warfare
capability. The selection, training, and employment of Canadian Forces personnel as
sensor operators will depend upon an accurate analysis of this position’s requirements
and upon the determination of who among us has the appropriate training and
experience to competently fill this vital ISR position. Canadian Forces UAV
experimentation is developing an understanding of the generic task and knowledge
requirements of the Medium Altitude, Long Endurance (MALE) UAV Sensor Operator
position to that end. This paper discusses the methods and techniques used over the
course of three major research events to determine the position and personnel selection
criteria for MALE UAV Sensor Operators and provides preliminary results from
Canadian Forces research to date.

Introduction
At the turn of the millennium, there is an apparent fundamental shift pervading
current military strategic thought. Ongoing research into the effective use and practical
application of secure information technology and information management techniques to
improve C4ISR capabilities between tactical, operational, and strategic units, to exploit
opportunities and increase mission success, is leading to the common development of
net-centric warfare principles and procedures. In a related area, HR reviews have been
conducted among the components of the United States Department of Defense, within the
Australian Defence Force, and through the Canadian Forces’ Military Occupational
Structure Analysis, Redesign, and Tailoring (MOSART) Project, in order to assess each
force’s capability to meet the expected human resources demands to the year 2020.
Assessing both the capital and human resource assets of the Canadian Forces with a
knowledge of the common strategic thought of incorporating net-centric warfare serves to
focus both national objectives and international commitments and to synergize the
interoperability between allied nations.


Inevitably, new technology will be introduced into the Canadian Forces over the
coming years. Medium Altitude, Long Endurance (MALE) UAVs have shown
promising potential for use in Canadian Forces initiatives to further enhance its ISR
capability to the year 2020. The ability of this technology to minimize the potential
threat to, and loss of, aircrew in domestic activities such as coastal and fishery patrols, as
well as in international commitments such as peacekeeping operations, is enticing.
Further, this technology’s ability to remain aloft for up to 40 hours and provide detailed
ISR imagery at a maximum ceiling of up to 30,000 feet from over 400 nautical miles from
its ground control station (GCS) makes MALE UAVs an even more desirable ISR asset.
When these features are coupled with the fact that a host of Canada’s allies are either
contemplating or have already introduced MALE UAVs into their air inventories, the
Department of National Defence is further spurred to investigate incorporating this
technology into the Canadian Forces, thereby aligning our human and capital resources,
as well as our interoperability, with the other nations with which we are presently, or may
in the future be, involved in combined operations such as peacekeeping.

Method
Collecting data for research involving emerging technology such as MALE UAVs
provided a unique and interesting opportunity. Occupational data collection and analysis
has been routinely reported through the process of occupational analysis within the
Canadian Forces for the past 20-25 years. This extensive history of analyzing
occupations, through a method which requires all members of an occupation under study
to complete an extensive inventory of the tasks, knowledge, and skills performed or
required within their occupation, provides the Canadian Forces with a good understanding
of its workforce’s capabilities. However, little was known until recently about the job
requirements associated with the various positions necessary to provide real-time
intelligence, surveillance, and reconnaissance (ISR) information via MALE UAVs. Even
though knowledge gained from research conducted by allied nations in this area is
available, it is difficult to determine whether military personnel employed with MALE
UAVs in other countries have skill sets, training, and experience similar to potential
candidates from Canadian Forces occupations.

Opportunities to observe Canadian Forces personnel in actual MALE UAV
positions were authorized through a series of research events designed to demonstrate the
operational efficiency and effectiveness of UAV technology concurrent with joint military
exercises and operations. UAV participation in Exercise Robust Ram, Operation Grizzly,
and the Pacific Littoral Intelligence, Surveillance, and Reconnaissance Experiment (PLIX)
provided venues in which manufacturers, soldiers, and research scientists could converge
at a central location to demonstrate products, train with and operate potential future
assets, and collect data on a variety of aspects of the future direction of Canadian Forces
operations and procedures. For researchers interested in the person-job fit between the
105 occupations within the Canadian Forces and the various positions within a MALE
GCS, devising research designs and developing a research methodology based on the
lessons learned from these events was seen as essential. Collecting job requirements in
terms of tasks, knowledge, and skills, and determining the overlap between these job
attributes and those formally contained in the occupational
specifications of the 105 Canadian Forces occupations was viewed as the most pragmatic
and objective basis for eventually recommending occupations for selection, training, and
employment in these positions. Over the course of these research events, instruments
were developed to measure the tasks and knowledge involved in UAV GCS positions, as
well as the human and physiological factors involved. These instruments are described
after a short summary of the three UAV research events that have occurred to date.
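To make this overlap computation concrete, the following is a minimal Python sketch of how coverage between a position’s observed statements and an occupation’s formal specification might be estimated; the function and statement labels are hypothetical illustrations, not the project’s actual tooling.

# Hypothetical sketch: what share of a position's observed task/knowledge
# statements already appears in an occupation's specification? Labels invented.
def overlap_ratio(position_stmts, occupation_stmts):
    if not position_stmts:
        return 0.0
    return len(position_stmts & occupation_stmts) / len(position_stmts)

sensor_op = {"conduct radar area search", "conduct FLIR search", "maintain RMP"}
candidate = {"conduct radar area search", "conduct FLIR search", "stand guard duty"}
print(round(overlap_ratio(sensor_op, candidate), 2))  # 2 of 3 covered: 0.67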

Robust Ram
The initial field experiment involving UAVs was held at Suffield Range, Alberta,
in April 2002. One mini UAV (AeroVironment’s Pointer) and two MALE UAVs
(Bombardier’s CL-327 Guardian and General Atomics’ I-Gnat) were demonstrated by
manufacturer representatives during military land exercises.

From a human resource perspective, the primary concerns during this initial field
work were the job requirements associated with the tasks, knowledge, and skills, crew
composition, command and control, ground maintenance, and communications involved
in integrating information technology and information management for net-centric
warfare development. Since Robust Ram was the first research event focusing on the
potential of UAVs in the Canadian Forces, field observations and notes were gathered
through discussions with manufacturer representatives and with the Canadian Forces
personnel employed with, and supporting, each UAV. Task, knowledge, and skill
statements were compiled from the 105 occupations within the Canadian Forces military
occupational structure; these served as a checklist and guide during field observations of
Canadian Forces personnel interacting with equipment and performing duties associated
with MALE GCS positions during mission scenarios. Human resource information
gathered during Robust Ram provided a baseline understanding of common GCS
positions and their potential knowledge and task requirements.

Operation Grizzly (OP GRIZZLY)
A joint ISR operation between the Chief of Land Staff (CLS) and the Chief of Air
Staff (CAS) provided ground and air support to the June 2002 G-8 summit in Kananaskis,
Alberta. General Atomics’ I-Gnat was employed, with Canadian Forces personnel in
each of the GCS positions save that of the UAV Operator (Pilot), due to contract
obligations. OP GRIZZLY represented the first operational use of UAVs within Canada.
Canadian Forces ISR support to the summit was successful and was commended, due in
part to MALE UAV involvement.

OP GRIZZLY also provided the first opportunity to collect consistent data on the
tasks, knowledge, skills, and environmental demands associated with UAV GCS
positions filled by Canadian Forces personnel, through the development of a structured
interview questionnaire. Canadian Forces personnel employed in the various UAV GCS
positions met individually with a Canadian Forces Personnel Selection Officer, who
briefed them on the purpose of the 45-minute structured interview. Interviewees were
then asked a series of questions, and their responses were recorded verbatim on a laptop
computer. All responses were then compiled, analyzed, and reported in an
experimentation report. The structured interview schedule continues to be used to
collect data from Canadian Forces personnel employed in MALE UAV positions.

The Pacific Littoral Intelligence, Surveillance, and Reconnaissance Experiment (PLIX)
In the summer of 2003, a field experiment was conducted to assess the utility of a
multi-sensor MALE UAV in supporting the construction of the recognized maritime
picture (RMP) within a specific littoral operations area. Construction of the RMP is a
fundamental activity of littoral ISR. Its efficiency and effectiveness are sometimes
suspect because of the limited ability of current technology to provide an accurate,
detailed assessment of maritime activity within a specific area of interest. The experiment
predicted that if a multi-sensor MALE UAV patrolled a designated littoral operations
area, all surface contacts would be detected, continuously tracked, and positively
identified in the recognized maritime picture of the operations area before the end of each
patrol. Using conventional methods, fewer than ten targets were identified and tracked
within the specified area of operations; employing just one MALE UAV tripled the
number of targets identified and tracked.

Concurrent with the overarching objective of this experiment, a richer
understanding of GCS position requirements and of the potential suitability of Canadian
Forces occupational personnel was gained using previously employed field techniques
(e.g., observations, notes, and structured interviews). A computer-based survey was also
developed and administered, as a pilot project, to Canadian Forces personnel employed as
MALE UAV Sensor Operators during PLIX. The survey was developed in Microsoft
Access 2000 and provided participants with a simple ‘point and click’ navigation system
through four areas of interest with respect to the sensor operator position. The first two
areas presented the task and knowledge statements in each member’s occupational
specifications that were associated with the UAV sensor operator position. Two further
menus permitted members employed as sensor operators to add and rate additional tasks
performed and knowledge required in the position that were not contained in their
occupational specifications.

The MALE UAV sensor operators’ electronic survey asked participants to
identify and rate the task and knowledge statements in their occupational specifications
that were performed or required in the GCS position they filled. The electronic survey
was well received and provided a rich source of information about the MALE UAV
sensor operator position. Participants found the survey easy to navigate and easy to
complete. Their completion of the additional-tasks and additional-knowledge sections
showed a willingness to provide supplemental data on task and knowledge statements
that were performed in the sensor operator position but not included in their occupational
specifications. This additional information, along with the task and knowledge data from
their occupational specifications, can also be used to develop job inventories for
analyzing this position in future experiments involving MALE UAVs.

Participation by personnel from three distinct Canadian Forces occupations
further expanded the task and knowledge requirements associated with the UAV sensor
operator position. Analysis of the distribution of tasks and knowledge associated with this
position, displayed in the figures below, suggests that human resource requirements will
include training and experience in sensor selection and manipulation, radar tracking, and
competencies involving the organization and management of information. Figures 1 and 2
below describe the distribution of knowledge and tasks associated with the MALE UAV
sensor operator position to date. A further comparison of the task and knowledge
distributions is provided in Figure 3, followed by a summary of the information obtained
from structured interviews conducted with the Canadian Forces personnel who
participated in the three MALE UAV research events to date.
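The duty-area percentages reported in the figures below were presumably obtained by tallying endorsed statements per duty area; a minimal Python sketch of that arithmetic, with invented duty areas and counts, follows.

# Hypothetical sketch of the tabulation behind Figures 1-3: count endorsed
# statements per duty area and convert the counts to percentages.
from collections import Counter

def duty_area_percentages(endorsements):
    # endorsements: iterable of (statement_id, duty_area) pairs
    counts = Counter(area for _, area in endorsements)
    total = sum(counts.values())
    return {area: 100.0 * n / total for area, n in counts.items()}

sample = [("k01", "Radar"), ("k02", "Radar"), ("k03", "Navigation"),
          ("k04", "Combat Information Organization")]
for area, pct in sorted(duty_area_percentages(sample).items()):
    print(f"{area}: {pct:.0f}%")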

Figure 1 – Distribution of Knowledge for MALE UAV Sensor Operators
[Pie chart omitted; slice values: Combat Information Organization 24%; Operations 13%;
Communications 8%; Radar 8%; General Aircrew 8%; Administration 7%; Electronic
Warfare 7%; Navigation 7%; Tactics 6%; Information Systems Management 6%;
Electronics 3%; Infrared 2%]

Figure 1 displays the distribution of knowledge required of MALE UAV sensor
operators during PLIX. The largest proportion of knowledge involves organizing combat
information (24%) obtained through UAV sensors for use by higher commanders.
Comprehensive knowledge of air and surface radar equipment, including their capabilities
and limitations, along with radar controls and procedures, is fundamental to this position.
Rules of engagement, constructing contact probability areas, intelligence and evidence
gathering techniques, collection requirements, and team duties and responsibilities detail
the operational knowledge (13%) component needed by sensor operators to support
future MALE UAV missions. Knowledge items such as intercom systems and operating
procedures, voice procedures, infrared radar interpretation, and radar display
interpretation are examples of the communications (8%) and radar (8%) knowledge
requirements. General aircrew (8%), administration (7%), and navigation (7%)
knowledge suggest that there are environmental as well as flight support
responsibilities involved in this position. Flight safety procedures and regulations, air
traffic control organization and procedures, and heading and altitude reference and
operating systems describe the sensor operator’s general aircrew knowledge requirements
(8%), while principles of true navigation, dead reckoning procedures, and knowledge of
trigonometry, algebra, and logarithms exemplify the navigational knowledge (7%)
required. Tactical knowledge (6%) was indicated as well; here, the data suggest that
general aircrew and navigational knowledge, combined with the tactical knowledge
gained through training and experience in operational military occupations, is involved in
this position. Six percent of the knowledge required by MALE UAV sensor operators
involves information systems management, which includes computer hardware, software,
and network utilities and procedures, as well as data compression and extraction
techniques. Capabilities and characteristics of electronic jammers and microwave
systems, together with principles of electromagnetic compatibility and interference, are
components of the electronics knowledge (3%) requirements. Weather and obstacle
avoidance procedures, understanding radar imagery modes, and comprehending radar
imagery analysis are examples of the infrared knowledge (2%) involved.

This initial understanding of the knowledge distribution for MALE UAV sensor
operators provides direction for further exploration and research. Because information is
gathered through airborne electronic means, military personnel selected, trained, and
employed in this position may need a prerequisite operational and tactical understanding
of basic aircraft or surface radar and sensor capabilities, operations, and limitations.
Operational training and experience are considered assets, both in this particular position
and in its role within the eventual crew composition. The knowledge distribution also
suggests that general aircrew or shipborne experience may facilitate a more efficient and
effective person-job fit. The essential responsibility of MALE UAV sensor operators will
be to gather, extract, and compile radar and sensor imagery as efficiently as possible.
Knowledge gained from working in an environment that requires candidates to manage
combat information and support electronic warfare objectives through information
systems management would further ensure that information from UAV missions is
competently obtained.


Figure 2 – Distribution of Tasks for MALE UAV Sensor Operators
[Pie chart omitted; slice values: Radar 25%; Infrared 14%; Navigation 11%; Operations
10%; Tactics 10%; Combat Information Organization 9%; General Aircrew 6%;
Communications 5%; Information Systems Management 4%; Administration 2%;
Operations Support 2%; Electronics 2%]

Figure 2 displays the distribution of tasks performed by MALE UAV sensor
operators during PLIX. Seventy percent of the tasks performed during this research event
involved radar (25%), infrared (14%), navigation (11%), operations (10%), and tactics
(10%). Typical radar tasks consisted of identifying and classifying radar contacts,
estimating the size of radar contacts, determining radar search altitudes, and conducting
radar area searches. Conducting forward-looking infrared (FLIR) camera searches,
detecting and identifying FLIR contacts, and assessing FLIR intelligence identification
runs are examples of infrared tasks. Navigational tasks performed included determining
the UAV’s position using visual and radar fixing procedures, as well as interpreting
meteorological charts, reports, and forecasts. Detecting and classifying radar contacts,
conducting intelligence and evidence gathering, and interpreting, displaying, and
coordinating the display of intelligence information provide insight into the tasks
associated with operations. Tactical tasks included selecting and localizing UAV sensors
and assessing the tactical significance of contacts.

The remaining thirty percent of MALE UAV sensor operator tasks fell within
duty areas that each accounted for less than ten percent of the tasks performed. Combat
information organization (9%) tasks included interpreting radar display information, as
well as coordinating and maintaining the production of the recognized maritime picture.
General aircrew tasks accounted for six percent of the tasks performed by MALE UAV
sensor operators during PLIX; managing individual tactical displays, visually identifying
and classifying contacts, and preparing post-flight reports and messages are indicative of
this duty area. Communications accounted for five percent of the tasks performed in this
position during PLIX, including maintaining internal communications and configuring
communications equipment. Smaller percentages of tasks were performed in the
information systems management (4%), administration (2%), operations support (2%),
and electronics (2%) duty areas.

Figure 3 – Tasks and Knowledge Requirements for MALE UAV Sensor Operators
[Bar chart omitted; it plots the absolute number of knowledge and task statements by
duty area (A through K on the horizontal axis, absolute value on the vertical axis); the
duty area key follows]

A. Combat Information Organization
B. Operations
C. Communications
D. General Aircrew
E. Administration
F. Information Systems Management
G. Electronics
H. Tactics
I. Navigation
J. Infrared
K. Radar

As may be the case with many jobs, the relationship between knowledge and tasks
is not uniform across duty areas. Some duties require greater cognitive resources and
abilities, whereas others are more task-specific, requiring less knowledge, theory, and
understanding to perform. The nature of the MALE UAV sensor operator position
suggests that, although tasks and knowledge are not distributed uniformly across
individual duty areas, there is a sense of uniformity within the job as a whole: supporting
both the UAV crew and its chain of command through the core provision of real-time
information and imagery on areas of interest. Figure 3 shows that duty areas A through G
are predominantly knowledge-specific and will require future MALE UAV sensor
operators to know a constellation of theories, principles, and procedures in order, for
example, to organize combat information and to support land, sea, joint, and combined
operations with proper internal and external communications. Duty areas H through K,
on the other hand, are more task-specific, requiring future sensor operators to support the
piloting and navigation of
MALE UAVs, more often than not through direct instructions to the UAV Operator
(Pilot) on the maneuvers, circuits, and tactics required to maintain images of objects of
interest while using radar, infrared, or other sensor capabilities. This relationship between
the knowledge required and the tasks performed by future MALE UAV sensor operators
suggests that considerable challenge may be involved, and considerable task, knowledge,
and interpersonal skill required, in feeding this essential information into Canada’s
net-centric warfare matrix.

Structured Interviews. Chief of Air Staff and Chief of Maritime Staff sensor operator and
support personnel were employed in the MALE UAV sensor operator position during the
research events involving UAVs. Occupationally, personnel from the Airborne
Electronic Sensor Operator occupation (Chief of Air Staff) and from the Naval Electronic
Sensor Operator and Naval Combat Information Operator occupations (Chief of Maritime
Staff) have participated in these research events. Participants averaged 19 years of
military service, and all but one were senior non-commissioned members. One Airborne
Electronic Sensor Operator participated in the Robust Ram experimentation with the
I-Gnat MALE UAV. Three Airborne Electronic Sensor Operators were employed in the
MALE UAV sensor operator position during OP GRIZZLY. Four Canadian Forces
members from three occupations were employed as UAV sensor operators within the
Tofino UAV GCS during PLIX: two Airborne Electronic Sensor Operators, one Naval
Combat Information Operator, and one Naval Electronic Sensor Operator. Personnel
from the Airborne Electronic Sensor Operator occupation reported that the sensor
operator position required good air sense, spatial and situational awareness, and aircrew
experience gained from at least one operational tour. Personnel from the Airborne
Electronic Sensor Operator and Naval Electronic Sensor Operator occupations also
identified knowledge, skill, and experience in operating infrared (IR) and electro-optical
(EO) sensors as essential.

Canadian Forces personnel employed as sensor operators during UAV
experimentation suggested that a standard tour or posting of three to five years would be
suitable. They suggested that the challenges involved in this position would come from
operating new technology as a member of a new crew and from acquiring the motor
skills and hand-eye coordination required to establish and maintain sensors on target.
Further, they reported that the GCS work environment was charged with activity,
requiring constant communication and a proclivity for quick thinking, planning, and
preparation. Experimental MALE UAV sensor operators also found directing the UAV
Operator (Pilot) to maintain the sensors on a specific target challenging, more so for
Chief of Maritime Staff personnel employed in this capacity, perhaps because of
occupational and environmental cultural restrictions and the etiquette assumed between
non-commissioned members and officers.

Environmental demands associated thus far with MALE UAV field
experimentation were reported to be those inherent in air operations. Canadian Forces
personnel employed as sensor operators suggested that comprehending the present
location of the UAV in relation to its altitude, direction, distance from the GCS, and the
angle of approach to the target may involve a significant learning curve for non-air-
operations personnel. For these reasons, training efficiency may dictate that personnel
selection criteria favor air operations personnel and occupations.

Tasks associated with the MALE UAV sensor operator position include reviewing
pre-flight documentation and attending pre-flight briefs, conducting pre-flight functional
checks on the sensor package, and planning and recommending the appropriate sensor
package for the mission. In-flight tasks include locating and observing targets and
selecting the appropriate zoom, elevation, angle of depression, and rates of movement to
increase accuracy and confidence in target identification. The position also involves
performing first-level intelligence analysis on possible targets and communicating with
and directing the UAV Operator (Pilot) to maintain sensor equipment on target.
Knowledge required for this position includes airborne tactics, sensor technology, the
UAV’s limitations, establishing and maintaining data links, and airspace regulations.

The perceived task demands associated with this position ranged from relaxed,
when conducting sensor sweeps to locate possible targets, to intense concentration during
target detection and when first-level intelligence analysis was being performed. The
intense concentration required to maintain the sensor on target during target detection
was described as quite demanding, resulting in eyestrain and mental fatigue. Vigilance
and focus were also required to maintain position and to continue tracking targets in the
creation of the RMP. Chief of Air Staff personnel employed in this position suggested
that these tasks were comparable to those currently performed by their occupation on
maritime patrol aircraft.

Chief of Air Staff personnel involved in the PLIX field research were also
employed as senior sensor operators, responsible for the supervision and performance of
either the Naval Combat Information Operator or the Naval Electronic Sensor Operator
on their respective crews. This supervisory and management activity further enhanced
the job satisfaction they derived from the position. Other supervisory and managerial
tasks associated with the position include maintaining the RMP and ensuring efficient
and effective use of the sensors to provide still and video data to the chain of command.

Canadian Forces personnel involved in research concerning the MALE UAV
sensor operator position felt that their operational experience, training, and qualifications
were enhanced through their time in the position. Chief of Air Staff personnel in
particular suggested that the tasks, knowledge, and skills associated with their primary
employment on shipborne or maritime patrol aircraft were reinforced by their
involvement in these experiments.

Job Satisfaction. Personnel employed in the MALE UAV sensor operator position
reported a high level of job satisfaction. Operating sensors on UAVs to detect and
identify targets presented a novel and efficient means of gathering ISR data, and
Canadian Forces members employed in this position appreciated the opportunity to
participate in the experimentation. Canadian Forces members employed as MALE UAV
sensor
operators from naval occupations expressed some reservations about the viability of their
occupations’ future involvement in this position, on the grounds that they might be
disadvantaged relative to their cohort for advanced training, qualifications, and
promotion. These concerns were not expressed by Chief of Air Staff personnel employed
as MALE UAV sensor operators; in fact, they suggested this employment opportunity
would be welcomed by their occupation as a means of further enhancing their sensor
training and expertise. Clearly, from an organizational as well as an occupational and
individual worker perspective, these intrinsic selection issues must also be considered in
the final decision as to who is best suited to fill this ISR position.

Summary
In just a few short years, the Canadian Forces Experimentation Centre has created
opportunities for the Canadian Forces and its leaders to become familiar with the
potential and benefits of MALE UAVs as an ISR asset within the larger context of the
developing strategy surrounding net-centric warfare. In the year 2005, the Canadian
Forces is expected to make a multi-million dollar investment to incorporate UAV
technology within its inventory. Concurrent with this timeline, the Canadian Forces
Experimentation Centre has endeavored to study the human resource requirements
associated with MALE GCS positions so that sound, objective recommendations can be
made on the effective and efficient operation, support, and maintenance of this
technology. Field research events associated with UAV experimentation have provided
progressive opportunities for the development of observational and data collection
techniques to match the position and personnel requirements associated with GCS
positions. Personnel selection criteria, training development, and potential employment
patterns within the existing military occupational structure of the Canadian Forces are
becoming clearer with each research event involving this promising technology.

Our position and personnel selection criteria investigations to date are based upon
subject matter expertise. We concede that the potential for significant variation remains
to be explained through more precise measurement in laboratory settings and simulations.
Plans to conduct a formal, objective, and independent task analysis of the sensor operator
position and, indeed, of all positions common to the MALE UAV GCS are being readied
for the near future. Determining person-job fit should be viewed as an iterative process.
This is especially true when research is conducted on the design of future jobs that result
from introducing emerging technology to organizations and their workforces. Although
we cannot definitively say at this time what the exact selection criteria, training, and
employment will be for this future position, we look forward to further developing our
understanding of these requirements through independent as well as combined research
collaborations with our allies, in time for the introduction of this technology into the
Canadian Forces inventory in the year 2005.


Transformational Leadership: Relations to the Five Factor Model and Team
Performance in Typical and Maximum Contexts
Beng-Chong Lim
University of Maryland and Applied Behavioral Sciences Department,
Ministry of Defense, Singapore
Robert E. Ployhart
George Mason University

Abstract
This study examines the Five Factor Model of personality, transformational leadership, and team
performance under conditions similar to typical and maximum performance contexts. Data were
collected from 39 combat teams from an Asian military sample (n = 276). Results showed that
neuroticism and agreeableness were negatively related to transformational leadership ratings.
Team performance ratings across the typical and maximum contexts correlated only .18.
Furthermore, transformational leadership related more strongly to team performance in the
maximum than in the typical context. Finally, transformational leadership fully mediated the
relationship between leader personality and team performance in the maximum context, but only
partially mediated the relationship in the typical context. The Discussion focuses on how these
findings, while interesting, need to be replicated with different designs, contexts, and measures.
Transformational Leadership:
Relations to the Five Factor Model and Team Performance in Typical and Maximum Contexts
Over the last 20 years, transformational leadership has become one of the dominant
leadership theories in the organizational sciences (Judge & Bono, 2000). Although there are
several reasons for this, perhaps one of the most important is that transformational leadership
appears to be extremely important for modern work. For example, the growing number of
mergers and acquisitions, globalization, and stock market uncertainty require leaders not only to
exhibit confidence and direction, but also to instill motivation and commitment to organizational
objectives. Numerous studies have found that followers’ commitment, loyalty, satisfaction, and
attachment are related to transformational leadership (Becker & Billings, 1993; Conger &
Kanungo, 1988; Fullagar, McCoy, & Shull, 1992; Niehoff, Enz, & Grover, 1990; Pitman, 1993).
Indeed, this has led researchers such as Bass (1998) to conclude that “transformational leadership
at the top of the organization is likely to be needed for commitment to extend to the organization
as a whole” (p. 19).
Despite the importance of transformational leadership in practice and the wealth of
research on the topic, there are still many questions relating to the antecedents and consequences
of transformational leadership. For example, only two studies have examined the dispositional
basis of transformational leadership using the Five Factor Model (Judge & Bono, 2000; Ployhart,
Lim, & Chan, 2001), and more research is needed to understand how personality is manifested in
transformational leadership behaviors. Similarly, previous research examining the consequences
of transformational leadership has focused almost exclusively on the individual level (i.e., leader
effectiveness). However, many have argued that leadership may have its most important
consequences for teams, and thus a focus on the team level is also important (Bass, Avolio, Jung,
& Berson, 2003; Dvir, Eden, Avolio, & Shamir, 2002; Hogan, Curphy, & Hogan, 1994; Judge,
Bono, Ilies, & Gerhardt, 2002). Further, research by Ployhart et al. (2001) suggests that
transformational leadership may be most important for maximal performance contexts rather
than for typical performance contexts.
The purpose of this study is to examine these neglected antecedents and consequences of
transformational leadership. We first examine how leader personality, based on the Five-Factor
Model (FFM), relates to subordinate ratings of the leader’s transformational behaviors. Second,
we examine how transformational leadership relates to team performance assessed under typical
and maximum performance contexts. Third, we assess whether transformational leadership fully
or partially mediates the relationship between the FFM of personality and team performance.
Thus, this study contributes to the research on transformational leadership by examining the
FFM determinants of transformational leadership, examining how transformational leadership
predicts team criteria, and examining whether the strength of prediction differs across typical
and maximum performance contexts. This study therefore integrates and simultaneously tests the
findings of Judge and Bono (2000) and Ployhart et al. (2001) by assessing the FFM determinants
and consequences of transformational leadership. Figure 1 provides an overview of the
relationships examined in this study.
In the following section, we discuss the FFM antecedents of transformational leadership.
Next, we examine the consequences of transformational leadership for team performance in
typical and maximal contexts.
Transformational Leadership and the FFM of Personality
Much progress has been made in the field of leadership research. From the early work on
a one-dimensional model of leadership (Katz, Maccoby, & Morse, 1950), to the two-dimensional
model of initiating structure and consideration (Stogdill & Coons, 1957), to the recent
transformational/charismatic leadership theory (e.g. Bass & Avolio, 1993; Conger & Kanungo,
1988; Shamir, House, & Arthur, 1993), the field has witnessed significant advances in theory
development and empirical work. Despite the existence of numerous leadership theories and
paradigms, it is safe to say that, for the past two decades, transformational leadership theory has
captured much of the research attention (Judge & Bono, 2000). The concept of transformational
leadership can be traced back to Burns’ (1978) qualitative classification of transactional and
transformational political leaders, although it was the conceptual work by House (1977) and Bass
(1981) that brought the concept of transformational leadership to the forefront of leadership
research. Transformational leadership is often contrasted to transactional leadership.
Transactional leadership is often depicted as contingent reinforcement; leader-subordinate
relationships are based on a series of exchanges or bargains between them (Howell & Avolio,
1993). Transformational leaders, on the other hand, rise above the exchange relationships typical
of transactional leadership by developing, intellectually stimulating, and inspiring subordinates
to transcend their own self-interests for a higher collective purpose, mission, or vision (Howell,
& Avolio, 1993). Notice that one consequence of this perspective is a focus on unit-level
interests, beyond those of the individual person.
Transformational leadership comprises four constructs (Bass, 1998): charisma or
idealized influence, inspirational motivation, intellectual stimulation, and individualized
consideration. A leader is charismatic if his/her followers seek to identify with the leader and
emulate him/her. Transformational leaders motivate and inspire their followers by providing
meaning and challenge to their work. Intellectually stimulating leadership aims to expand the
followers’ use of their potential and abilities. Finally, individually-considerate leaders are
attentive to their followers’ needs for achievement and growth. These leaders act not only as
superiors but also as coaches and mentors to their subordinates. In short, transformational leaders
concentrate their efforts on longer term goals, emphasize their vision (and inspire subordinates to
achieve the shared vision), and encourage the subordinates to take on greater responsibility for
both their own development and the development of others (Avolio, Bass, & Jung, 1997; Bass,
1985; Bycio, Hackett, & Allen, 1995; Howell & Avolio, 1993). They are also receptive to
innovations and are likely to promote creativity in their subordinates (Avolio et al., 1997; Bass,
1985). Finally, they are more likely than transactional leaders to cater to individual followers’
needs and competencies (Bycio et al., 1995; Howell et al., 1993). In contrast, transactional
leaders tend to focus on short-term goals and needs of their followers since they operate
predominantly through an economic exchange model, as exemplified by path-goal theory (Koh,
Steers & Terborg, 1995).
Since the inception of transformational leadership theory two decades ago, considerable
empirical evidence has accumulated in support of the theory (Kirkpatrick & Locke, 1996).
Despite this empirical support, questions remain as to what determines or predicts
transformational leadership, and surprisingly little empirical evidence exists to help answer
them. While much theoretical work has linked personality to transformational leadership (e.g.,
Bass, 1998; Hogan et al., 1994; Stogdill, 1974), most past research has used so many different
types of traits that the relationships obtained are difficult to comprehend or integrate (see Bass,
1998). Organizing these findings around the FFM of personality, however, gives researchers a
common platform for examining the relationships between personality and transformational
leadership. To the best of our knowledge, only one study thus far has directly linked the FFM of
personality to transformational leadership. Judge and Bono (2000) found that extroversion
(corrected r = .28) and agreeableness (corrected r = .32) positively predicted transformational
leadership. Although openness to new experience correlated positively with transformational
leadership, its effect was attenuated once the influence of the other traits was controlled. Despite
the small to moderate relationships found, Judge and Bono (2000) provided preliminary evidence
that certain FFM traits may be related to transformational leadership. Clearly, more empirical
research is necessary to help refine a theory linking the FFM of personality to transformational
leadership.
Consistent with the results of Judge and Bono (2000), we predict that extroversion,
agreeableness, and openness to new experience will be positively related to transformational
leadership. For example, extroversion should be related because of the dominance and expressive
components of the trait, agreeableness should be related because individual consideration
requires empathy (a key component of agreeableness), and openness should be related because of
the need for creativity to intellectually stimulate subordinates (see Judge & Bono, 2000, for more
detail). Although Judge and Bono (2000) failed to find the hypothesized negative relationship
between neuroticism and transformational leadership in their study, given our military sample,
we believe that neuroticism should be negatively related to transformational leadership. As
neuroticism is often associated with anxiousness, nervousness, low self-confidence, and low self-
esteem (McCrae & Costa, 1991), neurotic military leaders would not be able to exhibit
transformational leadership given the nature of the military environment (i.e., a context
inherently hazardous and often life threatening to both leaders and subordinates, and thus
requiring a strong command structure and leadership). Such a finding is consistent with Ployhart
et al. (2001), who found that more neurotic leaders performed worse on leadership exercises.
Finally, like Judge and Bono (2000) and Ployhart et al. (2001), we do not expect
conscientiousness to be related to transformational leadership.


Hypothesis 1: Extroversion will be positively related to transformational leadership behavior.
Hypothesis 2: Openness to new experience will be positively related to transformational
leadership behavior.
Hypothesis 3: Agreeableness will be positively related to transformational leadership behavior.
Hypothesis 4: Neuroticism will be negatively related to transformational leadership behavior.
Transformational Leadership and Team Performance Under Typical and Maximum Contexts
One reason for the interest in transformational leadership is that it predicts a variety of
important criteria. For example, a meta-analysis by Lowe, Kroeck, and Sivasubramaniam (1996)
found that transformational leadership, aggregated across the four dimensions, was related to
objective (corrected r = .30) and subjective (corrected r = .73) measures of leadership
effectiveness. These relationships generalized across low level (corrected r = .62) and high level
leaders (corrected r = .63), in organizations from both the private (corrected r = .53) and public
sectors (corrected r = .67). Another meta-analysis found transformational leadership correlates
with leader effectiveness, even when transformational leadership and effectiveness are
independently measured (corrected r = .34) (Fuller, Patterson, Hester, & Stringer, 1996).
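For readers unfamiliar with the corrected correlations reported in such meta-analyses, the following is the conventional psychometric correction for attenuation due to measurement unreliability (a standard formula, not one stated in the sources above), in LaTeX notation:

\hat{\rho} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}

where r_xy is the observed correlation and r_xx and r_yy are the reliabilities of the predictor and criterion. For example, an observed r of .25 with reliabilities of .80 and .70 corrects to .25 / sqrt(.80 x .70), or approximately .33.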
However, nearly all of the conceptual development and empirical work in
transformational leadership research has been directed toward individual-level outcomes (e.g.,
individual satisfaction and performance). Little attention has been paid to the influence of a
leader on group or organizational processes and outcomes (Conger, 1999; Yukl, 1999). In fact, a
recent meta-analysis by Judge et al. (2002) did not find a single leadership study that had used
group performance as the leadership effectiveness measure. Since then, only two empirical
studies have linked transformational leadership to unit-level performance criteria: Bass et al.
(2003) found transformational leadership predicted unit performance in infantry teams, and Dvir
et al. (2002) found transformational leadership training resulted in better unit performance
relative to groups that did not receive the training. Thus, while many argue that leadership
effectiveness should be assessed in terms of team or organizational effectiveness (e.g., Hogan et
al., 1994), in reality most studies evaluate leadership effectiveness in terms of ratings provided
by superiors, peers, or subordinates (Judge et al., 2002).
Obviously, this is a critical void in the leadership literature, despite the clear implications
of transformational leadership for team-level outcomes. For example, the theory predicts that
transformational leaders will inspire followers to transcend their own self-interests for a higher
collective purpose (Howell et al., 1993). Likewise, Bass (1998) hypothesizes transformational
leadership fosters “a greater sense of a collective identity and collective efficacy (p.25).”
Transformational leaders are also instrumental for the development of important team processes
such as unit cohesion and team potency (Bass et al., 2003; Dvir et al., 2002; Guzzo, Yost,
Campbell, & Shea, 1993; Sivasubramaniam, Murry, Avolio, & Jung, 2002; Sosik, Avolio, Kahai
& Jung, 1998; Sparks & Schenk, 2001). Given the instrumental role of transformational
leadership to the development of important team processes, it would hardly be surprising that
teams with transformational leaders should outperform teams without such leaders (e.g., Dvir et
al., 2002).
Theory and research must demonstrate links between transformational leadership and
unit-level performance because, without such empirical research, we are forced to rely on
findings at the individual level. This can potentially be a dangerous practice, as research on
levels of analysis (e.g., Klein, Dansereau, & Hall, 1994; Kozlowski & Klein, 2000; Rousseau,
1985) has shown that findings at one level of analysis cannot automatically be assumed to exist
at a higher level. Similarly, in practice leaders are expected to influence collective outcomes
such as team performance and organizational effectiveness, and they are oftentimes held
accountable for accomplishing such outcomes (Yammarino, Dansereau, & Kennedy, 2001).
Clearly for both theoretical and practical reasons, it is critical that transformational leadership be
linked to team performance.
The present study intends to provide some preliminary data on this issue by linking
transformational leadership to team performance. Based on prior theory (e.g., Bass, 1985;
Kozlowski, Gully, McHugh, Salas, & Cannon-Bowers, 1996) and previous empirical findings
(Bass et al., 2003; Dvir et al., 2002), we expect a positive relationship to exist.
Yet beyond this simple relationship, we examine the relationship between
transformational leadership and team performance assessed under typical and maximum
performance contexts. As noted by Sackett, Zedeck, and Fogli (1988), maximum performance
contexts occur when the following conditions are satisfied: (a) one is aware he/she is being
evaluated, (b) the instructions to perform maximally on the task are accepted, and (c) the task is
of relatively short duration so the person can maximize effort. An important necessary condition
to compare typical and maximum measures is that only the performance context changes; the
content of the performance domain must remain the same.
Sackett and colleagues demonstrated the importance of this distinction at the individual
level by showing typical and maximum performance are different constructs and have different
antecedents (DuBois, Sackett, Zedeck, & Fogli, 1993; Sackett et al., 1988; see also Ployhart et
al., 2001). In this study, we do not claim to have direct measures of typical and maximum
performance constructs, but rather assess team performance under typical and maximum
performance contexts. As such, we propose that teams face maximum performance contexts,
with the conditions of short time span, awareness of being evaluated, and acceptance of
instructions to exert maximum effort being critical features of such maximum contexts (see
Kozlowski et al., 1996). Common examples include SWAT teams, small unit combat teams, and
even project teams responding to crises.
One implication of distinguishing between the two performance contexts is that the
determinants and consequences of transformational leadership may likewise differ. Preliminary
evidence supports this assertion, as Ployhart et al. (2001) found the criterion-related validities of
the FFM differed for both typical and maximum leadership performance measures in a military
sample. Openness to new experiences was predictive of transformational leadership performance
in a maximum performance condition, neuroticism was most predictive of transformational
leadership in a typical performance condition (having an adverse effect on performance), and
extroversion was predictive of both. Importantly, they found the effect sizes tended to be
stronger for maximum performance. However, they did not directly assess transformational
leadership; they used ratings of transformational behaviors at the individual level. Thus, it is not
known whether and how transformational leadership might relate differently to team
performance in typical and maximum settings.
In this study, we extend these findings to propose transformational leadership will be
more predictive of team performance in maximum rather than typical performance contexts. This
expectation is consistent with theory, as many of the reasons offered as requiring
transformational leadership are inherently “maximum performance” unit-level phenomena (e.g.,
maintaining unit performance during a merger, and military units in combat). For example, Bass
(1985; 1988; 1998) has repeatedly argued the importance of transformational leadership to
groups and organizations during periods of stress, crisis, instability, and turmoil. Indeed,
transformational leadership makes a difference in these situations. First, transformational leaders,
using inspirational motivation and individualized consideration behaviors, are able to reduce the
stress experienced by followers by instilling a sense of optimism and collective efficacy (Bass,
1998). Second, transformational leaders, using idealized influence behaviors, can direct
followers’ attention to a superordinate goal and lead followers toward the resolution of the crisis
(Bass, 1998). Third, transformational leaders, using intellectual stimulation behaviors, are able to
break out from old rules and mind sets and encourage their followers to do likewise, by
promoting an effective decision making process whereby different ideas, opinions, and
alternatives are freely articulated before arriving at a decision (Atwater & Bass, 1994; Bass,
1990). Based on this theoretical reasoning, we propose:
Hypothesis 5: Transformational leadership will be more predictive of team performance
in maximum rather than typical performance contexts.
Transformational Leadership as a Mediator Between the FFM and Team Performance
Thus far, we have discussed the antecedents and consequences of transformational
leadership in a bivariate fashion. Yet as Figure 1 shows, the FFM, transformational leadership,
and team performance are theoretically expected to relate to each other in a mediated
multivariate model. Such a model is consistent with recent suggestions to develop process
models linking personality to work outcomes (e.g., Barrick, Mount, & Judge, 2001).
Extending the logic outlined in the previous sections, we propose transformational
leadership will fully mediate the relationship between the FFM and team performance in the
maximum performance context, but only partially mediate the relationship between the FFM and
team performance in the typical context. We base these hypotheses on several lines of evidence.
First, as noted previously, transformational leadership is expected to be most important in times
of extreme time pressure, stress, and instability—maximum performance conditions (e.g., Bass,
1988; 1998; Ployhart et al., 2001). In such a condition transformational leadership should be the
primary determinant of team performance. Second, transformational leadership will still be
important under typical performance contexts, but to a lesser extent than in maximum
performance contexts, and the more “mundane” nature of typical performance will allow
personality to also be important. This is based on previous theory and research arguing
personality is a stronger predictor of typical performance because the personality-based
behaviors of effort and choice are more constrained in maximum performance contexts. In
contrast, the long time periods involved with performance in typical contexts allow individual
differences in effort and choice to more strongly manifest themselves and thus personality will
determine performance (e.g., Cronbach, 1949, 1960; DuBois et al., 1993; Ployhart et al., 2001;
Sackett et al., 1988).
Hypothesis 6: Transformational leadership will fully mediate the relationship between
leader personality (in terms of the FFM) and team performance in maximum contexts.
Hypothesis 7: Transformational leadership will partially mediate the relationship between
leader personality (in terms of the FFM) and team performance in typical contexts.
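To make the full-versus-partial mediation logic of Hypotheses 6 and 7 concrete, the following is a minimal Python sketch of the familiar regression-based mediation steps, run on simulated data; it illustrates the general technique only and is not the analysis code used in this study.

# Illustrative sketch with simulated data (not this study's analysis code):
# regression steps for testing whether transformational leadership (tl)
# mediates a leader trait (x) -> team performance (y) relationship.
import numpy as np

rng = np.random.default_rng(0)
n = 39                                  # e.g., one observation per team
x = rng.normal(size=n)                  # hypothetical leader trait score
tl = 0.5 * x + rng.normal(size=n)       # TL partly determined by the trait
y = 0.6 * tl + rng.normal(size=n)       # performance driven by TL (full mediation)

def coefs(predictors, outcome):
    # Ordinary least squares; returns intercept followed by slopes.
    design = np.column_stack([np.ones(len(outcome)), predictors])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

c = coefs(x, y)[1]                                   # step 1: total effect of x on y
a = coefs(x, tl)[1]                                  # step 2: effect of x on the mediator
b, c_prime = coefs(np.column_stack([tl, x]), y)[1:]  # step 3: y on tl and x together
# Full mediation: c_prime near zero while a and b are nonzero;
# partial mediation: c_prime reduced relative to c but still nonzero.
print(f"total c = {c:.2f}, a = {a:.2f}, b = {b:.2f}, direct c' = {c_prime:.2f}")

In these terms, Hypothesis 6 predicts the first pattern in the maximum context, and Hypothesis 7 predicts the second pattern in the typical context.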
Method
Sample
The sample comprised participants from the Singapore Armed Forces: (a) 39 team
leaders, (b) 202 followers, (c) 20 superiors of these combat teams, and (d) 15 assessment center
assessors. Hence, in total, 276 military personnel participated in the study. The team leaders and
soldiers constituted 39 combat teams. These were intact teams going through training; team
leaders and team members had originally been randomly assigned to form the teams according
to standard military practice. The combat teams had been training together for nearly three
months prior to the commencement of the study. Team size varied from four to seven members,
with a mean of five. The participants were all males enlisted for compulsory National Service,
with ages ranging from 18 to 23 years (M = 19.3 years, SD = 1.04). The racial composition of
the sample mirrored the general population, which is predominantly Chinese.
Team performance was measured by ratings from various sources (superiors and
assessment center assessors) under maximum and typical performance contexts. Team
performance measures under the typical performance context were obtained via supervisory
ratings near the end of the team training. As described by Sackett et al. (1988), these
performance measures are similar to performance appraisal ratings in organizations, in that they
assess performance over a longer time period. On the other hand, team performance measures
under the maximum performance context were obtained during a one-day assessment center
conducted to evaluate the combat proficiency of the team. Note that there was no overlap
between raters providing performance ratings across the two conditions. Further, no raters would
have known team members because the raters came from other military units (a brief survey
administered post-hoc to six assessors and four supervisors supported these expectations). While
we cannot definitively equate these two sets of performance measures as reflecting latent typical
performance and maximum performance constructs (a point we return to in the Limitations
section), the fact that the ratings were obtained under two very different measurement contexts is
consistent with the requirements for typical and maximum performance conditions (e.g., Sackett
et al., 1988). That is, participants were fully aware that the assessment center was an evaluative
context, they were given explicit instructions to maximize their performance, and the assessment
center took place over a short period of time (i.e., one day).
Procedure
Participants were members of intact military teams undergoing military training.
Leaders and team members were originally randomly assigned to form these teams by the unit
commanders. About 10 weeks into the training, leaders completed a measure of the FFM of
personality, while their subordinates’ ratings of the leaders’ transformational leadership were
obtained through a survey administered by one of the primary researchers and several assistants.
Given the highly intensive and interactive time subordinates spent with their leaders, followers
should have had sufficient opportunity to observe and thus provide accurate ratings of
transformational leadership. About three weeks later, supervisors’ ratings of the teams’ training
performance were collected. These ratings of the team over the 3-month training course reflect
performance under more typical conditions. The teams were trained to perform basic military
tasks such as capturing an enemy observation post or laying an ambush.
About the same time, an assessment center, designed to evaluate the combat proficiency of the
combat team, was used to obtain measures of the team’s performance in maximum performance
contexts. Different sets of evaluators were used to provide typical and maximum performance
measures. Given that different sources completed the various measures, same source bias was
less of an issue in this study, although this does not eliminate other potential sources of shared
contamination between the ratings (an issue we address more fully in the Limitations section).
Prior to the data collection, we checked with the unit commanders to ensure these combat teams
were being trained and evaluated in accordance with the stipulated training doctrine.


Measures
Leader Personality. The personality of the leaders was measured using the International
Personality Item Pool (IPIP) developed by Goldberg (1998, 1999). The IPIP is a broad-
bandwidth, public domain personality inventory that directly measures the FFM; it was
developed as part of an international development program (e.g., Hofstee, de Raad, &
Goldberg, 1992). Although items have also been developed to measure facets, we did not
collect these data because Judge and Bono (2000) found that the specific facets of the FFM
predicted transformational leadership less well than the general factors. The IPIP instrument is a
50-item measure with 10 items for each factor of the FFM (i.e., extroversion, agreeableness,
conscientiousness, neuroticism, and openness to new experience). In this study, we found alpha
reliabilities of .77, .74, .72, .82, and .80, respectively; these are similar to those of Ployhart et
al. (2001): .80, .67, .75, .83, and .77. Each item was assessed on a 5-point scale ranging from
1 (very inaccurate) to 5 (very accurate), and each factor was scored such that higher numbers
indicate greater quantities of the trait.
Transformational leadership. Transformational leadership of the military leaders was
measured using the 36-item Multifactor Leadership Questionnaire (MLQ Form 5X; Avolio,
Bass, & Jung, 1999). Followers described their leader using a frequency scale that ranges from 1
= not at all, to 5 = frequently, if not always. Note that the MLQ Form 5X uses a 0-to-4-point rating
scale; we used a 1-to-5-point scale in this study to be consistent with existing military answer
sheets. However, the items and anchors for our rating scale are identical to those from the MLQ,
thus the change in scale is a straightforward linear transformation. Furthermore, raters should
have used the rating scales in an equivalent manner because considerable research suggests it is
rater training, and not the rating format, that most influences rating variance (see Landy & Farr,
1980; Murphy & Cleveland, 1995, for reviews). The five scales used to measure transformational
leadership were: charisma/idealized influence (attributed), charisma/idealized influence
(behavior), inspirational motivation, intellectual stimulation, and individualized consideration.
Like previous research (Judge & Bono, 2000), we combined these dimensions into an overall
measure of transformational leadership. The internal consistency reliability of the overall
transformational leadership scale was .88. In order to justify aggregation, we calculated
intraclass correlation coefficients (ICC(1)) (Ostroff & Schmitt, 1993). In the present study, the
ICC(1) was .22 (p < .05). Past research has used ICC(1) levels ranging from .12 (James, 1982) to
.20 (Ostroff & Schmitt, 1993) to justify aggregation. Hence, given this relatively high ICC(1),
aggregating followers’ transformational leadership scores to reflect the transformational
leadership of the team leader is statistically justified.
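For readers who wish to see the aggregation statistics concretely, the following is a minimal
Python sketch (with hypothetical follower ratings, not the study data; the function name is ours)
of ICC(1) and ICC(2) computed from a one-way analysis of variance across teams, following the
formulas popularized by James (1982):

    import numpy as np

    def icc1_icc2(groups):
        # One-way ANOVA across teams: ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW),
        # ICC(2) = (MSB - MSW) / MSB, where k is the average number of raters per team.
        ratings = np.concatenate([np.asarray(g, dtype=float) for g in groups])
        grand_mean, n_groups, n_total = ratings.mean(), len(groups), len(ratings)
        k = n_total / n_groups
        msb = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups) / (n_groups - 1)
        msw = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
                  for g in groups) / (n_total - n_groups)
        return (msb - msw) / (msb + (k - 1) * msw), (msb - msw) / msb

    # Hypothetical follower ratings of three leaders
    print(icc1_icc2([[2.1, 2.4, 2.0], [3.2, 3.5, 3.1], [2.8, 2.6, 3.0]]))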
Team Performance in Typical Contexts. Supervisors’ ratings of team training
performance were obtained near the end of the team training. Five supervisors provided
performance ratings for each team. As these superiors were directly involved in the training of
these teams, they had ample opportunity to observe the team in action. Supervisors were asked to
rate the team’s performance on two dimensions: the efficiency of the team actions, and the
quality of team actions. That is, supervisors rated the team’s effectiveness and efficiency in
learning and practicing for the military exercises that were later evaluated in the assessment
center (i.e., maximum performance condition). Supervisors were instructed not to base their
assessment on the team’s performance in day-to-day garrison activities (e.g., guard duty,
physical fitness training, administration), but rather to focus on the behaviors associated only
with the training program (e.g., actions taken to secure a critical road junction for friendly
forces). Therefore, the performance measures across the two contexts tapped the same


performance domains at the same level of specificity (e.g., Sackett et al., 1988). Each of these
two items was rated on a five-point Likert scale, ranging from 1 to 5, where higher scores reflect
higher efficiency or quality of team actions. As these two scores were highly correlated (r = .72),
we decided to average them to form a composite team performance score. Given that ICC(1)
was .35 (p < .05) and ICC(2) was .85, we averaged the scores across raters to form an overall typical
team performance rating for each team.
Team Performance in Maximum Contexts. These ratings were collected during a one-day
assessment center conducted at the end of the team training to evaluate the combat proficiency of
the team. One external assessor was randomly assigned to evaluate the performance of the team
over a series of six military tasks (e.g., the team may be tasked by HQ to evacuate a casualty
from one place to another); these tasks comprehensively summarized the types of tasks
performed as part of the team training. The assessor used a 5-point Likert scale to evaluate the
efficiency and the quality of team actions on each of the tasks. As there was only one assessor
per team, inter-rater reliability was not available; however, the inter-task reliability for the
efficiency measure was .90 while the inter-task reliability for the quality of team actions was .87.
As with the ratings collected under the typical performance context, both of these measures were
highly correlated (r = .67, p < .01) and so a composite team performance measure was created.
Content Equivalence of the Performance Ratings. To ensure the content and specificity
of the performance measures was sufficiently similar across the two performance contexts, we
asked 10 subject matter experts (SMEs) with extensive experience in training and evaluating this
type of combat team to respond to a 10 item survey. The survey sought their opinions about the
overlap of the performance domains being assessed by the supervisors and the assessors under
the two performance conditions. As shown in Table 1, the responses from these SMEs
demonstrate the performance domains assessed by the supervisors and the assessors were highly
similar in terms of content. The mean response from these SMEs on the 10-item survey was 5.2 on
a six-point scale, indicating these raters believed the overlap between the two performance
measures was present to at least “a great extent.” This information, coupled with the fact that
raters were instructed to only consider team behaviors associated with the content of the training,
and that the same performance dimensions and response scales were used for both conditions,
suggests that the performance ratings assessed the same performance domain with the same
degree of specificity. Thus, the ratings obtained under the two contexts differ in terms of the
performance demands placed on the team (typical or maximum), and perhaps also the knowledge
raters had about the teams and leaders (a point we return to in the Limitations section).
Results
Power Analysis and Data
In contrast to research at the individual level of analysis, the difficulty of collecting data
from large samples of intact teams usually results in smaller sample sizes. For instance, Liden,
Wayne, Judge, Sparrowe, Kraimer, and Franz (1999) analyzed 41 workgroups while Marks,
Sabella, Burke, and Zaccaro (2002) analyzed 45 teams. Indeed, Cohen and Bailey (1997) report
that the number of teams in project team research (the type of teams most similar to those in
the present study) averages only 45 per study. Such is the case in this study, making a careful
consideration of power, p-values, and effect size important.
A power analysis shows that given a sample size of 39 teams, there is only a 59% chance
of detecting moderate effects at p < .05 (one-tailed) (Cohen, 1988). However, with a one-tailed
test and p < .10, power becomes 73%. Hence, we considered values with p < .10 (one-tailed) to
be statistically significant instead of p < .05 (one-tailed) because of the low statistical power


arising from the small sample size (however, when a value falls below p < .05, we report the lower p
value). Note that because we hypothesize a specific direction for our hypothesis tests, a one-
tailed test is appropriate (Cohen & Cohen, 1983). In light of recent recommendations (e.g.,
Wilkinson et al., 1999), and to help better interpret the magnitude of these effects, we also report
90% confidence intervals around each of our hypothesis tests. Thus, by presenting effect sizes, p-
values, and confidence intervals, readers should best be able to determine the “importance” of
the effects and tests we report.
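These power figures can be approximated with Fisher's r-to-z transformation. The following is a
minimal Python sketch (the function name is ours; a "moderate" effect is taken as r = .30,
following Cohen, 1988), which reproduces values close to those cited:

    import numpy as np
    from scipy.stats import norm

    def power_r_one_tailed(r, n, alpha):
        # Power for a one-tailed test of H0: rho = 0, via Fisher's r-to-z
        z_crit = norm.ppf(1 - alpha)
        noncentrality = np.arctanh(r) * np.sqrt(n - 3)
        return norm.cdf(noncentrality - z_crit)

    print(power_r_one_tailed(.30, 39, .05))   # ~.58 (about a 59% chance)
    print(power_r_one_tailed(.30, 39, .10))   # ~.72 (about 73%)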
A different concern, perhaps more important with smaller sample sizes, is the presence of
outliers and extreme cases. To ensure the results were not biased by extreme cases, we examined
the distributions of the variables in terms of skewness and kurtosis (zero represents a perfectly
normal distribution; absolute skewness > 3 or absolute kurtosis > 7 is indicative of a nonnormal distribution; see
West, Finch, & Curran, 1995). None of the measures were nonnormal, with openness to new
experiences showing the largest deviation from normality (skewness = -1.27; kurtosis = 4.37) but
still falling well within the range of appropriate normality.
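As a sketch of this screening step (with simulated stand-in data, not the study variables; note
that scipy reports excess kurtosis, so zero corresponds to a normal distribution):

    import numpy as np
    from scipy.stats import skew, kurtosis

    x = np.random.default_rng(0).normal(size=39)   # stand-in for one study variable
    sk, ku = skew(x), kurtosis(x)                  # kurtosis here is excess kurtosis
    flagged = abs(sk) > 3 or abs(ku) > 7           # West, Finch, & Curran (1995) cutoffs
    print(round(float(sk), 2), round(float(ku), 2), flagged)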
Performance Ratings Across Typical and Maximum Contexts
Table 2 shows the means, standard deviations, and correlations for all measures. As
reflected in Table 2, the low correlation between the team performance measures across the
typical and maximum performance contexts (r = .18, ns) suggests the ratings from these
contexts were not interchangeable.
Hypotheses
Hypotheses 1 through 4 predicted that extroversion (Hypothesis 1), openness to new
experiences (Hypothesis 2), and agreeableness (Hypothesis 3) would be positively related to
transformational leadership, while neuroticism (Hypothesis 4) would be negatively related. As
shown in the last row of Table 2, transformational leadership is positively related to extroversion
(r = .31, p < .05, [.04; .53]), but negatively related to both neuroticism (r = -.39, p < .05, [-.59;
-.14]) and agreeableness (r = -.29, p < .05, [-.52; -.03]). Transformational leadership is not
significantly related to openness to experience, nor conscientiousness. Hence Hypotheses 1 and 4
were supported while Hypotheses 2 and 3 were not. While the relationship between
transformational leadership and agreeableness was significant, it was in the direction opposite to
that hypothesized.
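The 90% confidence intervals reported here and below can be closely approximated from r and n
alone via the Fisher z transformation; a minimal sketch (the function name is ours; small
discrepancies from the tabled intervals reflect rounding):

    import numpy as np
    from scipy.stats import norm

    def r_confidence_interval(r, n, level=0.90):
        # Fisher z transform; symmetric interval in z, back-transformed to r
        z = np.arctanh(r)
        half_width = norm.ppf(1 - (1 - level) / 2) / np.sqrt(n - 3)
        return float(np.tanh(z - half_width)), float(np.tanh(z + half_width))

    print(r_confidence_interval(.31, 39))   # roughly (.05, .53); compare [.04; .53]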
In line with Murphy’s (1996) recommendation that personality should be examined using
a multivariate framework, we also conducted a multiple regression analysis in which
transformational leadership was regressed on all of the FFM constructs. As shown in Table 3, the
overall model comprising the five personality factors was significant, explaining 28% of the
variance in transformational leadership ratings (F[5, 33] = 2.59, p < .05). However, only
neuroticism (β = -.29, p < .10, [-.57; -.01]) and agreeableness (β = -.30, p < .10, [-.58; -.02]) were
significant predictors at p < .10 (one tailed).
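The overall model test can be recovered directly from the reported R²; a quick check in Python
(recomputing the statistic from the reported R² rather than from raw data):

    def f_from_r2(r2, n, k):
        # Overall regression F with (k, n - k - 1) degrees of freedom
        return (r2 / k) / ((1 - r2) / (n - k - 1))

    print(f_from_r2(.28, 39, 5))   # ~2.57; matches the reported F(5, 33) = 2.59
                                   # within rounding of R-squared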
Next, Hypothesis 5 predicted that transformational leadership would be more predictive
of team performance in maximum rather than typical performance contexts. As Table 2 shows,
transformational leadership was significantly related to team performance in both typical
contexts (r = .32, p < .05, [.06; .54]) and maximum contexts (r = .60, p < .05, [.40; .75]). The
formula proposed by Williams (1959) and Steiger (1980) was used to test for the difference
between two non-independent correlations. We found these correlations to be significantly
different, t(36) = 1.63, p < .10 (one-tailed), although the confidence intervals overlapped slightly.
Thus, we concluded that the relationship between transformational leadership and team


performance was significantly stronger in the maximum context than in the typical context,
supporting Hypothesis 5.
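For readers wishing to reproduce this comparison, the sketch below implements Williams's
(1959) t statistic for two correlations that share one variable (here, transformational leadership),
using the correlations from Table 2; it recovers the reported value:

    import numpy as np

    def williams_t(r12, r13, r23, n):
        # Tests rho12 = rho13 when both correlations share variable 1 (df = n - 3)
        det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
        r_bar = (r12 + r13) / 2
        numerator = (n - 1) * (1 + r23)
        denominator = 2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r23)**3
        return (r12 - r13) * np.sqrt(numerator / denominator)

    # r(TL, maximum) = .60, r(TL, typical) = .32, r(maximum, typical) = .18
    print(williams_t(.60, .32, .18, 39))   # ~1.63, as reported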
Mediation Hypotheses
We examined transformational leadership as a mediator of the relationship between
leader personality (i.e., FFM) and team performance using a procedure outlined by Baron and
Kenny (1986). Hypothesis 6 predicted that transformational leadership would fully mediate the
relationship between the FFM and team performance in maximum contexts, while Hypothesis 7
predicted that transformational leadership would partially mediate the relationship between the
FFM and team performance in typical contexts. To test transformational leadership as a
mediator, we first examined whether the FFM accounted for significant variance in
transformational leadership and both team performance measures and whether transformational
leadership was related to both team performance measures. If these regression models were
statistically significant, we could then examine the effects of the FFM on both team performance
measures after controlling for transformational leadership.
Results from the multiple regression analyses show that the FFM explained significant
variability in transformational leadership (R2 = .28; F[5, 33] = 2.59, p < .05 [one tailed]), team
performance in typical contexts (R2 = .41; F[5, 33] = 4.50, p < .05 [one tailed]), and team
performance in maximum contexts (R2 = .20; F[5, 33] = 1.62, p < .10 [one tailed]). These
findings indicate that the FFM is associated with transformational leadership and both team
performance measures.
To test for mediation, we entered the FFM into the regression equation after controlling
for transformational leadership. The FFM did not produce a significant increment in variance for
predicting team performance in maximum contexts (∆R2 = .12; ∆F[5, 31] = 1.42, ns). On the
other hand, the FFM accounted for significant incremental variance in predicting team
performance in typical contexts after controlling for transformational leadership (∆R2 = .34;
∆F[5, 31] = 3.87, p < .05). Hence, both Hypotheses 6 and 7 are supported. That is,
transformational leadership fully mediated the relationship between the FFM and team
performance in the maximum performance context, but only partially mediated the relationship
between the FFM and team performance in the typical performance context. Keep in mind this
finding is primarily due to the fact that the relationship between the FFM model and performance
in the typical context was about twice as strong as it was in the maximum context, while the
relationship between transformational leadership and performance was about twice as strong in
the maximum context as it was in the typical context.
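A minimal sketch of this hierarchical step using statsmodels is shown below; the DataFrame and
its column names (tfl for aggregated transformational leadership, extro/consc/neuro/openx/agree
for the FFM scores, and the performance composites) are illustrative placeholders, not the
study's actual data file:

    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    def ffm_increment(teams_df, dv):
        # Step 1: performance regressed on transformational leadership alone
        reduced = smf.ols(f"{dv} ~ tfl", data=teams_df).fit()
        # Step 2: add the five FFM scores; the change in R-squared (and its F test)
        # is the FFM's incremental contribution after controlling for leadership
        full = smf.ols(f"{dv} ~ tfl + extro + consc + neuro + openx + agree",
                       data=teams_df).fit()
        return full.rsquared - reduced.rsquared, anova_lm(reduced, full)

    # ffm_increment(teams_df, "perf_typical")  -> significant increment (partial mediation)
    # ffm_increment(teams_df, "perf_maximum")  -> nonsignificant increment (full mediation)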
Discussion
The purpose of this study was to examine the antecedents and consequences of
transformational leadership. Four of the FFM of personality constructs were hypothesized as
antecedents of transformational leadership, and the consequences of transformational leadership
were expected to occur for team performance across typical and maximum performance
contexts, but more strongly for the latter. The results suggest that transformational leadership is
positively related to extroversion, and negatively related to agreeableness and neuroticism,
although in a multiple regression only neuroticism and agreeableness were predictive. The
results also show that transformational leadership has important consequences for team
performance, but the magnitude of these relationships is dependent on whether performance is
assessed in typical or maximum contexts. In particular, transformational leadership appears to be
more predictive of team performance in maximum contexts. In addition, we found
transformational leadership fully mediated the relationship between the FFM and team


performance in maximum contexts, while only partially mediated the relationship between FFM
and team performance in typical contexts. We now describe these major findings in more detail.
Major Findings
Consistent with our bivariate hypotheses, extroversion was positively related to
transformational leadership and neuroticism was negatively related. Contrary to our hypotheses,
openness was unrelated to transformational leadership, and the significant effect for
agreeableness was negative and opposite to our prediction. The multivariate analyses found that
only neuroticism and agreeableness were significantly related to transformational leadership.
Comparing our multivariate effects to the multivariate findings of Judge and Bono (2000)
shows both similarities and differences (we compare the multivariate models rather than the
bivariate effects because in practice, these various traits influence behavior in combination;
Murphy, 1996). Both studies found effects for agreeableness, and no effects for
conscientiousness and openness to new experience. Yet this study found the effect for
agreeableness was negative whereas Judge and Bono (2000) found the effect was positive. In
other words, the more agreeable military leaders were rated as less transformational by their
followers. Another area of difference with Judge and Bono (2000) is that this study found a
significant effect for neuroticism while they found a significant effect for extroversion.
The differences between our study and Judge and Bono (2000) may exist due to the
nature of our sample, which was primarily young and entirely male (a point we return to shortly).
Alternatively, they might suggest the existence of moderators on the relationship between
personality and transformational leadership, specifically setting (military versus civilian). For
example, military samples were used in both Ployhart et al. (2001) and the present study, and
both found neuroticism and agreeableness had an adverse effect on transformational leadership.
In contrast, the Judge and Bono (2000) sample was comprised of business leaders and found no
effect for neuroticism and a positive effect for agreeableness. Compared to business leaders,
military personnel often have to work in hazardous and life-threatening situations; hence the
ability to remain calm, secure, and non-anxious is critical. Followers will often look to them for
direction and leadership in these critical times; perhaps under such conditions being agreeable
does not contribute to perceptions of effective leadership (e.g., followers may want direction in
crisis situations). Future research will be necessary to determine whether context truly acts as a
moderator of the personality-transformational leadership relationship.
With respect to the team performance measures, we found the correlation between team
performance assessed in typical and maximum contexts was small and non-significant, a finding
consistent with research conducted at the individual level (e.g., DuBois et al., 1993; Ployhart et
al., 2001; Sackett et al., 1988). The consequence of this distinction at the team level can be seen
when examining relations to leadership, as transformational leadership was more predictive of
team performance when it was assessed in maximum performance contexts. While it may be too
early to draw any definitive conclusions given the small sample size and potential limitations of
the design (discussed below), future research linking transformational leadership to team
performance might consider this distinction. Previous research has found that transformational
leaders are capable of developing important team processes (e.g., unit cohesion, team potency,
collective efficacy, organizational trust and commitment, a sense of higher purpose or shared
vision; Bass et al., 2003; Shamir, et al., 1993); we speculate the consequences of these team
processes may matter most in maximum performance conditions. More empirical research using
tighter designs is clearly needed to test this hypothesis.


The finding that transformational leadership fully mediated the relationship between the
FFM and team performance assessed in maximum contexts, but only partially mediated the
relationships between the FFM and team performance assessed in typical contexts, is consistent
with the hypothesis that typical predictors (such as personality) are more strongly related to
typical performance measures than to maximum performance measures (Cronbach, 1949, 1960;
DuBois et al., 1993; Sackett et al., 1988). This difference is due to the fact that individual
differences in personality will primarily manifest themselves in typical performance contexts; in
maximum performance contexts effort and choice are more constant across people (Cronbach,
1949, 1960). This pattern appeared in our data: the relationship between the FFM and
performance in the typical context was about twice as large as the relationship between the FFM
and performance in the maximum context. Transformational leadership showed just the opposite
pattern, with the effect size being greater in the maximum than typical context. These findings
suggest transformational leadership may become most critical in maximum performance contexts
and have several implications for both transformational leadership and personality research. For
example, there may be a greater need to evaluate leadership performance in light of team
performance. If effective leadership is hypothesized to ultimately improve team effectiveness,
team performance across different contexts may provide an additional criterion construct that
may be used for leader selection, training, and development.
Limitations and Directions for Future Research
Like any field study, there are a number of potential issues we could not control that may
influence the interpretation of our findings. Readers should be mindful of these alternative
explanations because they must be carefully considered in the design of any future research.
Indeed, future theory building will be dependent on researchers adopting more stringent methods
and designs that address the limitations noted here. One of the most pressing issues will be for
future research to eliminate the potential contamination present among the various ratings.
Despite the fact that several sources of ratings were used (e.g., supervisor, followers, etc.), the
ratings may be contaminated to various degrees by different sources of information or common
sources of information, ultimately making it difficult to cleanly demonstrate causal relationships.
These are important concerns, and we discuss them in some detail below.
First, it is impossible to definitively conclude the team performance measures collected
under the typical and maximum performance contexts represent typical and maximum
performance constructs. While our results may be consistent with this hypothesis, an alternative
explanation is that raters differed in the types and amount of information they had for each leader
and team in each context. For example, assessors were only able to observe the teams and leaders
during the one-day exercise, and thus could only use this on-task information to make their
ratings. Alternatively, raters providing ratings in the typical condition were with the teams and
leaders for several weeks, and could have been influenced (whether consciously or not) by non-
task information (e.g., the personality of the leader, how well they got along with the team),
despite the presence of rater training. Thus, as noted by an anonymous reviewer, an alternative
explanation is that differences in raters’ implicit theories of leadership, rather than differences
due to typical and maximum performance, accounted for these findings. It is also
possible that differences in the reliability for the two sets of ratings partly explain our findings.
For these reasons, it is important to realize that our results and theoretical implications speak
only to there being a distinction between the two performance conditions; they do not provide
strong evidence as to why the differences exist.


This is clearly an important area for future research, as studies should measure the
relevant explanatory constructs (e.g., liking) to rule out this alternative explanation. Such
research would contribute to a better understanding of the features and conditions underlying
typical and maximum performance constructs. Although the three conditions proposed by
Sackett et al. (1988) have helped stimulate research on this topic, they may be in need of further
refinement. Research must begin to assess what are essentially “manipulation checks” to
distinguish between the two conditions (e.g., assess perceptions of effort, instruction acceptance,
duration; assessor biases and perceptions). With that said, the present study may still have
implications for practice, as finding a difference between the two contexts has consequences for
how organizations should use the two types of performance measures. Regardless of whether the
difference is explained by differences in the performance construct or differences in raters’
implicit theories of performance, organizations that assess performance in typical and maximum
conditions must realize these measures may not be interchangeable. For example, the practical
question facing the organization examined in this study is which type of rating is most useful for
different administrative purposes (e.g., assessment/validation, performance appraisal,
development, promotion). Additionally, theoretical and empirical work on typical/maximum
performance should be conducted at the team level of analysis, given their increasing frequency
in organizations (Devine, Clayton, Philips, Dunford & Melner, 1999).
A second and equally important issue relating to contamination concerns the followers’
ratings of transformational leadership. As noted by an anonymous reviewer, follower ratings of
transformational leadership may have been affected by the team’s performance in stressful
training exercises held during the first few weeks of training, and this may account for the large
(.60) relationship between transformational leadership and team performance in maximum
contexts. That is, success of the team in challenging contexts may have led followers to rate the
leader as more transformational, and thus there is a shared source of variance contributing to the
relationship between transformational leadership and team performance. To the extent this
contamination exists, it decreases our ability to make inferences of how much transformational
leadership causes (or at least predicts) team performance.
A related concern is the potential for informal communication among the various sources
that would contribute to common variance among the correlations. This is not method bias in the
traditional sense, but the potential for common information about the quality of the unit to be
known by assessors, supervisors, and followers. We noted earlier that assessors and followers
were from different units, assessors were not aware of how the teams were performing in the
training, and supervisors were not familiar with how their teams performed in the assessment
exercises. Our post-hoc survey of supervisors and assessors supported these expectations, but the
fact remains that some informal communication of team quality may have occurred.
Such concerns with contamination among the ratings must be addressed in future
research for stronger causal relationships to be theoretically supported. For example, Kozlowski,
Chao, and Morrison (1998) and Murphy and Cleveland (1995) review evidence showing the
provision of ratings for administrative purposes is largely a political process. In contrast, ratings
collected for research-only purposes may often show better psychometric properties.
Future research might therefore implement a formal research-only design whereby the ratings are
completely anonymous and confidential. Likewise, ratings may be supplemented with other
sources of performance information (e.g., objective or administrative indices) to help understand
the construct validity of the ratings. Finally, quasi-experimental designs such as that conducted
by Dvir et al. (2002) would be most helpful in establishing stronger inferences of causality.


A third limitation is the relatively small sample size. Although more than 276 military
personnel participated in this study, the unit of analysis was the team and its leader, which
reduced the sample size to 39 teams and leaders. To some extent, this is the reality of team
research in the field (e.g., Cohen & Bailey, 1997). However, our sample was also unique in that
the participants were primarily young and entirely male. One implication of these realities is that
effect sizes may be different in other contexts and settings. An important avenue for future
research will be to replicate and extend this study using different samples (military/business),
cultures, measures, designs, and contexts.
A fourth limitation is that we did not examine any team process variables in this study,
and thus there is no way to determine why and how transformational leadership was related to
team performance across the two contexts. Future research should include team processes
variables so that important mediators of the relationship between transformational leadership and
team performance can be determined (e.g., Bass et al., 2003). Such mediated models also help
establish stronger inferences of the causal sequence of psychological processes. Our prediction is
that the mediators may be somewhat different across typical and maximum performance
contexts, or at least the effect sizes of these mediating processes will be relatively different. An
interesting question for future research is whether, although transformational leaders are capable of
developing important team processes, the incremental value of these team processes is only
brought to bear when teams must perform in maximum conditions.
A final potential limitation regarding transformational leadership is that while we have
used the group mean as an indicator of the team leader’s transformational leadership, the
dispersion of team members’ leadership ratings (operationalized in terms of standard deviation or
Rwg) may be, in and of itself, an important reflection of team alignment. Indeed, Bliese and
Halverson (1998) found that group consensus about leadership was related to important
psychological well-being outcomes. Future research should explore this issue further, perhaps in
conjunction with mediating processes.
Conclusions
In today’s dynamic workplace, organizations must increasingly contend with varying degrees of
uncertainty for such reasons as mergers and acquisitions, global competition, and changes in the
economy and stock market. It is in such times that transformational leadership is critically
needed to lead these organizations out of uncertainty. This study attempts to fill several
important voids in the transformational leadership literature by examining the potential
dispositional antecedents of transformational leadership, and the consequences of
transformational leadership on collective performance under typical and maximum performance
contexts. We found that transformational leadership appears to be more critical for team
performance under a maximum performance context than a typical performance context. Future
research should address the limitations present in this field study to help build theories linking
transformational leadership to collective performance in typical and maximum contexts. A quote
from Bass (1998) captures elegantly the essence of transformational leadership and our findings:
“To be effective in crisis conditions, the leaders must be transformational (p.42)….transforming
crises into challenges (p.45).” Future leadership research will likewise need to transform our
understanding of individual-level processes in typical performance contexts to multilevel processes
in typical and maximum performance contexts.


References
Atwater, D.C., & Bass, B.M. (1994). Transformational leadership in teams. In B.M. Bass & B.J.
Avolio (Eds.), Improving organizational effectiveness through transformational
leadership (pp. 48-83). Thousand Oaks, CA: Sage.
Avolio, B.J., Bass, B.M., & Jung, D.I. (1999). Reexamining the components of transformational
and transactional leadership using the Multifactor Leadership Questionnaire. Journal of
Occupational and Organizational Psychology, 72, 441-462.
Barrick, M.R., & Mount, M.K. (1991). The Big Five personality dimensions and job
performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Barrick, M.R., Mount, M.K., & Judge, T.A. (2001). Personality and performance at the
beginning of the new millennium: What do we know and where do we go next?
International Journal of Selection and Assessment, 9, 9-30.
Bass, B.M. (1981). From leaderless group discussions to the cross-national assessment of
managers. Journal of Management, 7, 63-76.
Bass, B.M. (1985). Leadership and performance beyond expectations. New York: Free Press.
Bass, B.M. (1988). The inspirational processes of leadership. Journal of Management
Development, 7, 21-31.
Bass, B.M. (1990). Bass and Stogdill’s handbook of leadership: Theory, research, and
managerial applications (3rd ed.). New York: Free Press.
Bass, B.M. (1998). Transformational leadership: Industry, Military, and Educational Impact.
Mahwah, NJ: Erlbaum.
Bass, B.M., & Avolio, B. J. (1993). Full range leadership development: Manual for the
Multifactor Leadership Questionnaire. Palo Alto, CA: Mind Garden.
Bass, B.M., Avolio, B.J., Jung, D.I., & Berson, Y. (2003). Predicting unit performance by
assessing transformational and transactional leadership. Journal of Applied Psychology,
88, 207-218.
Becker, T.E., & Billings, R.S. (1993). Profiles of commitment: An empirical test. Journal of
Organizational Behavior, 14, 177-190.
Burns, J.M. (1978). Leadership. New York: Free Press.
Bycio, P., Hackett, R.D., & Allen, J.S. (1995). Further assessments of Bass’s (1985)
conceptualization of transactional and transformational leadership. Journal of Applied
Psychology, 80, 468-478.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the
behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, S. G., & Bailey, D. E. (1997). What makes teams work: Group effectiveness research
from the shop floor to the executive suite. Journal of Management, 23, 239-290.
Conger, J.A. (1999). Charismatic and transformational leadership in organizations: an insider’s
perspective on these developing streams of research. Leadership Quarterly, 10, 145-179.
Conger, J.A., & Kanungo, R.N. (1988). Toward a behavioral theory of charismatic
leadership. In J.A. Conger & R.N. Kanungo (Eds.), Charismatic leadership: The elusive
factor in organizational effectiveness (pp. 78-97). San Francisco, CA: Jossey-
Bass.
Cronbach, L. J. (1949). Essentials of psychological testing. New York: Harper.
Cronbach, L. J. (1960). Essentials of psychological testing (2nd ed.). New York: Harper.


Devine, D.J., Clayton, L.D., Philips, J.L., Dunford, B.B., & Melner, S.B. (1999). Teams in
organizations: Prevalence, characteristics, and effectiveness. Small Group Research, 30,
678-711.
DuBois, C.L.Z., Sackett, P.R., Zedeck, S., & Fogli, L. (1993). Further exploration of typical and
maximum performance criteria: Definitional issues, prediction, and white-black
differences. Journal of Applied Psychology, 78, 205-211.
Dvir, T., Eden, D., Avolio, B. J., & Shamir, B. (2002). Impact of transformational leadership on
follower development and performance: A field experiment. Academy of Management
Journal, 45, 735-744.
Fullagar, C., McCoy, D., & Shull, C. (1992). The socialization of union loyalty. Journal of
Organizational Behavior, 13, 13-26.
Fuller, B.J., Patterson, C.E.P., Hester, K., & Stringer, D.Y. (1996). A quantitative review of
research on charismatic leadership. Psychological Reports, 78, 271-287.
Goldberg, L.R. (1990). An alternative “description of personality”: The Big-Five factor
structure. Journal of Personality and Social Psychology, 59, 1216-1229.
Goldberg, L.R. (1998). International Personality Item Pool: A scientific collaboratory for the
development of advanced measures of personality and other individual differences.
[On-line]. Available: http://ipip.ori.org/ipip/ipip.html
Goldberg, L.R. (1999). A broad-bandwidth, public-domain, personality inventory measuring the
lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F. De Fruyt, &
F. Ostendorf (Eds.), Personality Psychology in Europe, Vol. 7 (pp. 7-28). Tilburg,
The Netherlands: Tilburg University Press.
Guzzo, R.A., Yost, P.R., Campbell, R.J., & Shea, G.P. (1993). Potency in groups: Articulating a
construct. British Journal of Social Psychology, 32, 87-106.
Hofstee, W.K., de Raad, B., & Goldberg, L.R. (1992). Integration of the Big Five and
circumplex approaches to trait structure. Journal of Personality and Social Psychology,
63, 146-163.
Hogan, R., Curphy, G.J., & Hogan, J. (1994). What we know about leadership: Effectiveness and
personality. American Psychologist, 49, 493-504.
House, R.J. (1977). A 1976 theory of charismatic leadership. In J.G. Hunt & L.L. Larson (Eds.),
Leadership: The cutting edge (pp.189-207). Carbondale, IL: Southern Illinois University
Press.
Howell, J.M., & Avolio, B.J. (1993). Transformational leadership, transactional leadership, locus
of control, and support for innovation: Key predictors of consolidated business unit
performance. Journal of Applied Psychology, 78, 891-902.
James, L.R., Demaree, R.G., and Wolf, G. (1984). Estimating within-group interrater reliability
with and without response bias. Journal of Applied Psychology, 69, 85-98.
Janis, I.L., & Mann, L. (1977). Decision making: A psychological analysis of conflict, choice,
and commitment. New York: Free Press.
Judge, T.A., & Bono, J.E. (2000). Five-factor model of personality and transformational
leadership. Journal of Applied Psychology, 85, 751-765.
Judge, T.A., Bono, J.E., Ilies, R., and Gerhardt, M.W. (2002). Personality and leadership: A
qualitative and quantitative review. Journal of Applied Psychology, 87, 765-780.
Katz, D., Maccoby, N., & Morse, N.C. (1950). Productivity, supervision, and morale in an office
situation. Part 1. Oxford, England.


Kirkpatrick, S.A. & Locke, E.A. (1996). Direct and indirect effects of three core charismatic
leadership components on performance and attitudes. Journal of Applied Psychology, 81,
36-51.
Klein, K.J., Dansereau, F., & Hall, R.J. (1994). Levels issues in theory development, data
collection, and analysis. Academy of Management Review, 19, 195-229.
Koh, W.L., Steers, R.M., & Terborg, J.R. (1995). The effects of transformational leadership on
teacher attitudes and student performance in Singapore. Journal of Organizational
Behavior, 16, 319-333.
Kozlowski, S. W. J., Chao, G. T., & Morrison, R. F. (1998). Games raters play: Politics,
strategies, and impression management in performance appraisal. In J. W. Smither (Ed.),
Performance appraisal: State of the art in practice (pp. 163-205). San Francisco, CA:
Jossey-Bass.
Kozlowski, S. W. J., Gully, S. M., McHugh, P. P., Salas, E., & Cannon-Bowers, J. A. (1996). A
dynamic theory of leadership and team effectiveness: Developmental and task contingent
leader roles. Research in Personnel and Human Resource Management, 14, 253-
305.
Kozlowski, S.W., & Hattrup, K. (1992). A disagreement about within-group agreement:
Disentangling issues of consistency versus consensus. Journal of Applied Psychology, 77,
161-167.
Kozlowski, S.W.J., & Klein, K.J. (2000). A multilevel approach to theory and research in
organizations: Contextual, temporal, and emergent processes. In K.J. Klein & S.W.
Kozlowski (Eds.), Multilevel theory, research, and methods in organizations:
Foundations, extensions, and new directions (pp. 3-90). San Francisco:
Jossey-Bass.
Landy, F. J., & Farr, J. L. (1980). A process model of performance rating. Psychological
Bulletin, 87, 72-107.
Liden, R.C., Wayne, S.J., Judge, T.A., Sparrowe, R.T., Kraimer, M.I., & Franz, T.M. (1999).
Management of poor performance: A comparison of manager, group member, and group
disciplinary decisions. Journal of Applied Psychology, 84, 835-850.
Lowe, K.B., Kroeck, K.G., and Sivasubramaniam, N. (1996). Effectiveness correlates of
transformational and transactional leadership: A meta-analytic review of MLQ literature.
Leadership Quarterly, 7, 385-425.
Marks, M.A., Sabella, M.J., Burke, C.S., & Zaccaro, S.J. (2002). The impact of cross-training on
team effectiveness. Journal of Applied Psychology, 87, 3-13.
McCrae, R.R., & Costa, P.T., Jr. (1987). Validation of the five-factor model of personality across
instruments and observers. Journal of Personality and Social Psychology, 52, 81-90.
McCrae, R.R., & Costa, P.T. (1991). Adding Liebe und Arbeit: the full five-factor model and
well-being. Personality & amp; Social Psychology Bulletin, 17, 227-232.
Murphy, K.R. (1996). Individual differences and behavior in organizations: Much more than g.
In K.R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 3-30).
San Francisco: Jossey-Bass.
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social,
organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Niehoff, B.P., Enz, C.A., & Grover, R.A. (1990). The impact of top-management actions on
employees attitudes. Group & Organizational Management, 15, 337-352.


Ostroff, C., & Schmitt, N. (1993). Configurations of organizational effectiveness and efficiency.
Academy of Management Journal, 36, 1345-1361.
Pitman, B. (1993). The relationship between charismatic leadership behaviors and organizational
commitment among white-collar workers. Dissertation Abstracts International, 54, 1013.
Ployhart, R.E., Lim, B.C., & Chan, K.Y. (2001). Exploring relations between typical and
maximum performance ratings and the five factor model of personality. Personnel
Psychology, 54, 809-843.
Rousseau, D. (1985). Issues of level in organizational research: Multi-level and cross-level
perspectives. In L.L. Cummings & B.M. Staw (Eds.), Research in Organizational
Behavior, 7, 1-37.
Sackett, P.R., Zedeck, S., & Fogli, L. (1988). Relations between measures of typical and
maximum job performance. Journal of Applied Psychology, 73, 482-486.
Shamir, B., House, R.J., & Arthur, M.B. (1993). The motivational effects of charismatic
leadership: A self-concept based theory. Organization Science, 4, 577-594.
Sivasubramaniam, N., Murry, W.D., Avolio, B.J., & Jung, D.I. (2002). A longitudinal model of
the effects of team leadership and group potency on group performance. Group &
Organization Management, 27, 66-96.
Sosik, J.J., Avolio, B.J., Kahai, S.S., & Jung, D.I. (1998). Computer-supported work group
potency and effectiveness: The role of transformational leadership, anonymity and task
interdependence. Computers in Human Behavior, 14, 491-511.
Sparks, J.R., & Schenck, J.A. (2001). Explaining the effects of transformational leadership: An
investigation of the effects of higher-order motives in multilevel marketing organizations.
Journal of Organizational Behavior, 22, 849-869.
Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological
Bulletin, 87, 245-251.
Stogdill, R.M. (1974). Handbook of leadership: A survey of theory and research. New York:
Free Press.
Stogdill, R.M. & Coons, A.E. (1957). Leader behavior: Its description and measurement.
Oxford, England.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal
variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural Equation Modeling:
Concepts, Issues, and Applications (pp. 56-75). Thousand Oaks: Sage.

Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in
psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Williams, E.J. (1959). The comparison of regression variables. Journal of the Royal Statistical
Society (Series B), 21, 396-399.
Yammarino, F.J., Dansereau, F., & Kennedy, C.J. (2001). A multiple-level multidimensional
approach to leadership: Viewing leadership through an elephant’s eye. Organizational
Dynamics, 29, 149-163.
Yukl, G. (1999). An evaluation of conceptual weaknesses in transformational and charismatic
leadership theories. Leadership Quarterly, 10, 285-305.


Author Note
We would like to thank the Associate Editor and two anonymous reviewers for their help
and suggestions. We also appreciate the help and support of the Singapore Ministry of Defense.
The opinions expressed in this paper are those of the authors and do not necessarily reflect the
views of the Singapore Ministry of Defense.
Correspondence concerning this article should be sent to Beng-Chong Lim, Applied
Behavioral Sciences Dept, 5 Depot Road, #16-01, Tower B, Defense Technology Towers,
Singapore, 109681; email: lim_b_c@yahoo.com.


Table 1: Responses from 10 Subject Matter Experts on the Performance Content Survey

Response scale:
1 To no extent
2 To a limited extent
3 To some extent
4 To a considerable extent
5 To a great extent
6 Perfectly

To what extent: M SD
1. …does the Assessment Center reflect the knowledge, skills, abilities, and
tasks acquired during the team training? 5.3 .48

2. …does team performance (rated at the end of the team training phase) reflect
the knowledge, skills, abilities, and tasks required during the team training? 5.0 .47

3. …is the content of the Assessment Center similar to the content of the team
training? 5.3 .67

4. …does the team performance rated in the team training phase tap the same
performance domain as the team performance rated in the Assessment Center? 5.2 .63

5. …do the ratings from the Assessment Center and the ratings from the team
training phase tap the same dimensions of performance? 5.1 .88

6. …are the team training objectives similar to the performance criteria used in
the Assessment Center? 4.7 .67

7. …are the team training objectives similar to the performance ratings used in
the team training phase? 5.3 .48

8. …are the team tasks (e.g., quick attack) performed in the Assessment Center
similar to the team tasks learned in team training? 5.5 .53

9. …are the behaviors (e.g., fire and movement) exhibited in the Assessment
Center similar to the behaviors exhibited during team training? 5.3 .48

10. Yes or no: Does the Assessment Center measure the same content, tasks, YES-10
knowledge, skills, abilities, and other characteristics as the team training? NO- 0

Note: n = 10. These experts do not use the term “performance ratings in typical contexts”; rather,
in the language of this organization such ratings would be known as “team performance rated in
the team training phase.” We therefore used the language familiar to these experts to refer to the
performance measures across both contexts.


Table 2: Leader and Team Descriptive Statistics for All Measures

Measures M SD 1 2 3 4 5 6 7 8

Personality
1. Extroversion 2.97 .70 -
2. Conscientiousness 3.55 .58 .15 -
3. Neuroticism 2.97 .65 -.63* -.04 -
4. Openness To Experience 3.35 .62 .42* .45* -.32* -
5. Agreeableness 3.84 .62 .22* .21* -.04 .58* -

6. Transformational Leadership 2.35 .55 .31* -.09 -.39* -.08 -.29* -

7. Maximum-Like Team Performance 3.87 1.04 .11 -.05 -.13 -.31* -.21* .60* -

8. Typical-Like Team Performance 3.52 .46 .50* .18 -.56* .37* .28* .32* .18 -

Note. N = 39 leaders and teams. * p < .10 (one-tailed).


Table 3: Regression for FFM on Transformational Leadership

Transformational Leadership
Variables β Overall R2

Extroversion .24

Neuroticism -.29*

Conscientiousness -.04

Openness to Experience -.07

Agreeableness -.30*

.28**

Note. N = 39 leaders and teams. * p < .10 (one-tailed); ** p < .05 (one-tailed).


Figure 1: Proposed relationships among the variables. The dashed line indicates a posited
weak relationship.

[Figure 1 diagram: the Five Factor Model of Personality leads to Transformational Leadership,
which in turn leads to Team Performance in Typical Contexts and Team Performance in
Maximum Contexts.]


ARMY LEADERSHIP COMPETENCIES: OLD WINE IN NEW BOTTLES?
Brian Cronin, M.A., Ray Morath, Ph.D., and Jason Smith, M.A.
Caliber Associates
10530 Rosehaven St., Suite 400
Fairfax, VA
croninb@calib.com

This paper will compare the current Army leadership competency model (FM 22-
100) published in 1999, which is used as a guideline for Army leader development, to
previous versions of FM 22-100 used by the Army. The purpose of the project is to
identify the attributes and competencies that were prevalent in the past and remain
prevalent today, as well as to identify new competencies that have emerged over time.
Specifically, do attributes or competencies such as duty, cultural awareness, and
conceptual skills make unique and notable contributions to our understanding of new
leader requirements above and beyond those of earlier Army leadership models?

HISTORY OF LEADERSHIP MODELS


Colonel Paparone (2001) explains that the development of military staff
organization and procedure models can be traced back to 2000 B.C., beginning with the
armies of early Egypt. However, James D. Hittle, a historian of military staffs, states that
the modern military staff model did not emerge until the late 1800s (Paparone, 2001).
Hittle proposes that the modern staff system has certain distinguishable features:
• A regular education system for training staff officers
• Delegation of authority from the commander
• Supervised execution of orders issued by or through the staff
• A set method of procedure by which each part performs specific duties.

These aspects of the modern models guide leaders in their duties and provide consistency
to the larger organization.
Although modern military staff models had emerged in Europe in the 1800s, the
United States did not have a published modern US Army Staff doctrine until after World
War I when, in 1924, the US published its first document providing leaders with formal
requirements. This document was entitled ‘Field Service Regulation (FSR)’ (Paparone,
2001). However, the FSR lacked specific detail and therefore could not provide the guidance
that soldiers needed in the field.
To alleviate this situation, the Staff Officers’ Field Manual was introduced in
1932. This manual provided significantly more information to Army leaders and was a
modest success. It was an improvement over FSR because it provided leaders with
principles, data, and decision-making tools to guide their operation of units and
commands during peace and war rather than a simple set of rules that were to be blindly
followed (Paparone, 2001). However, this manual also fell short because it could not
adapt to the Army expansion that preceded World War II.
As the Army began to expand for World War II, the scale and complexity
of military planning and decision-making became increasingly intricate, and Army
doctrine was forced to expand with it. The goal of this doctrine expansion was to create a


comprehensive guide that the Army could use to develop and guide their leaders across
situations. In 1940, an expanded guide was published. Entitled The US Army Field
Manual (FM) 101-5, Staff Officers’ Field Manual: The Staff and Combat Orders, this
document increased the scope and depth of the Army’s doctrine proportionately beyond
the 1932 version and allowed the Army to focus on more specific aspects of officer
training. In 1948, the Army published its first manual focusing specifically on leadership,
DA Pam 22-1. While this manual was only a pamphlet, the notion of having one guide to
develop leadership within the Army was a concept that was quickly embraced. Since DA
Pam 22-1’s creation, the Army has updated this field manual numerous times and has
continued to use it as a building block for all subsequent leader development manuals.
The current paper investigates the similarities and differences of these iterations over
time.

FM 22-100 COMPARISONS
The Army’s field manual has undergone nine iterations since its formation in
1948 (FM 22-1, 1951; FM 22-10, 1953; FM 22-100, 1955; FM 22-100, 1958; FM 22-
100, 1961; FM 22-100, 1965; FM 22-100, 1973; FM 22-100, 1983; FM 22-100, 1990).
As Major Smidt (1998) comments, “From a humble start as a 1948 pamphlet titled
Leadership, the doctrine has evolved into a comprehensive electronic treatise published
on the World Wide Web.” However, the question to be answered is whether the
content of this document has changed significantly and whether the evolutions in
information presentation have aided leader comprehension of the material.
The rest of this paper describes the past and present leader competencies
identified by the Army Leadership manuals over time and our attempt to develop a
crosswalk of these competencies, highlighting the emergence and expiration of Army
leader competencies and documenting competencies that have remained critical over
time (even if their labels have changed). To accomplish these goals, the current paper
will use three versions of FM 22-
100 as exemplars of the evolution of this document, FM 22-100, 1958; FM 22-100, 1973;
and FM 22-100, 1999.

FM 22-100 METHOD COMPARISON


Army Leadership, Field Manual 22-100, has evolved over time and has used
various methods for deriving and presenting leadership guidance. Early versions of the
manual relied on the experience of military leaders such as General Omar N. Bradley
(FM 22-100, 1948) and General J. Lawton Collins (FM 22-100, 1951) to record their
insights and experiences in order to teach other Army Leaders. Their opinions combined
with pieces of supporting behavioral science research such as Maslow’s hierarchy of
needs (Maslow, 1954) were used as the foundation of these early documents.
The manual received its first considerable increase in content in 1953. This was
the first publication of the Manual under the FM 22-100 title, which is still used today,
and it was the first attempt at a comprehensive guide for Army leaders. Throughout the
1950’s, the manual was continually updated (1953, 1955, 1958). The 1958 version, the
last manual of that decade, highlights the vast improvements of the document since its
early days as a humble pamphlet.


The principles and techniques presented in the 1958 version were “the result of an
analysis of outstanding leadership displayed by both military and civilian leaders” (FM 22-
100, 1953). The result of this analysis was a description of 14 traits and 11 principles,
which provided the attributes a leader must have to succeed. This model was thorough,
easily understood, and remained a fixture in Army leadership for the next 32 years until
1990. Between 1958 and 1990, there were several updates to the manual (1961, 1965,
1973, and 1983). Each of the ‘improvements’ provided leaders with more or less detail
than the previous version (depending upon the trait or principle being presented)
regarding how to develop themselves and their subordinates. For example, the 1958
version included--in addition to operational descriptions/definitions of each trait--lists of
activities and methods leaders could use to develop these traits. The 1973 version failed
to include these lists of methods/activities, yet it offered situational studies to leaders to
provide assistance in relating the material in the manual to the day-to-day issues that a
leader might face in the field. Meanwhile, both versions offered ‘Indications of
Leadership’ as benchmarks leaders could use to determine whether or not they were
successful in their roles.
The 1990 manual, however, marked a significant departure from the
trait/principle approach of earlier versions. The 1990 version of FM 22-100 used a factor
analysis of leadership survey responses to establish the following nine leadership
competencies: communications, supervision, teaching and counseling, Soldier team
development, technical and tactical proficiency, decision making, planning, use of
available systems, and professional ethics. In concept this model had many similarities to
earlier versions, but it was presented in a different format.
The most recent version (1999) of FM 22-100 used quite a different approach to
establish a framework of leadership than any of the earlier versions. This version
presents 39 labels that specify what a leader of character and competence must be, know,
and do. Within this framework are “be” dimensions consisting of values (7), attributes
(3), and sub-attributes (13); “know” dimensions consisting of skills (4); and “do”
dimensions consisting of actions (3) and sub-actions (9). Among the approaches used to
derive these labels was borrowing from the notions of a best-selling job-hunting book
(Bolles, 1992) that identified people, things, and ideas as critical to job success. This
framework transposed these issues into interpersonal, technical, and conceptual skill
areas and added an additional skill labeled ‘tactical’ to lend an Army flavor to the list.
This version also differed from the 1958 and 1973 models in that it no longer included
lists of activities or methods that leaders could use to develop their leadership skills and
abilities.
In summary, the Army Leadership doctrine has used a variety of methods to
derive very different types of leadership frameworks. While there may be no single
correct method for establishing Army leadership requirements, there are several
important considerations when attempting to develop a framework or model that
prescribes those requirements. Among these are methodological rigor in development,
comprehensiveness of the framework/model, consistency of the dimensions within the
framework/model, ability to communicate the model/framework to the audience, and
endurance of the framework/model over time.


FRAMEWORK COMPARISON: 1958; 1973; 1999


The comparison of the 1958, 1973, and 1999 manuals yielded several notable findings. Our
research team created a crosswalk that compared the leadership requirements proposed
by each of the three documents. The most noticeable factor that emerged from this
exercise was that the 1958 version and 1973 versions were highly similar in both their
structure and straightforward (i.e., handbook-like) presentation of their models.
Meanwhile the 1999 version offered a very different, if not more complex and
encompassing, framework. One factor differentiating the 1999 version from its
predecessors was that it presented leader requirements in terms of values, attributes,
skills, and actions (some of these dimensions also included sub-dimensions), organized in
the Be-Know-Do model, while the earlier versions presented leader requirements in the
somewhat more economical terms of traits and principles. In addition, the 1999 version
provided 33 total requirements (24 values, attributes, and skills, plus 9 actions), while the
1958 and 1973 versions offered 25 requirements (14 traits and 11 principles).
While the 1999 version certainly differed from past models, our team noticed that
there was considerable overlap between the old and updated versions. For instance, our
crosswalk of the three manuals indicated that 29 of the 33 requirements presented in the
1999 model were directly addressed in the earlier versions. Only 9 of the 33 requirements
had a one-to-one correspondence in their labels across versions (i.e., the same label, such
as loyalty, was used). Twenty of the 1999 requirements were
mapped to earlier requirements with different labels but the same or highly similar
definitions/descriptions of the requirement. For example the value labeled “duty” from
the 1999 version was linked to the trait labeled “dependability” from the 1958 and 1973
versions because the definitions/descriptions were highly similar:
Duty (1999): Duty begins with everything required of you by law, regulation, and
orders; but it includes much more than that. Professionals do their work not just to
the minimum standard, but to the very best of their ability.
Dependability (1973): The certainty of proper performance of duty; To carry out
any activity with willing effort; To continually put forth one’s best effort in an
attempt to achieve the highest standards of performance and to subordinate
personal interests to military requirements.
Dependability (1958): The certainty of proper performance of duty. A constant
and continuous effort to give the best a leader has in him. Duty demands the
sacrifice of personal interests in favor of military demands, rules and regulations,
orders and procedures, and the welfare of subordinates.

Thus, our review suggests that although the terminology and labels for particular
requirements changed over time, the actual content of those leader requirements remained
relatively stable.
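
For illustration, the logic of such a crosswalk can be sketched as a simple mapping. This is a hypothetical representation, not the actual crosswalk instrument; only the three labels shown are drawn from the examples above.

    # Hypothetical crosswalk sketch: each 1999 requirement maps to the earlier
    # labels whose definitions matched it; None marks a requirement new in 1999.
    crosswalk = {
        "Loyalty": {"1958": "Loyalty", "1973": "Loyalty"},           # one-to-one label match
        "Duty": {"1958": "Dependability", "1973": "Dependability"},  # same content, new label
        "Cultural Awareness": None,                                  # no earlier counterpart
    }

    # Tallying entries of each kind reproduces counts like those reported above.
    new_in_1999 = sum(1 for earlier in crosswalk.values() if earlier is None)
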
Our review also revealed that the definitions/descriptions from the 1958 and 1973
versions often appeared to be more straightforward and concrete than those of the 1999
version. The earlier frameworks typically described, in more explicit terms than the 1999
model, the particular trait, how it differed from other similar traits, how it was manifested
in a task or activity, and how it could be developed. Meanwhile,


the requirements in the 1999 model were often described in a more implicit manner. It
used more general terms that could be applied across civilian and military settings and
appeared to be directed at a more advanced audience of readers in terms of their general
knowledge of behavior and cognition. This version placed less emphasis on defining the
operational parameters of each requirement (i.e., what it was and was not) and more
emphasis on describing its importance to leadership.
Other requirements from the 1999 version, such as Honor, Will, Self-Discipline,
Self-Control, and Balance, whose labels did not directly correspond with those of earlier
versions, were nonetheless adequately linked to the definitions/descriptions of
requirements in previous manuals. Requirements from the 1999 model that were not
linked to requirements from earlier models included Self-Confidence, Intelligence,
Cultural Awareness, and Health Fitness. These new requirements may represent vital
additions in the face of the new missions that the Army faces in today’s world.
Additionally, one requirement from the earlier models was not linked to any of the 33
requirements of the 1999 model. The trait Enthusiasm was not directly addressed in the
requirement definitions of the 1999 version. It is possible that the trait of Enthusiasm is
no longer required, or it may be that the authors misread or misinterpreted the
requirement definitions of the 1999 model, particular phrases of which may have implied
some characteristic of enthusiasm (or a like characteristic) on the part of the leader.

DISCUSSION
This review has provided a unique understanding of the development of the FM
22-100. Each version has built on the previous and provided more information to help
leaders grow and succeed. This commitment to improvement on the part of the Army has
resulted in the 1999 version that has more detail than previous models. As General Patch
(1999) indicates, “the (1999) manual takes a qualitative step forward by:
• Thoroughly discussing character-based leadership.
• Clarifying values.
• Establishing attributes as part of character.
• Focusing on improving people and organizations for the long term.
• Outlining three levels of leadership: direct, organizational, and strategic.
• Identifying four skill domains that apply at all levels.
• Specifying leadership actions for each level.

… (Further), more than 60 vignettes and stories illustrate historical and contemporary
examples of leaders who made a difference. The (1999) manual captures many of our
shared experiences, ideas gleaned from combat, training, mentoring, scholarship and
personal reflection.”
Our analysis led us to conclude that the content of each of the manuals
was similar but that the 1999 version implements a completely new framework for
presenting this material. We found that the vast majority of leader requirements and
the competencies underlying those requirements have remained stable over time, even
though the labels for these requirements and the complexity of leader requirements
models have evolved.


Our review of the evolution of the leader requirements models found that the
earlier models were practical and straightforward in format and had the look and feel of
handbooks. While these models provided clear descriptions of the requirements, they
also contained fewer requirements. These models were practical and utilitarian, and their
strengths lay in their parsimony and explicit descriptions, but they were relatively limited
in terms of their theoretical underpinnings. For example, all competency requirements
related to the individual leader (values, characteristics, abilities, skills) were placed
under the single category heading of traits, even though some of these requirements
(e.g., knowledge) are clearly not traits as typically defined by behavioral science.
The newer 1999 version was found to be more specific in terms of the various
levels or strata of leader requirement dimensions within the framework. This version was
also more sophisticated in its specification of model components and subcomponents and
their interrelationships with one another—thus providing greater opportunities for testing
and validation of the model and its components. It attempted to disentangle the single
category of leader traits into more appropriate categories of values, attributes, and skills
and described the differences in these categories of requirements. This model also
replaced the Leadership Principles found in earlier models with actions and sub-actions
that support the performance of these Principles and described how values, attributes,
skills, and actions are maintained within the Be-Know-Do framework.
However, this most recent leadership framework is not without its shortcomings. The
1999 manual was of considerably greater length (almost twice as many pages as the 1973
version) and complexity than previous versions. This version also appeared to be less
helpful in identifying particular activities leaders could use to develop their leadership
skills, due possibly to its greater focus upon the specification of the various dimensions
and categories of leadership. Due largely to these factors, the 1999 version may be more
difficult for leaders (junior leaders in particular) to quickly grasp and, as a result, may be
less easily applied by Army leaders. With these issues in mind, the authors of future
iterations of FM 22-100 may wish to evaluate both the strengths and weaknesses of
recent evolutions and determine whether there are ways to present, describe, and advance
complex leadership models without sacrificing practicality, parsimony, and ease of
comprehension in the audience of future leaders.


References

Bolles, R.N. (1992). What color is your parachute? Ten Speed Press: Berkeley,
California.

Fitton, R. A. (1993). Development of Strategic Level Leaders. The Industrial College of
the Armed Forces, Fort McNair, Washington, D.C.

FM 22-10 Leadership (March 1951)

FM 22-100 Military Leadership (August 1999)

FM 22-100 Military Leadership (July 1990)

FM 22-100 Military Leadership (June 1973)

FM 22-100 Military Leadership (November 1965)

FM 22-100 Military Leadership (June 1961)

FM 22-100 Military Leadership (December 1958)

FM 22-100 Command and Leadership for the Small Unit Leader (February 1953)

Maslow, A. H. (1954). Motivation and personality. New York: Harper & Row.

Paparone, C. R. (1991). US Army Decision-making: Past, Present and Future. Military
Review, Fort Leavenworth, Kansas.


DEVELOPING APPROPRIATE METRICS FOR PROCESS AND OUTCOME MEASURES

Amy K. Holtzman, David P. Baker, and Robert F. Calderón


American Institutes for Research
1000 Thomas Jefferson St., NW
Washington, DC, 20007-3835, USA
aholtzman@air.org
dbaker@air.org

Kimberly Smith-Jentsch
University of Central Florida
4000 Central Florida Blvd.
P.O. Box 161390
Orlando, FL, 32816-1390, USA
kjentsch@mail.ucf.edu

Paul Radtke
NAVAIR Orlando TSD
12350 Research Parkway
Orlando, FL, 32826-3275, USA
paul.radtke@navy.mil

INTRODUCTION

Scenario-based training is a systematic process of linking all aspects of scenario
design, development, implementation, and analysis (Oser, Cannon-Bowers, Salas, &
Dwyer, 1999). An exercise or scenario serves as the curriculum and provides trainees the
opportunity to learn and practice skills. In the military, scenario-based training exercises
are often used to evaluate whether individuals or teams have attained the necessary skills
for specific missions and can apply them in real-world situations. To determine if the
objectives of training have been met, performance measures can be used to assess
individual and team performance within a given scenario.

Performance measures can vary by the level of analysis, as performance can be
measured at the individual, team, and multi-team level. They can also vary by the type of
measures (i.e., outcomes or processes) and the overall purpose of the training.
Performance outcomes are the results of an individual or team’s performance, whereas
process measures “describe the steps, strategies, or procedures used to accomplish a task”
(Smith-Jentsch, Johnston, & Payne, 1998, p.62). Examples of purposes of training
include diagnosing root causes of performance problems, providing feedback, or
evaluating levels of proficiency or readiness for a task.


Much of the research conducted on performance measurement has been done in
the civilian performance appraisal arena, in which a supervisor evaluates a subordinate’s
performance on the job. The focus of the early research was to assess ways to improve
instruments used to evaluate performance (Arvey & Murphy, 1998; Bretz, Milkovich, &
Read, 1992; Landy & Farr, 1980). Most of this research targeted rating scales, which are
numerical or descriptive judgments of how well a task was performed. Research has been
conducted on graphic rating scales, on behaviorally anchored ratings scales (BARS), on
behavioral summary scales (BSS), and on the strengths and weaknesses of each (Cascio,
1991; Murphy & Cleveland, 1995; Borman, Hough, & Dunnette, 1976). However, the
performance appraisal research lacks studies that compare multiple rating formats. The
civilian team performance and training literature has also addressed checklist and
frequency count formats. Checklists consist of items or actions that have dichotomous
answers such as yes/no, right/wrong, or performed action versus failed to perform action,
whereas frequency counts provide an indication of the number of times that a behavior,
action, or error occurs.
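
To make the distinction concrete, the two formats can be sketched as simple data records (an illustrative sketch with hypothetical item names):

    from collections import Counter

    # Checklist: one dichotomous judgment per item or action.
    checklist = {"announced contact": True, "verified track ID": False}

    # Frequency count: a running tally of how often a behavior or error occurs.
    frequency = Counter()
    frequency["communication error"] += 1   # increment each time the behavior is observed
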

However, the literature on scenario-based training lacks research on measurement
methods, on the type of data that can be collected from each, and on how measurement
purpose influences measurement method. In sum, the civilian research on rating formats
has not been conducted in the scenario-based training area, nor has it been translated into
the military arena.

Military instructors have primarily used the checklist format, which is rarely used
in the civilian sector. The reason for this difference may be that the criteria for
evaluating performance in the military may be better defined than are the criteria for
civilian jobs. That is, in the military, successful performance may be more amenable to
yes/no judgments. Furthermore, little civilian or military research has been conducted on
other rating formats, such as distance and discrepancy (D&D), which yields numerical
indices of how actual performance on the task differs from optimum performance.
Moreover, after 1980, when Landy and Farr declared that further research on rating
formats was not needed, little additional research addressed this topic at all. Thus, when
to use a certain format for evaluating scenario-based training in the military and what
factors drive that decision are necessary topics of research.
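
As a worked illustration of the D&D idea (a minimal operationalization of our own, not a scoring rule drawn from the literature reviewed here), the index can be computed as the difference between observed and optimum performance:

    # Minimal D&D sketch: the signed difference between actual and optimum
    # performance; the units depend on the task (seconds, meters, percent).
    def dd_index(observed, optimum):
        return observed - optimum

    dd_index(observed=95, optimum=60)   # e.g., a fire extinguished 35 seconds past standard
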

To address this need, we conducted a study to provide guidance on identifying
and developing appropriate metrics for measuring human performance in military
settings. To gather information about how best to measure processes and outcomes, we
conducted brief interviews with ten experts in human performance measurement. The
literature identified a number of outcome and process measures, but we selected the ones
most relevant to scenario-based training in the military, as this domain was the focus of
the study. We used accuracy, timeliness, productivity, efficiency, safety, and effects as
our outcome measures and procedural and non-procedural taskwork and teamwork as
process measures (See Table 1 for definitions and examples.)


METHOD

Participants

Participants were ten experts with extensive experience in human performance
measurement and training. Many also had experience working for the Navy or other
military branches. All had Ph.D.s in various areas of Psychology, with the majority
holding a Ph.D. in Industrial/Organizational Psychology.

Participants’ collective experience in the human performance measurement arena
included test development and assessment, performance model development, job
analysis, and performance appraisal measure development. Their collective training
experience included developing training evaluation measures, using assessment centers
for development, developing training programs, and facilitating training. In addition,
several had evaluated training programs and developed competency models for the Navy.

Measures

The four rating formats included checklist, frequency, distance and discrepancy,
and rating scale. Participants were given definitions and examples of processes and
outcomes and told to rank their first, second, and third choice of format for measuring
each process and outcome (the interview guide is available from the author). If they felt
other formats were necessary, they were instructed to add them and explain their reasons
for doing so. In addition, participants explained their rationale for choosing their first
choices for each process and outcome. Finally, they provided demographic information
on themselves.
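
For concreteness, the percentages reported in Tables 2 through 5 below can be derived from these rankings as follows (a sketch assuming each cell is the share of experts naming a format at a given rank):

    from collections import Counter

    def first_choice_shares(rankings):
        """rankings: one (first, second, third) tuple of format names per expert."""
        firsts = Counter(first for first, _, _ in rankings)
        n = len(rankings)
        return {fmt: round(100 * count / n) for fmt, count in firsts.items()}

    # e.g., six of ten experts ranking D&D first for accuracy yields {"D&D": 60, ...}
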


Table 1. Definitions and Examples of Outcomes and Processes

Outcomes
  Accuracy: precision with which a task is performed. Example: identifying whether a
  bomb hit the target.
  Timeliness: length of time in which actions are performed. Example: assessing how
  long it took the damage control team to extinguish a fire.
  Productivity: rate at which actions are performed or tasks are accomplished within a
  given situation. Example: examining the number of planes that were launched or
  refueled during a particular mission.
  Efficiency: ratio of resources required to those expended to accomplish a given task.
  Example: examining the amount of fuel that was burned on deck compared to the
  amount planned.
  Safety: degree to which a task is accomplished in a way that does not unduly
  jeopardize human and capital resources. Example: number of injuries per month.
  Effects: degree to which the desired effect was achieved. Example: keeping enemy air
  forces grounded.

Processes
  Procedural Taskwork: requirements specific to a position that follow a step-by-step
  process. Example: completing the detect-to-engage sequence.
  Non-procedural Taskwork: requirements specific to a position that do not follow a
  step-by-step process. Example: developing a plan to clear mines from the Straits of
  Hormuz.
  Teamwork: processes individuals use to coordinate activities. Examples: information
  exchange, communication, supporting behavior, initiative, and leadership.
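
Several of these outcomes are simple rates or ratios, so the following sketch (with hypothetical numbers of our own) shows how three of them might be computed directly from the Table 1 definitions:

    # Hypothetical values; formulas follow the Table 1 definitions.
    planes_launched, mission_hours = 24, 6
    productivity = planes_launched / mission_hours   # tasks accomplished per unit time

    fuel_planned, fuel_burned = 1000, 1250           # e.g., pounds of fuel
    efficiency = fuel_planned / fuel_burned          # resources required vs. expended

    injuries, months = 3, 12
    safety = injuries / months                       # injuries per month
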


RESULTS

Results are broken out below by type of measure. Techniques were chosen based on the
definition and examples of the measures listed above.

Outcome Measures

Accuracy

As Table 2 demonstrates, the majority of participants chose distance and
discrepancy as their first choice for measuring accuracy. The main reason participants
gave for this choice was that it allows for greater precision than do the other techniques.
Participants felt D&D allows evaluators to determine how far from optimum the
individual has performed and compare the outcome with the specified goal; furthermore,
it is also the most direct measure of ratio level data.

A few individuals chose rating scale as their first choice because, in their opinion,
it allows for finely tuned judgments. This format was “the most flexible tool and can be
written to accomplish the same objectives as the other techniques,” according to one
individual. Their judgment was that rating scales can be easier to use than the other
techniques and can be effectively used to assess low-base rate outcomes, such as hitting
targets. This format was also judged to be useful when the criteria are individually
determined and circumstance-specific. In fact, one individual chose rating scales for all
process and outcome measures for these reasons.

One individual felt that the frequency format was best for measuring accuracy
because it effectively measures the percentage of hits, which may be more important than
how close the misses were. D&D measures how close or far off the misses were. As
shown in Table 3, the most popular second choices for accuracy include the checklist,
frequency, and rating formats.

Timeliness

Table 2 reveals that D&D is the most popular method for measuring timeliness
because participants felt that D&D allows for precise measurement and can best measure
how late a person is. Furthermore, they felt that D&D allows for valid and reliable
measurement and is the most appropriate measure for ratio-level data.

Other techniques that participants chose include frequency and rating formats.
Participants’ comments were that frequencies could be used if no comparison is needed
and time requirements are the only necessary information. On the other hand, according
to participants, rating formats allow for measuring the extent to which the action is
perceived as timely.

As depicted in Table 3, rating scales and checklist formats were the top second
choices for measuring timeliness, whereas checklist was the most popular third choice.


Productivity

According to Table 2, almost everyone felt that a frequency format would most
effectively measure productivity, because frequency allows one to count specific actions.
If the number of occurrences of an activity is important, participants felt frequency was
the best technique to use. D&D and rating formats were also chosen as possible
techniques. Participants said these techniques would be appropriate when measuring
productivity in comparison to a standard.

As shown in Table 3, over half of participants chose checklist as a second choice,
and another one-third chose the rating scale format second. D&D was a popular third
choice.

Efficiency

Table 2 illustrates the most common first choice for measuring efficiency: D&D.
Participants felt this technique allowed for the most direct measure of efficiency,
including ratio measures in comparison to a specified goal. Frequency and rating format
techniques were also chosen. Participants commented that frequency allows for
counting the number of resources expended, whereas the rating format allows for more
flexibility and judgment.

Table 3 demonstrates that well over half chose the rating scale format as their
second choice. Finally, frequency and checklist formats were the most common third
choices.

Safety

Table 2 reveals that frequency and rating scales were the top first choices for
measuring safety. Part of the choice may depend on how safety is measured. According
to several participants, frequency allowed for concrete information to be gathered on the
number of observable occurrences, such as the number of accidents. On the other hand,
rating scales allowed for scoring unsafe behaviors that may be precursors to accidents
and for determining the extent to which a goal is met. D&D could also be used, according to
participants.

Table 3 shows that checklist was the second choice for most participants. Finally,
rating scale was third for several participants.

Effects

Table 2 shows that slightly over half of participants chose rating scales as the best
way to measure effects because they can measure the extent to which a goal was met.
The checklist technique was also a popular first choice because it showed whether the
effect was achieved and allowed for a yes or no format.


As depicted in Table 3, half of participants chose D&D as their second choice,
whereas frequency was a common third choice. Rating scale was a less common third
choice.

Table 2. First Choice Rating Formats for Outcome Measures

Outcomes        Checklist   Frequency   Distance and Discrepancy   Rating Scale
Accuracy        10%         10%         60%                        20%
Timeliness      -           11%         67%                        22%
Productivity    -           80%         10%                        10%
Efficiency      -           10%         70%                        20%
Safety          -           50%         10%                        40%
Effects         33%         -           11%                        56%

Table 3. Second Choice Rating Formats for Outcome Measures

Outcomes        Checklist   Frequency   Distance and Discrepancy   Rating Scale
Accuracy        45%         22%         11%                        22%
Timeliness      30%         10%         20%                        40%
Productivity    56%         -           11%                        33%
Efficiency      13%         13%         13%                        61%
Safety          62%         25%         13%                        -
Effects         12%         25%         50%                        13%

Process Measures

Procedural Taskwork

As highlighted in Table 4, nearly every respondent chose checklist as the most
appropriate way to measure procedural taskwork because such taskwork follows a step-by-step
process. Participants felt that the steps lend themselves to a checklist format, which
allows for actions to be measured by using dichotomous variables, such as yes or no. In
addition, they felt that it allows for determining whether or not multiple outcomes were
accomplished. Paper and pencil job knowledge tests and hands-on performance tests
were also recommended for both procedural and non-procedural taskwork, given that
these tests effectively measure the expertise and skill sets of individuals.


Table 5 shows that nearly half of participants chose the rating scale format as their
second choice, whereas few chose it as their third choice. In addition, using a hands-on
performance task was a second choice for measuring all three processes.

Non-procedural Taskwork

Table 4 demonstrates that the majority of participants felt that rating scales most
effectively measure non-procedural taskwork. Rating scales “with appropriate
benchmarks permit reliable measurement of complex tasks,” according to one individual.
Participants felt that this format allows for subjective judgments and offers flexibility in
assessing various aspects of the task under scrutiny.

A few chose the checklist format as their first choice because it allows for
measuring whether or not a task was accomplished. The checklist format was a more
popular second choice, as one-half of participants chose it for second place, whereas
rating format was second for only one-quarter of participants (Refer to Table 5). The
frequency format was the most popular third choice.

Teamwork

Not surprisingly, rating scale was also the most popular choice for measuring
teamwork, as evidenced in Table 4. Participants’ opinions were that rating scales allow
for measuring subjective items and variance in the responses. According to one
individual, rating scales were “the best method for capturing complex behavior associated
with teamwork.” Several felt that this approach allowed for greater flexibility.

According to participants, if the teamwork components were countable behaviors,
frequency would be a suitable method. The most common answers for second and third
choices were the rating scale, checklist, and frequency formats (Refer to Table 5 for
second choice). Peer evaluations and performance tasks were also recommended as
second choices for measuring teamwork.

Table 4. First Choice Rating Formats for Process Measures

Processes                  Checklist   Frequency   Distance and Discrepancy   Rating Scale   Job Knowledge Tests
Procedural Taskwork        80%         -           -                          10%            10%
Non-procedural Taskwork    22%         -           -                          67%            10%
Teamwork                   -           10%         -                          90%            -


Table 5. Second Choice Rating Formats for Process Measures

Processes                  Checklist   Frequency   Distance and Discrepancy   Rating Scale   Performance Tasks   Peer Evaluations
Procedural Taskwork        11%         33%         -                          45%            11%                 -
Non-procedural Taskwork    50%         13%         -                          25%            12%                 -
Teamwork                   38%         13%         13%                        12%            12%                 12%

CONCLUSION

In summary, we found that D&D was the preferred choice for many outcome
measures because D&D allowed the rater to measure performance against a standard.
For example, D&D allowed the rater to evaluate a trainee’s timeliness or efficiency. On
the other hand, the safety and effects outcomes lent themselves more often to frequency
counts and checklists, as they are typically measured in terms of the number of actions
performed.

For measuring the step-by-step components of procedural taskwork, the checklist
format was judged to be most suitable. However, according to participants, non-
procedural taskwork and teamwork can best be measured using rating scales, given that
the activities can vary.

This study was a beginning step in providing guidance on linking process and outcome
measures with appropriate rating formats. It is important to note that these results were
based on data from ten individuals. Further research is needed to validate these findings.
To collect more data on choosing an appropriate measurement method and determining
how the choice may vary based on training purposes, an additional independent survey
with measurement experts is planned. Based on these results, we will map particular
rating formats to particular measures and develop the business rules (i.e., guidelines) for
an automated tool for military instructors. This tool will help military instructors
identify, develop, and assess specific measures of human performance during scenario-
based training.


REFERENCES

Arvey, R. D., & Murphy, K. R. (1998). Performance evaluation in work settings. Annual
Review of Psychology, 49, 141-168.

Borman, W. C., Hough, L. M., & Dunnette, M. D. (1976). Development of behaviorally
based rating scales for evaluating U.S. Navy recruiters (Technical Report TR-
76-31). San Diego, CA: Navy Personnel Research and Development Center.

Bretz, R. D., Jr., Milkovich, G. T., & Read, W. (1992). The current state of performance
appraisal research and practice: Concerns, directions, and implications. Journal
of Management, 18(2), 321-352.

Cascio, W. F. (1991). Applied psychology in personnel management (4th ed.).
Englewood Cliffs, NJ: Prentice Hall.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-
107.

Murphy, K., & Cleveland, J. (1995). Understanding performance appraisal: Social,
organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.

Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human
performance in technology-rich environments: Guidelines for scenario-based
training. In E. Salas (Ed.), Human/technology interaction in complex systems
(Vol. 9, pp. 175-202). Stamford, CT: JAI Press.

Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related
expertise in complex environments. In J. A. Cannon-Bowers & E. Salas (Eds.),
Making decisions under stress: Implications for individual and team training (pp.
61-87). Washington, DC: American Psychological Association.

AUTHOR NOTES

Amy K. Holtzman, American Institutes for Research; David P. Baker, American
Institutes for Research; Robert F. Calderón, American Institutes for Research (now at
Caliber Associates, Inc.); Kimberly Smith-Jentsch, University of Central Florida; Paul
Radtke, NAVAIR Orlando.
Funding for this research project was provided by the Navy (contract N61339-02-
C-0016). The opinions in this article are those expressed by the authors and are not
necessarily representative of Navy policies.
Correspondence concerning this article should be addressed to Amy K. Holtzman,
American Institutes for Research, 1000 Thomas Jefferson St., NW, Washington, DC
20007-3835. E-mail: aholtzman@air.org.


SOFTWARE SUPPORT OF HUMAN PERFORMANCE ANALYSIS

Ian Douglas
Learning Systems Institute, Florida State University,
320A, 2000 Levy Avenue Innovation Park, Tallahassee, Florida, 32310, USA.
idouglas@lsi.fsu.edu

INTRODUCTION

This paper will briefly describe the outcomes of the object-oriented performance analysis
(AOOPA) project, a three-year research project carried out in collaboration with
the Army Training Information Systems Directorate and the Coast Guard Human
Performance Technology Center. The project has two main goals: to develop a framework
for optimal methods of front-end analysis to precede the development of human
performance support systems (Douglas and Schaffer, 2002) and to develop a model for a
new generation of software tools to support the framework. A framework is a set of
guidelines for creating efficient methodologies; a methodology is a more detailed process
prescription. There are two key foundations for the framework. The first is that everything
should be driven by an understanding of performance within an organizational system,
not by solutions. The second is that the output of performance analysis should be
digitized in the form of standard packages of analysis knowledge that can be shared and
reused.

In addition, the following principles are recommended within the framework:

Visual modeling
Collaborative analysis that includes end-users
Rationale management
Automated support for analysis

It is important to stress that the framework is not tied to any particular solution type. It
should not be interpreted as needs analysis for training or any other solution type.
Although training has been the dominant solution by which organizations seek to enhance
the performance of their personnel, the knowledge and skill requirements for operations
are expanding and changing at such a rate that other solutions are required for attaining
optimal performance. Automation, process re-engineering, providing job aids, and just-
in-time learning are among many solution types that can be blended into performance
support systems. Encouraging consideration of creative, non-traditional solutions is also
important in performance improvement; a perfect example of this was the use of playing
cards in Iraq to facilitate facial recognition. The OOPA framework is founded on general
systems theory (Weinberg, 2001). It also incorporates the common analytic approach of
stepwise refinement from a general problem domain to more specific components of the
problem.


An important part of the framework is that it incorporates the need for greater cost-
efficiency in the development of performance support solutions. It does this in two ways:
firstly, by identifying measurable performance goals during analysis against which
improvements brought about by different performance support mechanisms can be
measured. In this regard it adopts the lessons from studies in the field of human
performance technology (Gilbert, 1996; Robinson & Robinson, 1995; Rossett, 1999).
Secondly, it incorporates the growing trend towards encouraging and facilitating the reuse
and sharing of digital assets. The view associated with this trend has so far been confined
predominantly to learning content (Douglas, 2001; Gibbons et al., 2000). In the military
this trend is associated with the Sharable Content Object Reference Model (SCORM)
from the Advanced Distributed Learning (ADL) initiative (see www.adlnet.org). In the
framework, we extend reuse thinking to the reuse of problem analysis knowledge.

Figure 1: Model for enterprise software system based on performance analysis and
evaluation

By successfully defining a technology-supported framework for reusable performance
analysis knowledge, solutions to organizational performance problems or new
performance requirements will be specified more clearly and open to wider scrutiny via
the internet.

The AOOPA framework is one part of a more general framework that incorporates
analysis, change intervention and evaluation. The AOOPA software prototype is one part
of a model for an enterprise information technology system to support the more general
framework (see figure 1). In the enterprise IT system, analysis and evaluation knowledge
and support solutions are organized into digitized components and shared among web-
enabled communities of stakeholders. Performance analysis sets the baseline in
determining what roles exist in an organization, what goals they must achieve and how
the achievement of those goals can be measured. The packaging of this knowledge into
digital components that can be accessed online will help reduce the replication of effort
that can occur when disparate groups look at similar problems at different times and are
unaware of existing knowledge. Part of the reason for this situation is that there are no
centralized stores of such knowledge and it is usually communicated in the form of large
integrated documents in non-standard formats.


SOFTWARE SUPPORT

A working model (proof-of-concept prototype) for a new generation of software tools to
support the performance analysis framework has been constructed. This can be accessed
at the web site of the knowledge communities research group,
http://www.lpg.fsu.edu/OOPA/. The prototype is entirely web-based and incorporates all
of the features of the framework outlined in the introduction. An important concept
embedded in the design of the prototype is configurability, i.e., tools should not be fixed
to a particular methodology, but be adaptable to the specific methodologies (and
terminology) used in different organizations and groups. The intention is to create a set of
configurable tools and methods, which have a shared underlying representation of
performance analysis knowledge. The system architecture is based on the emerging new
paradigm of service-oriented systems (Chung, Lin, & Mathieu, 2003). Service-oriented
systems are constructed from web-based software components, each of which offers a
distinct service; they differ from traditional systems, which package a number of services
into an integrated application. The model enables custom front-ends to be created to a
continuously refined shared repository of knowledge. Each version of the AOOPA
system will have core component categories (see figure 2), but the specific version of
each component will vary from organization to organization. In the current version, a
third-party collaboration tool called Collabra has been used for the collaboration
component. If a different organization used a different collaboration tool, it would be
‘plugged in’ in place of Collabra. Likewise, if different data types were collected in
another organization’s methodology (or different terminology were used), different data
entry templates
could appear. The user support component can be tailored to the specific methodology
employed by an organization.
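
A minimal sketch of this plug-in idea follows; the class and configuration names are hypothetical illustrations of our own, not the AOOPA codebase:

    # Each service category defines a common interface; an organization's
    # configuration names the concrete component to load for that category.
    class CollaborationService:
        def post_comment(self, artifact_id, text):
            raise NotImplementedError

    class CollabraAdapter(CollaborationService):
        # Wraps the third-party Collabra tool behind the common interface.
        def post_comment(self, artifact_id, text):
            print(f"[Collabra] {artifact_id}: {text}")

    CONFIG = {"collaboration": CollabraAdapter}   # swap the class to re-plug the component
    collaboration = CONFIG["collaboration"]()
    collaboration.post_comment("performance-case-7", "Gap confirmed by unit SMEs.")
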

[Figure: component labels include visual modeling, performance data entry, rationale
management, user support, collaboration, user management, search, project databases,
an enterprise gatekeeper, and enterprise-wide reusable analysis and solution repositories.]

Figure 2: Architecture for the performance analysis support software


Figure 3: Screen shot from current version of the model showing performance case
modeling

The components of an analysis (models, data, and rationale) are stored in a project-
specific database from which analysts and stakeholders can retrieve, view, and comment
on the content. Some organizations may wish to have a gatekeeper function that filters
quality-controlled analysis components into a central repository. An integral part of the
tool is an automated search of this repository. Thus, as soon as an analysis team on a new
project begins to enter data, it is matched against existing data in the analysis repository
to alert the user to possible sources of existing knowledge.
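
The alerting behavior can be sketched as follows; because the prototype's actual matching logic is not described here, a simple keyword overlap stands in for it:

    # Match newly entered analysis data against repository entries by keyword
    # overlap and return candidate sources of existing knowledge.
    def find_related(new_entry_text, repository):
        terms = set(new_entry_text.lower().split())
        return [entry["id"] for entry in repository if terms & entry["keywords"]]

    repository = [{"id": "A-17", "keywords": {"radio", "operator", "performance"}}]
    find_related("radio operator performance gap", repository)   # -> ["A-17"]
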


Figure 3 illustrates the prototype that has been constructed to demonstrate one version
conforming to the framework and the architecture illustrated in figure 2. The modeling
component is a key focal point and provides a shared reference and navigation model
throughout a project. The current prototype uses performance case modeling, which is an
adaptation from unified modeling language (UML) use case notation (Cockburn, 1997).
UML is widely used in object-oriented software systems analysis and has been adapted
for more general systems analysis (Marshall, 2000). Performance case notation provides
a simple, end-user understandable means of defining a problem space. A performance
diagram is a graphic that illustrates what performers do on the job and how they interact
with other performers to reach performance goals. A role is a function that someone has
as part of an organizational process (e.g., mission commander, radio operator, vehicle
inspector). A primary role is the focus of the project. Secondary roles (performers who
interact with the primary role) may be included when looking at team performance. The
primary role is likely to achieve several performance goals, e.g. a mission commander
would have to successfully plan, brief, execute and conduct an after action review. High
level performance goals decompose into lower level diagrams containing sub-goals.
Performance goals represent desired performance at an individual level.
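
The entities just described can be sketched as a simple data model; the field and class names are our own illustration, not the AOOPA schema:

    from dataclasses import dataclass, field

    @dataclass
    class PerformanceGoal:
        name: str
        subgoals: list = field(default_factory=list)   # high-level goals decompose into sub-goals

    @dataclass
    class Role:
        title: str       # e.g., "mission commander"
        primary: bool    # the focus of the project, vs. interacting secondary roles
        goals: list = field(default_factory=list)

    commander = Role("mission commander", primary=True,
                     goals=[PerformanceGoal("plan"), PerformanceGoal("brief"),
                            PerformanceGoal("execute"),
                            PerformanceGoal("conduct after action review")])
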

Figure 4: Screen shot from current version of the model showing gap analysis for one
performance case

Facilitated by the groupware component, an analysis team works collaboratively to create
and edit the performance diagram. The analysis team will use the diagram to develop a
shared understanding of a domain and identify performance cases where there is a gap
between desired and current on-the-job performance. It allows the organization to
pinpoint a specific performance discrepancy that could be costing time, money, and other
resources. Those performance cases will be subject to a more detailed analysis.


There are a variety of data collection templates that could be attached to the performance
case to assist in detailed analysis. The current version of the AOOPA model uses a gap
analysis template (see figure 4) in which data is collected about current and desired
performance in the tasks that are carried out in pursuit of a performance goal. Where a
gap is found, for example if 100% accuracy is required on a task and only 60% of those
assigned to the task are able to achieve this, then a cause and solution analysis will be
initiated. In a cause analysis, stakeholders review gap data, brainstorm possible causes,
put them into cause categories, rate them by user-defined criteria, and select which ones
to pursue. The AOOPA prototype allows users to categorize causes so the recommended
solutions are more likely to address the underlying causes. The specific process used in
this version is described in more detail in Douglas et al. (2003).
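
The triggering rule in the example above reduces to a simple comparison, sketched here as a minimal formalization of our own:

    # Flag a performance case for cause and solution analysis when the share of
    # personnel meeting the standard falls below the required level.
    def needs_cause_analysis(required_pct, achieved_pct):
        return achieved_pct < required_pct

    needs_cause_analysis(required_pct=100, achieved_pct=60)   # True: initiate cause analysis
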

[Figure: organization-specific versions of the process exchange analysis data through a
process-independent data model.]

Figure 5: Transferring data between different organizations

FUTURE WORK

An important concept embedded in the design of the prototype is configurability
(Cameron, 2002). As noted in the introduction, a framework is meant to provide a
structure for a variety of approaches that can be tailored to specific groups or situations
rather than to provide a set of rules for a single correct way of developing systems. The
philosophy is that no “one size fits all” methodology will be effective; methodologies
evolve to fit organizations, situations, and new technologies. The same is true for
software tools, which are of limited use when fixed on a particular methodology. Given
that different organizations will adopt different methods to suit different circumstances,
software support should be adaptable. The vision is of a system of configurable tools and
methods, which have a shared underlying representation of performance analysis
knowledge. This will allow custom interfaces to a continuously refined shared repository
of knowledge on human performance (see figure 5).

The software architecture used allows the plug-in of different components, thus allowing
a different set of components to be configured to each methodology that conforms to the


framework. Having completed the version of the AOOPA toolset described in the
previous section, a second version is being constructed based on the methodology used
by the Coast Guard. The two versions will be used to begin the development and testing of
mechanisms that enable the exchange of performance analysis data across different
organizations. If such mechanisms prove feasible, they have the potential not only to
reduce the replication of effort that currently occurs across service and unit boundaries,
but also to greatly increase performance levels. Performance analysis may develop into a
continuous improvement process throughout the military rather than a discrete activity
prior to new systems development.

REFERENCES

Cameron, J. (2002). Configurable development processes. Communications of the ACM,
45 (3), pp. 72-77.

Cockburn. A. (1997). Structuring use cases with goals. Journal of Object Oriented
Programming, 10 (7): pp. 35–40.

Chung, J.C., Lin, K.J., and Mathieu R.G. (2003). Web Services Computing: Advancing
Software Interoperability. IEEE Computer, October, 36 (10). pp. 35-37.

Douglas, I., Nowicki, C., Butler, J. and Schaffer S., (2003). Web-Based Collaborative
Analysis, Reuse and Sharing of Human Performance Knowledge. To appear in the
proceedings of the Inter-service/Industry Training, Simulation and Education Conference
(I/ITSEC). Orlando, Florida, Dec.

Douglas, I., & Schaffer, S. (2002). Object-oriented performance improvement.
Performance Improvement Quarterly, 15 (3), pp. 81-93.

Douglas, I. (2001). Instructional design based on reusable learning objects: Applying
lessons of object-oriented software engineering to learning systems design.
Proceedings of the IEEE Frontiers in Education Conference, F4E, pp. 1-5, Reno,
Nevada, October.

Gibbons, A. S., Nelson, J. & Richards, R. (2000). The nature and origin of instructional
objects. In D. A. Wiley (Ed.), The Instructional Use of Learning Objects: Online Version.
Retrieved from the World Wide Web: http://reusability.org/read/chapters/gibbons.doc

Gilbert, T. (1996). Human competence: Engineering worthy performance. Amherst,
MA: HRD Press, Inc.

Marshall, C. (2000). Enterprise modeling with UML. Reading, Mass: Addison Wesley.

Robinson, D., & Robinson, J. C. (1995). Performance consulting: Moving beyond
training. San Francisco: Berrett-Koehler.


Rossett, A. (1999). First Things Fast: A Handbook for Performance Analysis. San
Francisco: Jossey-Bass Pfeiffer.

Weinberg, G. (2001). An introduction to general systems thinking. New York: Dorset
House.


HOW MILITARY RESEARCH CAN IMPROVE TEAM TRAINING EFFECTIVENESS
IN OTHER HIGH-RISK INDUSTRIES

Jeffrey M. Beaubien, Ph.D.
Senior Research Scientist
American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
jbeaubien@air.org

David P. Baker, Ph.D.
Principal Research Scientist
American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
dbaker@air.org

Amy K. Holtzman, M.A.
Research Associate
American Institutes for Research
1000 Thomas Jefferson Street, NW
Washington, DC 20007-3835
aholtzman@air.org

INTRODUCTION

For over 30 years, military-sponsored research has advanced the state of the science by
defining the essential components of teamwork (Salas, Bowers, & Cannon-Bowers, 1995);
developing theoretical models of team dynamics (Salas, Dickinson, Converse, & Tannenbaum,
1992); measuring team inputs, processes, and outputs (Cannon-Bowers, Tannenbaum, Salas, &
Volpe, 1995); and developing training programs to improve team performance (Smith-Jentsch,
Zeisig, Acton, & McPherson, 1998). Although similar lines of research have been undertaken in
other high-risk industries – such as aviation and healthcare – researchers in these domains have
rarely built upon the lessons learned from military team training research to any significant
degree (Salas, Rhodenizer, & Bowers, 2000).
The primary purpose of this paper is to illustrate how military-sponsored research can be
leveraged to advance the practice of team training in other high-risk industries. Specifically, we
identify two areas – the specification of critical team knowledge, skill, and attitude
competencies, and the development of effective training strategies – that have the greatest
potential for transitioning military research findings to non-military settings. Finally, we
comment on several possible reasons as to why advancements in the military have not
transitioned, and provide suggestions for disseminating critical military research findings on
team performance.


CRITICAL TEAMWORK COMPETENCIES

Team training refers to a set of instructional strategies that apply well-tested tools (e.g.,
simulation, lectures, behavioral models) to improve the knowledge, skills, and attitudes that are
required for effective team performance. Unfortunately, the published literature on teamwork
competencies contains numerous inconsistencies in both the competency labels and their
associated definitions. In this section, we describe recent efforts to clarify this body of research,
and the implications of this work for improving team training effectiveness.
Team Knowledge Competencies
Team knowledge competencies are defined as facts, principles, and concepts that help
team members form appropriate interaction strategies, coordinate with one another, and achieve
maximum team performance. For example, to function effectively the team members must know
what team skills are required, when particular team behaviors are appropriate, and how these
skills should be utilized. The team members must also be familiar with the team’s mission, and
should understand one another’s roles in achieving that mission (Cannon-Bowers et al., 1995).
Team Skill Competencies
Team skill competencies are defined as the learned capacity to interact with one another
in pursuit of a common goal. Unlike knowledge competencies, which involve the mastery of
factual knowledge, team skill competencies involve the application of knowledge to perform
specific behaviors. Recent research suggests that team skill competencies can be classified into
eight major categories: adaptability, situation awareness, performance monitoring/feedback,
leadership, interpersonal relations, coordination, communication, and decision-making.
Moreover, several research studies have shown that these skills are directly related to team
performance (cf. Salas et al., 1995).
Team Attitude Competencies
Team attitude competencies are defined as internal states that influence the team
members’ decisions to act in a particular way. Previous research suggests that team attitudes can
have a significant effect on how teamwork skills are actually put into practice. For example,
Driskell and Salas (1992) reported that collectively-oriented individuals performed significantly
better than individually-oriented team members, because collectively-oriented individuals tended
to take advantage of the benefits offered by teamwork.
Factors That Influence Team Competency Requirements
Tannenbaum and his colleagues suggest that team performance cannot be understood
independently of the team’s organizational, work, and task environment (Tannenbaum, Beard, &
Salas, 1992). The authors define “organizational characteristics” – such as reward systems,
policies, supervisory control, and resources – as features that define the task and, by extension,
the competencies that are required to perform that task. The authors define “work
characteristics” as structural and normative variables – such as formal rank or leadership
hierarchies, and the extent to which team members are geographically dispersed – that determine
how tasks are assigned and shared by various team members. Finally, the authors define “task


characteristics” – such as task complexity, task organization, and task type – as factors that
determine the extent to which coordination is necessary for successful team performance.
Building on Tannenbaum and colleagues’ work, Cannon-Bowers and her colleagues
(1995) developed a 2x2 typology of team training requirements. Quadrant I depicts teams that
perform a relatively stable set of tasks with a relatively stable set of teammates. These teams are
hypothesized to require team-specific and task-specific competencies – such as task organization,
mutual performance monitoring, and shared problem-model development – that are “context-
driven.” Examples of teams that require context-driven competencies include combat teams and
sports teams. Quadrant II depicts teams whose tasks vary considerably over time, but these tasks
are performed with a relatively stable set of teammates. These teams are proposed to require
team-specific and task-generic competencies – such as conflict resolution, motivating others, and
information exchange – that are “team-contingent.” Examples of teams that require team-
contingent competencies include self-managing work teams, management teams, and quality
circles. Quadrant III depicts teams that perform a stable set of tasks with sets of individuals that
vary. These teams are expected to require task-specific and team-generic competencies – such as
task structuring, mission analysis, and mutual performance monitoring – that are “task-
contingent.” Examples of teams that require task-contingent competencies include medical
teams, aircrews, and some fire fighting teams. Finally, quadrant IV depicts teams that perform
tasks that vary over time with team members who also vary. These teams are predicted to
require team-generic and task-generic competencies – such as morale building, consulting with
others, and assertiveness – that are “transportable.” Examples of teams that require transportable
competencies include task forces, project action teams, and project teams. Practitioners can use
this typology to help them identify the most important team competencies for their particular
team type.
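To make the typology concrete, the following minimal Python sketch encodes the four quadrants as a lookup table. It is purely an illustrative rendering of the typology summarized above; the function and variable names are our own and are not drawn from Cannon-Bowers et al. (1995).

# Illustrative encoding of the 2x2 team-training typology.
# Keys: (tasks_vary, teammates_vary); values: (competency type, examples).
TEAM_TRAINING_TYPOLOGY = {
    (False, False): ("context-driven",
                     ["task organization", "mutual performance monitoring",
                      "shared problem-model development"]),
    (True, False):  ("team-contingent",
                     ["conflict resolution", "motivating others",
                      "information exchange"]),
    (False, True):  ("task-contingent",
                     ["task structuring", "mission analysis",
                      "mutual performance monitoring"]),
    (True, True):   ("transportable",
                     ["morale building", "consulting with others",
                      "assertiveness"]),
}

def required_competencies(tasks_vary, teammates_vary):
    """Return the competency category and example competencies for a team."""
    return TEAM_TRAINING_TYPOLOGY[(tasks_vary, teammates_vary)]

# Example: an aircrew performs a stable set of tasks with varying teammates.
print(required_competencies(tasks_vary=False, teammates_vary=True))

For instance, a medical team or aircrew (stable tasks, varying membership) maps to the task-contingent quadrant, matching the description above.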

The Measurement of Team Competencies

Many researchers have found it difficult to measure more than four distinct competencies
at a time, for example during scenario-based training. Smith-Jentsch and her colleagues (1998)
identified four teamwork skill competencies that could be reliably and accurately measured
during Navy combat-information-center (CIC) team training scenarios: information exchange,
supporting behavior, team feedback skill, and flexibility. Information exchange is defined as
passing relevant data to team members who need it, before they need it, and ensuring that sent
messages are understood as intended. Supporting behavior is defined as offering and requesting
assistance in an effective manner both inside and outside of the team. Team feedback skill is
defined as communicating one’s observations, concerns, suggestions, and requests clearly
without becoming hostile or defensive. Finally, flexibility is defined as adapting team
performance strategies quickly and appropriately to changing task demands.

Conclusions

As our discussion demonstrates, military-sponsored research has led the way in defining
team competencies, specifying the core training requirements for various team types, and
assessing team competencies during simulation-based training. Sadly, the aviation and
healthcare domains are still plagued by inconsistent terminology and definitions for important
team competencies, and have made substantially less progress in measuring team competencies
during training. In the next section, we identify recent advances in training strategies that have
the potential to improve team training effectiveness in other high-risk industries.

TRAINING STRATEGIES

The military has led the way in developing effective strategies for training team
competencies. The watershed for much of this research was the accidental shootdown of an
Iranian Airbus by the USS Vincennes in the Persian Gulf in 1988. In response to the incident,
the Navy began a multi-year, multi-million dollar research program to identify effective team
training interventions. The program, called Tactical Decision Making Under Stress (TADMUS),
began in 1990 and led to numerous breakthroughs in the science and practice of team training,
such as the development of cross-training, mental model training, and team self-correction
training. Following the Navy’s lead, the U.S. Air Force and U.S. Army also supported applied
research into team training during the 1990s. Both programs led to improved team training in
these two branches of the military (cf. Spiker, Silverman, Tourville, & Nullmeyer, 1998). In the
sections that follow, we describe some of the accomplishments in team training that have the
greatest potential for application in other high-risk industries.

Simulator-Based Training

Simulators have been used widely to train teams in the military, aviation, and most
recently, healthcare. Simulator-based training is based on the logic that the fidelity of the
training environment is essential to ensure the transfer of trained skills. Training-environment
fidelity is comprised of stimulus fidelity (i.e., trainees experience the same “behavioral trigger”
that they will experience on the job); response fidelity (i.e., trainees perform the same behaviors
that they will perform on the job); and equipment fidelity (i.e., trainees use the same materials
and equipment that they will use on the job) (Salas et al., 1992).
Even though there have been tremendous advances in the extent to which simulations can
reproduce realistic conditions of a team’s environment, military research has demonstrated that a
realistic simulation by itself is not a panacea for ensuring effective team training. Other factors,
in particular the design of the training, are at least as important as simulator fidelity.
For example, Oser and colleagues define scenario-based training as a systematic process of
linking all aspects of scenario design, development, implementation, and analysis (Oser,
Cannon-Bowers, Salas, & Dwyer, 1999). Scenario-based training involves a six-step process:
(1) reviewing skill inventories and/or historical performance data; (2) developing learning
objectives and competencies; (3) selecting scenario events; (4) identifying performance measures
and standards; (5) diagnosing performance strengths and weaknesses; and (6) delivering
feedback to the trainees. Scenario-based training differs from traditional classroom training in
that a scenario or exercise serves as the curriculum with the overall goal of providing specific
opportunities for trainees to develop critical competencies through practice and feedback.
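To illustrate the linkage at the heart of this process, the minimal Python sketch below ties scenario events (step 3) to learning objectives (step 2) and performance standards (step 4), so that diagnosis and feedback (steps 5 and 6) follow directly from the scenario design. The events, objectives, and standards are invented for demonstration and are not drawn from Oser et al. (1999).

from dataclasses import dataclass

@dataclass
class ScenarioEvent:
    trigger: str     # what happens in the scenario (step 3)
    objective: str   # learning objective the event exercises (step 2)
    standard: str    # observable performance standard (step 4)

EVENTS = [
    ScenarioEvent("unknown aircraft enters sector",
                  "information exchange",
                  "contact reported to all stations within 30 seconds"),
    ScenarioEvent("primary radar fails",
                  "flexibility",
                  "team shifts to backup procedures without prompting"),
]

def debrief(observations):
    """Compare observed performance to each event's standard (steps 5-6)."""
    return [(event.objective,
             "met" if observations.get(event.trigger) else "not met")
            for event in EVENTS]

print(debrief({"unknown aircraft enters sector": True,
               "primary radar fails": False}))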


Team-Coordination Training

Another technique widely used for military team training is team-coordination training
(TCT). TCT concentrates on teaching team members about the basic processes underlying
teamwork. It typically targets several team competencies needed for successful performance in a
particular environment. TCT is usually delivered through a combination of lecture,
demonstration (e.g., video examples), and practice-based methods (e.g., role plays) over two to
five days. Research supports its effectiveness in terms of positive reactions, enhanced learning,
and behavioral change. Similar to simulator-based training, TCT has been widely applied in
aviation and has recently been introduced in healthcare. In aviation, TCT is referred to as Crew
Resource Management (CRM) training (Salas, Fowlkes, Stout, Milanovich, & Prince, 1999).

Team Self-Correction Training

The last three training methods noted here – self-correction training, cross-training, and
stress exposure training – are strategies that were developed from the TADMUS project and have
been applied in the military but have not been utilized in other high-risk industries. Team self-
correction is the naturally occurring tendency for effective teams to debrief themselves by
reviewing their past performance, identifying and diagnosing errors, discussing remedial
strategies, and planning for the future. Self-correction training is delivered through a
combination of lecture, demonstration, practice, and feedback. Team members learn to observe
their performance, to categorize their effective and ineffective behavior into a structured format,
and to use this information to give each other feedback (Cannon-Bowers & Salas, 1998). When
guided by a competent instructor, this method of team training has been demonstrated to improve
team performance.

Cross-Training

Cross-training exposes team members to the basic tasks, duties, and responsibilities of the
positions held by other members of the team; the purpose is to promote coordination,
communication and team performance. Ideally, this training alleviates the decline in
performance that is likely to follow personnel changes; it also increases implicit coordination
(i.e., being able to coordinate without the need to communicate explicitly). The training
comprises sharing cross-role information (teammates, task, equipment, situation); enhancing
team members’ understanding of interdependencies, roles and responsibilities; and providing
cross-role practice and feedback. Research has demonstrated that, compared with their counterparts
who were not cross-trained, cross-trained teams better anticipate the information needs of their
teammates, commit fewer errors, and exhibit more effective teamwork behaviors (Cannon-
Bowers & Salas, 1998).

Stress Exposure Training

Stress can exert a significant negative influence on an individual’s or a team’s ability to
perform effectively, especially in high-stress environments that are characterized by ambiguous
situations and severe time pressure (e.g., military operational environment, medical emergency
departments). Stress exposure training (SET) reduces stress through a three-phase program
designed to provide trainees with information, skills training, and practice. SET improves
performance by providing team members with experience in the stressful environment, thereby
helping them learn what to expect. Practice takes place under graduated exposure to stressors.
Documented outcomes of SET include reduced anxiety in stressful situations, increased
confidence, and improved cognitive and psychomotor performance under stress (Driskell &
Johnston, 1998).

Conclusions

As our discussion demonstrates, the military has led the way in developing a set of tools,
methods, and content that focuses on enhancing teamwork. In aviation and healthcare, some of
these strategies have been adapted; more accurately, these industries have developed their own
similar approaches. With that in mind, we now turn to the
cases of aviation and healthcare and briefly examine current practices in each of these industries
and then highlight several areas where we see great opportunities to transition findings from
military research.

CASE STUDIES

Aviation

Current Practices. For over thirty years, team performance has been a central focus of
commercial aircrew training. This training, which is known as Crew Resource Management
(CRM) training, initially focused on changing pilot attitudes through in-class lectures,
demonstrations, and discussion among aircrew members (Helmreich, Merritt, & Wilhelm, 2000).
Over the years, CRM training has evolved into its current form under the Federal Aviation
Administration’s (FAA) Advanced Qualification Program (AQP). Unlike traditional pilot
training under CFR 14 Part 121, AQP integrates CRM principles with technical skills training
throughout the entire training curriculum. Team training under AQP primarily relies on two
strategies: team coordination training (TCT) and simulator-based training. In fact, under AQP,
aircrews are actually evaluated on their CRM and technical skills in the simulator during an end-
of-training Line Operational Evaluation (LOE) which is used to certify their airworthiness
(Federal Aviation Administration, 1990).
Possible Transitions. Unlike other high-risk industries, aviation has been one of the
leaders in attempting to understand and enhance team performance (Helmreich et al., 2000).
However, as we have noted throughout, these efforts have occurred in a vacuum and have
“reinvented the wheel” by not transitioning findings and lessons learned from the military.
Based on the current status of military research, we believe that although the science of team
performance is advanced in aviation, aviation could benefit significantly by leveraging lower-cost
team training strategies from the military to reduce its overreliance on expensive simulator
training. Granted, high-fidelity simulations will be required when training advanced technical
skills; however, similar levels of fidelity are not required for training CRM skills. Numerous
lessons learned from the TADMUS research (e.g., team self-correction training, cross-training)
could transition directly and should yield improved team performance on the part of aircrews at
significantly reduced cost.

Healthcare

Current Practice. It is only within the last few years that the healthcare industry has
placed a significant emphasis on the relationship between teamwork and patient safety. This
new focus was prompted by the publication of To Err is Human, a detailed treatise on the
unacceptable levels of system failures within healthcare (Kohn, Corrigan, & Donaldson, 1999).
Since the publication of To Err is Human, several team training interventions have been
introduced. For example, MedTeamsTM (Morey, Simon, Jay, Wears, et al., 2002) – a lecture and
discussion-based curriculum – and Anesthesia Crisis Resource Management (ACRM; Gaba,
Howard, Fish, Smith, & Sowb, 2001) – a simulator based curriculum – have been implemented
in a number of private, public, and military hospitals.
Possible Transitions. We believe, relative to aviation, that there are significantly more
opportunities to transition military findings to healthcare, because of the early stage of
development of medical team training. However, and quite interestingly from the standpoint of the
discussion here, the healthcare domain has not developed its existing approaches in a vacuum
but rather has looked to aviation, not the military, for guidance. Although this has led to some
important transitions from aviation, we believe that the more relevant and most useful
information resides in the accomplishments made by the military. First, we believe that
healthcare could benefit greatly by examining the work of Cannon-Bowers et al. (1995) on how
team knowledge, skill, and attitude requirements vary by task and team characteristics. Such
research should be used as a basis when identifying medical team competency requirements and
how competency requirements might vary by medical specialty. Second, we believe that many
of the training strategies developed under the TADMUS program could be directly transitioned
to healthcare. Team self-correction training is particularly promising because of its reliance on
team members to observe, assess, and debrief their own performance. This strategy seems
especially well suited to medicine, where time and cost constraints require practical approaches
to addressing teamwork.

IMPEDIMENTS AND RECOMMENDATIONS

With the continuous reduction in Federal funding for basic and applied research, we
believe that now more than ever it is important for multiple industries to coordinate their efforts
to understand important human performance problems like error, safety, and team performance.
Typically, the impediments for joint efforts are the unique contextual factors that are
characteristic of the teams under investigation; military teams differ from aircrews and medical
teams. However, we believe that teams in high-risk environments likely have more
characteristics in common than not. For example, in all cases the consequences of error are great
and time pressure and high workload are likely.
To offset the traditional stovepipes of research, we recommend joint industry workshops
on team training and performance. An annual conference would allow researchers to
disseminate findings and tools that could be directly transitioned into other industries – this
approach would also be far quicker than the typical publication process. Furthermore, such open
forums would promote the coordination of future research efforts. This approach would
maximize the use of available resources, which are extremely limited in today’s environment.
Finally, we recognize that numerous findings are in fact transitioned from the military and other
high-risk public and private industries. We simply believe that research on team performance is
particularly ripe for such transitions and that the science of teams and teamwork could do better
in promoting this approach.

REFERENCES

Cannon-Bowers, J.A., & Salas, E. (1998). Individual and team decision making under stress:
Theoretical underpinnings. In J.A. Cannon-Bowers & E. Salas (Eds.), Making decisions
under stress: Implications for individual and team training (pp. 17-38). Washington,
DC: American Psychological Association.
Cannon-Bowers, J.A., Tannenbaum, S.I., Salas, E., & Volpe, C.E. (1995). Defining
competencies and establishing team training requirements. In R.A. Guzzo, E. Salas, &
Associates (Eds.), Team effectiveness and decision-making in organizations (pp. 333-
380). San Francisco: Jossey-Bass.
Driskell, J.E., & Johnston, J.H. (1998). Stress exposure training. In J.A. Cannon-Bowers & E.
Salas (Eds.), Making decisions under stress: Implications for individual and team
training (pp. 191-217). Washington, DC: American Psychological Association.
Driskell, J. E., & Salas, E. (1992). Collective behavior and team performance. Human Factors,
34, 277-288.
Federal Aviation Administration. (1990). Line operational simulations: Line oriented flight
training, special purpose operational training, line oriented evaluation. Advisory
Circular 120-35B. Washington, DC: Author.
Gaba, D.M., Howard, S.K., Fish, K.J., Smith, B.E., & Sowb, Y.A. (2001). Simulation-based
training in anesthesia crisis resource management (ACRM): A decade of experience.
Simulation & Gaming, 32, 175-193.
Helmreich, R.L., Merritt, A.C., & Wilhelm, J.A. (2000). The evolution of crew resource
management training in commercial aviation. International Journal of Aviation
Psychology, 9, 19-32.
Keesling, W., Ford, P., & Harrison, K. (1994). Application of the principles of training in armor
and mechanized infantry units. In R.F. Holz, J.H. Hiller, et al. (Eds.), Determinants of
effective unit performance: Research on measuring and managing unit training readiness
(pp. 137-178). Alexandria, VA: US Army Research Institute for the Behavioral & Social
Sciences.
Kohn, L.T., Corrigan J.M., & Donaldson, M.S. (1999). To err is human. Washington, DC:
National Academy Press.
Morey, J.C., Simon, R., Jay, G.D., Wears, R., Salisbury, M., Dukes, K.A., et al. (2002). Error
reduction and performance improvement in the emergency department through formal
teamwork training: Evaluation results of the MedTeams project. Health Services
Research, 37, 1553-1581.
Oser, R.L., Cannon-Bowers, J.A., Salas, E., & Dwyer, D.J. (1999). Enhancing human
performance in technology-rich environments: Guidelines for scenario-based training. In
E. Salas, (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175-202).
Stamford, CT: JAI Press.
Salas, E., Bowers, C.A., & Cannon-Bowers, J.A. (1995). Military team research: 10 years of
progress. Military Psychology, 7, 55-75.
Salas, E., Dickinson, T.L., Converse, S.A., & Tannenbaum, S.I. (1992). Toward an understanding
of team performance and training. In R.W. Swezey & E. Salas (Eds.), Teams: Their training and
performance (pp. 3-29). Norwood, NJ: Ablex.
Salas, E., Fowlkes, J.E., Stout, R.J., Milanovich, D.M., & Prince, C. (1999). Does CRM training
improve teamwork skills in the cockpit? Two evaluation studies. Human Factors, 41,
326-343.
Salas, E., Rhodenizer, L., & Bowers, C.A. (2000). The design and delivery of crew resource
management training: Exploiting available resources. Human Factors, 42, 490-511.
Smith-Jentsch, K.A., Zeisig, R.L., Acton, B., & McPherson, J.A. (1998). Team dimensional
training: A strategy for guided team self-correction. In J.A. Cannon-Bowers & E. Salas
(Eds.), Making decisions under stress: Implications for individual and team training (pp.
271-297). Washington, DC: American Psychological Association.
Spiker, V. A., Silverman, D. R., Tourville, S. J., & Nullmeyer, R. T. (1998). Tactical team
resource management effects on combat mission training performance (Report No.
USAF AMRL Technical Report AL-HR-TR-1997-0137). Brooks Air Force Base: U.S.
Air Force Systems/Materiel Command.
Tannenbaum, S. I., Beard, R. L., & Salas, E. (1992). Team building and its influence on team
effectiveness: An examination of conceptual and empirical developments. In K. Kelly
(Ed.), Issues, theory, and research in industrial/organizational psychology (pp. 117-153).
New York: Elsevier.


DEVELOPING MEASURES OF HUMAN PERFORMANCE:
AN APPROACH AND INITIAL REACTIONS
Dana Milanovich Costar, David P. Baker, Amy Holtzman
American Institutes for Research
Kimberly A. Smith-Jentsch
University of Central Florida
Paul Radtke
NAVAIR Orlando TSD

Even with the tremendous emphasis on training throughout the U.S. Navy, the
development of reliable and valid performance rating tools for assessing trainee performance has
represented a significant challenge to Navy instructors. Such tasks are typically a collateral duty
and instructors often have no background in performance measurement techniques. As a result,
instructors tend to use measures that are familiar to them and easy to use (e.g., checklists) to
assess trainee performance, whereas studies appearing in the performance measurement and
training literatures have employed a wider variety of measurement methods. These include:
frequency counts (e.g., Goodman & Garber, 1988; Stout, Cannon-Bowers, Salas, & Milanovich,
1999), behavioral checklists (e.g., Fowlkes, Lane, Salas, Franz, & Oser, 1994; Salas, Fowlkes,
Stout, Milanovich, & Prince, 1999), distance/discrepancy scores (e.g., Joslyn & Hunt, 1998;
Smith-Jentsch, Johnston, & Payne, 1998), and rating scales (e.g., Hollenbeck, Ilgen, Tuttle, &
Sego, 1995; Marks, Zaccaro, & Mathieu, 2000).
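For readers less familiar with these methods, the toy Python snippet below illustrates the basic scoring logic behind each of the four; all behaviors, values, and thresholds are invented purely for demonstration.

# Frequency count: how often a target behavior occurred.
observed_behaviors = ["callout", "callout", "cross-check", "callout"]
frequency = observed_behaviors.count("callout")                # -> 3

# Behavioral checklist: proportion of expected behaviors observed.
expected = {"callout", "cross-check", "handoff"}
checklist_score = len(expected & set(observed_behaviors)) / len(expected)

# Distance/discrepancy score: gap between a response and a reference value.
ideal_response_time, actual_response_time = 10.0, 13.5         # seconds
discrepancy = abs(actual_response_time - ideal_response_time)  # -> 3.5

# Rating scale: an observer's judgment on a fixed scale (e.g., 1 to 5).
rating = 4

print(frequency, round(checklist_score, 2), discrepancy, rating)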

To provide assistance in the area of individual and team performance measurement, a
training workshop was developed and delivered to Navy instructors, civil service employees, and
government contractors involved in Navy training. The overall objective of the workshop was to
demonstrate the process of identifying training objectives for measurement, selecting an
appropriate method to assess performance on that objective, and tailoring the measure with
operationally specific content. By attending the workshop, it was expected that participants
would: (1) be able to identify and craft good training objectives, (2) understand the importance
of collecting data on performance outcomes and processes, (3) understand the pros and cons
associated with various types of performance measurement methods, and (4) recognize and
develop effective performance measures.

The one-day workshop included both a morning and afternoon session. The morning
portion of the workshop consisted of a briefing on individual and team performance
measurement. The briefing provided an overview of performance measurement and then
presented a 7-step framework for developing reliable and valid measures of trainee performance.
The seven steps included: (1) consider level of analysis; (2) identify measurement objectives; (3)
clarify purpose for measuring performance; (4) decide whether you need to assess outcomes,
process, or both; (5) make sure objectives are measurable; (6) select a method for each process
and/or outcome; and (7) tailor the measure with the appropriate content. In addition to discussing
each of the seven steps, tips and guidelines related to each step were presented, and attendees
participated in informal class exercises to demonstrate the utility of the approach.
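As a rough Python sketch, the framework can be organized as a guided sequence of prompts, loosely in the spirit of the wizard-based authoring tool described later in this paper; the prompt wording paraphrases the steps above, and the interactive structure is our own illustration rather than any materials used in the workshop.

SEVEN_STEPS = [
    "Consider the level of analysis (individual, team, or both)",
    "Identify measurement objectives",
    "Clarify the purpose for measuring performance",
    "Decide whether to assess outcomes, process, or both",
    "Make sure the objectives are measurable",
    "Select a method for each process and/or outcome",
    "Tailor the measure with the appropriate content",
]

def develop_measure():
    """Walk a developer through the seven steps, recording each answer."""
    answers = {}
    for number, prompt in enumerate(SEVEN_STEPS, start=1):
        answers[number] = input(f"Step {number}: {prompt}\n> ")
    return answers

# Example (interactive): measure_notes = develop_measure()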

The afternoon session provided participants with hands-on practice using the 7-step
framework that had been presented in the morning session. Small groups of participants were
formed, with each group developing measures for a performance measurement objective of
interest. In the workshop invitation, participants were asked to bring their own objectives to
work on during the afternoon hands-on practice session. These objectives were used as the basis
for the afternoon session. Two facilitators from the research team were assigned to each group
to guide participants through the development of their measures.

WORKSHOP EVALUATION

Evaluation forms were developed to assess participant reactions to the morning briefing
and the afternoon hands-on practice. The morning evaluation form asked participants to rate the
performance measurement briefing on four criteria: its ability to prepare them to (1) develop
good measurement objectives, (2) distinguish between outcomes and processes, (3) select an
appropriate measurement method, and (4) develop effective measures. Ratings were made via a
5-point Likert-type scale (5 = strongly agree, 1 = strongly disagree). Four open-ended questions
were then presented to assess whether participants’ expectations for the morning session had
been met, what they had found most useful about the briefing, what additional information
should be added to the briefing, and who could benefit from attending this type of briefing. A
similar evaluation form was developed for the afternoon session; it required participants to
rate the hands-on practice on the same four criteria used to assess the morning briefing and
presented two open-ended questions to assess what participants found most useful about
the hands-on practice session and how the afternoon session could be improved.

WORKSHOP 1

Morning Session Participants

Forty-four individuals attended the morning briefing on individual and team performance
measurement. Thirty-five of the morning participants (73%) completed the evaluation form. Of
those that completed the form, 11 were active-duty military personnel, 12 were civil service
employees, and 12 were contractors.

Reactions to Morning Session

Overall, participant ratings of the morning session were extremely positive. Eighty-six
percent of the attendees felt that the briefing had prepared them to develop good measurement
objectives. Seventy-nine percent reported that the morning helped them to distinguish outcomes
from processes. Eighty-six percent of participants indicated that the briefing had successfully
prepared them to select an appropriate measurement method and 80% reported that they felt
prepared to develop effective performance measures. These percentages are based on the
number of attendees who either agreed or strongly agreed with the objective statements.
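For clarity, the computation behind such percentages can be sketched in a few lines of Python; the ratings below are fabricated for illustration and are not the study’s data.

def percent_favorable(ratings, threshold=4):
    """Percent of respondents who agreed (4) or strongly agreed (5)."""
    favorable = sum(1 for rating in ratings if rating >= threshold)
    return 100.0 * favorable / len(ratings)

# One fabricated 5-point rating per respondent for a single criterion.
criterion_ratings = [5, 4, 4, 3, 5, 4, 2, 5, 4, 4]
print(f"{percent_favorable(criterion_ratings):.0f}% agreed or strongly agreed")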


Furthermore, ratings were very positive regardless of whether participants were active-duty
military personnel, civil service employees, or government contractors.

In examining the responses to the open-ended questions related to the morning session,
almost all participants reported that the briefing had met their expectations because it had
provided a specific approach for developing reliable and valid measures of human performance
and presented a good overview of the various issues associated with developing measures. In
terms of the most useful part of the morning session, participants consistently identified the step-
by-step framework that was presented for developing measures of human performance.
Regarding suggestions for how the morning session could be improved, the majority of
participants reported that they would have liked more information about the relationship between
individual and team performance measurement and the use of the Naval Mission Essential Task
List (i.e., NMETL). Naval Mission Essential Tasks represent all tasks that have been identified
as necessary, indispensable, or critical to the success of a mission. Additionally, the associated
conditions and standards within which tasks must be performed are specified. While the content
of the current workshop was consistent with NMETLs, the focus was on performance
measurement rather than on NMETLs per se. Lastly, participants suggested that a host of
individuals could benefit from attending the workshop including supervisors and trainers.

Afternoon Session Participants

Twenty-two individuals attended the afternoon hands-on practice session. Eleven
individuals (50%) completed the evaluation form. Of these 11 individuals, 5 were active-duty
military personnel and 6 were contractors.

Reactions to Afternoon Session

Overall, ratings of the hands-on practice session were also positive. When evaluating the
afternoon session against our four criteria, the majority of participants agreed that the session
was effective in preparing them to develop good measurement objectives and to select an
appropriate measurement method. Approximately half of the respondents felt that the session
successfully prepared them to distinguish performance outcomes from processes and to develop
effective measures.

When responding to the two open-ended questions about the hands-on practice, the
majority of participants felt that the most useful parts of the afternoon session were (a) the
unique perspective brought to the table by each of the different participants and (b) the
discussion within the groups. Regarding how the afternoon session could be improved,
participants suggested that it might be beneficial to present attendees with a standardized task to
develop measures for rather than allowing participants to choose their own.

Workshop Revisions

Based on the feedback obtained from participants, the hands-on practice session was
slightly modified for the second workshop. Specifically, we pre-selected an objective (i.e.,
Positively Identify Friendly Forces) from the Naval Mission Essential Task List to serve as the
basis for measurement development. The objective that we selected consisted of providing the
means, procedures, and equipment to positively identify friendly forces and distinguish them
from unknown, neutral, or enemy forces. This task included positively distinguishing friendly
from enemy forces through various methods that may include procedural, visual, electronic, and
acoustic, in addition to providing information to the force commander to aid in the identification
of unknown contacts. It was anticipated that groups of participants would develop performance
measures for the pre-selected objective at different levels of analysis (e.g., individual, team).

WORKSHOP 2

Morning Session Participants

Forty-nine participants attended the morning briefing on individual and team
performance measurement. Thirty-one participants (63%) completed an evaluation form. Of
those that completed the form, 21 were active-duty military personnel, 4 were civil service
employees, and 6 were contractors.

Reactions to Morning Session

Consistent with the first workshop, participants were asked to rate the morning briefing
against our four criteria using the 5-point Likert-type scale. Seventy-six percent of participants
felt that the briefing had prepared them to develop good measurement objectives. Sixty-six
percent felt that they were better able to distinguish outcomes from processes as a result of the
briefing. Sixty-nine percent of the respondents indicated that the morning had prepared them to
select an appropriate measurement method. Forty-three percent felt that the morning session had
prepared them to develop effective performance measures, but an equal number were neutral on
the subject. This result is most likely attributable to the fact that a number of participants
recognized that learning how to accomplish this task successfully would require practice.

In examining the responses to the four open-ended questions on the morning evaluation,
attendees felt that their expectations had been met in that “[the session] provided clear,
understandable methods for developing measures of performance” and “helped to develop a
calculated way to measure human performance.” In terms of the most beneficial part of the
morning session, participants cited the performance measurement steps, the background
information about human performance, and the examples and discussion. When asked how the
briefing could be improved, participants indicated that they would have liked more information
on the history of the Universal Task List, NMETLs, and how they drive training and readiness
assessments. Although there could have been greater discussion about the NMETL process, it
was assumed that participants already had some level of familiarity with NMETLs prior to
attending the current workshop. Finally, attendees reported that supervisors, individuals that
observe and assess performance, trainers, and anyone working with NMETLs could benefit from
attending this type of briefing.


Afternoon Session Participants

Although 24 participants attended the afternoon session, only 13 individuals completed
the evaluation form (54%). Of those that completed the evaluation, 9 were active-duty
personnel and 4 were contractors.

Reactions to Afternoon Session

Although the ratings for the afternoon session were slightly lower than for the morning
session, the data were still positive. When evaluating the hands-on practice session against our
four criteria, slightly less than two-thirds of participants felt that the afternoon session had
prepared them to distinguish outcomes from processes, while about half felt that it was helpful
in developing good measurement objectives, selecting an appropriate measurement method, and
developing effective measures.

In responding to the two open-ended questions about the hands-on practice, participants
indicated that the most useful part of the afternoon was gaining actual experience developing
measures. In addition, participants felt that the prepared examples were useful in understanding
and working through the process. In general, participants provided few suggestions for how the
afternoon session could be improved. Those that did provide input mentioned that greater detail
could have been provided on some of the handouts (e.g., more information related to
performance conditions and measures).

SUMMARY AND IMPLICATIONS

In summary, a workshop was developed on human performance measurement and
delivered on two separate occasions to personnel involved in Navy training. Overall, reactions to
these workshops were positive. Our analysis of participants’ reactions indicated that the
workshop: (1) met its stated objectives; (2) provided participants with useful information about
human performance measurement; (3) provided participants with a 7-step process for developing
reliable and valid measures of human performance; and (4) provided participants with experience
developing a performance measure.

Several important lessons have been learned as a result of the workshops conducted.
First, the discussions that took place during the workshops and the comments that were provided
on the evaluation forms indicated that there is a great deal of interest in human performance
measurement. Second, the individuals that attended the workshops appeared to vary greatly in
their knowledge about performance measurement – some knew very little while others knew a
good deal more. Third, participants liked the step-by-step framework for developing human
performance measures that was presented. Participants reported that this framework was logical
and relatively easy to follow. Finally, it appears that there is a need for tools and job aids that
help guide instructors in developing valid and reliable metrics on the job.


As a result, we are currently developing a performance measurement authoring tool (PMAT) that
will be delivered to the Navy. Based on the positive feedback obtained from workshop
participants, the tool will include the same 7-step framework that was presented in the
workshops. In addition, the authoring tool will include a great deal of guidance (i.e., definitions,
background information, tips) and probing questions so that instructors of varying levels of
experience can benefit from the tool. Specifically, the tool will incorporate a wizard (i.e., tutor)
that will guide the user through the 7-step framework, just as the facilitators did in the afternoon
portion of the workshops. Studies are currently underway to test
the decision rules that will be programmed into the authoring tool. Once completed, a prototype
of the tool will be developed and a series of usability tests will be conducted. Finally, the
effectiveness of PMAT will be tested by asking two groups of instructors to develop measures of
human performance. One group of instructors will develop these measures on their own. The
second group will develop their measures with the assistance of PMAT. Participants will then
use the measures that they have developed to assess performance during a scenario-based
training exercise. System effectiveness will be demonstrated by comparing the reliability and
validity of ratings developed with and without the PMAT system. Additionally, the amount of
time it takes each group to develop its measures will be examined.
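One plausible way to score that comparison is sketched below in Python, using a simple Pearson correlation between two raters’ scores as the reliability index; the data are fabricated, and the actual study may well employ a different reliability statistic (e.g., an intraclass correlation).

from statistics import correlation  # requires Python 3.10+

# Fabricated ratings of six trainees by two raters using PMAT-built measures.
with_pmat_rater_a = [4, 5, 3, 4, 5, 4]
with_pmat_rater_b = [4, 4, 3, 4, 5, 5]

# Fabricated ratings of the same trainees using measures built without PMAT.
without_pmat_rater_a = [4, 5, 3, 4, 5, 4]
without_pmat_rater_b = [2, 5, 4, 3, 3, 5]

r_with = correlation(with_pmat_rater_a, with_pmat_rater_b)
r_without = correlation(without_pmat_rater_a, without_pmat_rater_b)

print(f"Inter-rater reliability with PMAT:    {r_with:.2f}")
print(f"Inter-rater reliability without PMAT: {r_without:.2f}")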

REFERENCES

Fowlkes, J. E., Lane, N. E., Salas, E., Franz, T., & Oser, R. (1994). Improving the
measurement of team performance: The targets methodology. Military Psychology, 6, 47-61.
Goodman, P. S., & Garber, S. (1988). Absenteeism and accidents in a dangerous
environment: Empirical analysis of underground coal mines. Journal of Applied Psychology,
73, 81-86.
Hollenbeck, J. R., Ilgen, D. R., Tuttle, D. B., & Sego, D. J. (1995). Team performance
on monitoring tasks: An examination of decision errors in contexts requiring sustained attention.
Journal of Applied Psychology, 80(6), 685-696.
Joslyn, S., & Hunt, E. (1998). Evaluating individual differences in response to time-
pressure situations. Journal of Experimental Psychology: Applied, 4(1), 16-43.
Marks, M. A., Zaccaro, S. J., & Mathieu, J. E. (2000). Performance implications of
leader briefings and team-interaction training for team adaptation to novel environments.
Journal of Applied Psychology, 85(6), 971-986.
Salas, E., Fowlkes, J. E., Stout, R. J., Milanovich, D. M., & Prince, C. (1999). Does
CRM improve teamwork skills in the cockpit? Two evaluation studies. Human Factors, 41(2),
326-343.
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related
expertise in complex environments. In J.A. Cannon-Bowers & E. Salas (Eds.), Making decisions
under stress: Implications for individual and team training (pp. 61-87). Washington, DC:
American Psychological Association.
Stout, R. J., Cannon-Bowers, J. A., Salas, E., & Milanovich, D. M. (1999). Planning,
shared understanding, and coordinated performance: An empirical link is established. Human
Factors, 41(1), 61-71.


PSYCHOLOGICAL IMPLICATIONS OF DEPLOYMENTS FOR THE MEMBERS OF
THE SOUTH AFRICAN NATIONAL DEFENCE FORCE (S. A. N. D. F.)
MAJOR CHARLES KENNY M. MAKGATI
RESEARCH PSYCHOLOGIST, MILITARY PSYCHOLOGICAL INSTITUTE, SANDF,
SOUTH AFRICA.
Kennymakgati@hotmail.com

Introduction

The African continent has been in a state of emergence for some time now, a process
accompanied by poverty, bloodshed, migration, refugees and, at times, death. At the same time,
the democratisation of South Africa implied that the South African government assumed a
different role in terms of involving the South African National Defence Force in international
missions. This represented a marked shift in focus for these forces.
Traditionally, the SANDF primarily focused on aspects such as border control, crime prevention
and peace enforcement on a national level. However, the integration of all military forces (South
African Defence Force and non-statutory forces such as Umkhonto we Sizwe and the Azanian
People's Liberation Army), the removal of sanctions, and changes in international relations
implied greater political involvement by the South African government in Southern Africa. As a
result, the principles, roles and practices of the SANDF had to change accordingly. Clearly, these
changes could not proceed without some difficulty.
This paper focuses mainly on the involvement of the South African National Defence Force
through the deployment of soldiers into the African continent. Specific attention is paid to the
question of how our South African troops cope with international deployments, which at times
include the United Nations. In an attempt to address this, the researcher commences by providing
a brief background on the conflict in Africa and later addresses the psychological implications
of these deployments.

The Conflict in Africa and the legacies of colonialism

Over the past decades there have been numerous attempts to resolve intra-state conflict in
Africa through mediation. Most of these efforts have failed, with one or more of the parties
spurning negotiations, being unwilling or unable to reach a settlement in the course of mediation,
or subsequently violating agreements that have been concluded. The factors that may account for
the lack of success in each case include the history, nature and causes of the conflict;
demographic, cultural and socio-economic conditions; the goals and conduct of disputant parties;
the role of external actors; and the style and methods of the mediator.
History suggests that colonialism stunted Africa’s political, economic, and social
development. It has been argued that it was during the nineteenth century’s scramble for Africa
that the European powers partitioned the continent into arbitrary territorial units. The colonies
that emerged lacked internal cohesiveness, and differences and antagonisms among various
indigenous groups were frequently exploited and exacerbated. Africans were given virtually no
voice in political affairs. Designed to support the needs of the colonial powers, colonial
economies required largely unskilled labour and education was neglected. Generally, colonial
powers did not prepare African countries for statehood, which most achieved during the 1960s.

It was not surprising that decolonisation created a new set of challenges which the first
generation of African statesmen was ill-equipped to handle. Many of the transitions to
independence were bloody. An added problem was the poor definition of borders, a result of the
pragmatic decision taken by the Organisation for African Unity (OAU) to accept colonially
defined borders. This in itself led to continuous conflicts as a result of the increasing scarcity of
resources. Unable to come to terms with the ethnic, linguistic, and religious diversity within the
preordained borders, individual African States have found it difficult to build the national
identities which are crucial in creating stability.
In addition, the cold war also had profound effects on African Governments and security.
Both the Soviet Union and the United States courted the newly-independent African States (as
well as liberation movements) in an effort to win converts to their respective causes. As a result,
they often supported authoritarian, corrupt, and oppressive Governments. With the end of the
superpower rivalry, many African leaders could not rely on the accustomed backing of the
outside power to lend much-needed political legitimacy, financial and military support to their
regimes. This led to disgruntled and oppressed groups openly and forcefully challenging the
legitimacy of these leaders and the weakened regimes increasingly started to be susceptible to
domestic unrest and violence
At the same time, it may be stated that today’s crises in Africa were also brought about by the
leaders themselves. The style of government pervasive on the continent has not been conducive
to development, democracy, and peace. Many leaders of the newly-independent African
countries tried to impose national unity by consolidating political and economic power in the
State. This impacted badly on governance, with inefficient bureaucracies and with corruption
rampant and tolerated.
The economic and fiscal policies of many African States have failed, and largely Western-
imposed solutions have created new problems. After the prices of many of their exports slumped
in the 1970s, African States borrowed heavily to maintain Government expenditures. Initially,
Western States and institutions readily lent money on the shared expectation that commodity
prices would recover. By and large, African countries did not invest the borrowed funds
prudently and their debts mounted. Waste and corruption exacerbated the situation.
Subsequently, the international financial institutions restricted access to international loans. As a
result, many African States are still servicing their debts and this has become their
preoccupation. Social responsibilities that were once the purview of the State have been
substantially ignored or subcontracted to others with varying degrees of success. States are also
finding it difficult to provide for their own security. Many African militaries do not possess the
human and material resources, or the discipline and inclination, to defend the State. To establish
and maintain order, some African States have called upon private security firms (or corporate
mercenaries). This in itself has serious repercussions for the peace, sovereignty and self-
determination of nations and their people.
The political, economic, social, and military challenges to the State have been so enormous
that some writers have suggested that parts of Africa be re-colonised for humanitarian purposes
until such time as the State is prepared to govern effectively and humanely.
The proliferation of rebel movements, small arms, and refugees adversely affects a State’s
ability to govern, and this also threatens regional security. Intra-State conflicts usually spill over
national borders and frequently assume regional dimensions. Whereas States have historically
supported – or denied support for – insurgencies in other countries as a means of retaining or
gaining influence, their abilities to control rebel movements have diminished. Some of these
groups are sufficiently independent that they have themselves reportedly contracted mercenaries.
It is reported that one-fifth of the global diamond market is supplied by African rebel groups who
at times collaborate with one another independently of State patrons.
Vast quantities of weapons, especially small arms, used to fight wars of independence, civil
wars, and insurgencies remain in circulation and help fuel present conflicts. Many African
Governments simply cannot monitor the movement of small arms in their countries or across
their borders, although some are endeavouring to develop such a capacity. Other African
Governments lack the political will to do so.
Like arms flows, the movement of people will continue to have profound repercussions on
African security. Countries often have insufficient infrastructure to deal with the influx and
migrations of people, and conflicts over scarce resources frequently arise. The fact that many of
the refugee camps are situated near the borders makes it easy for rebels to use them as bases to
launch attacks and regroup, thus exacerbating the situation.
It therefore appears that the challenges to African peace and security defy easy solutions.
Many conflicts are multifaceted and deeply entrenched. They require sustained diplomatic and
military engagement to move towards resolution. Mediating between the conflicting groups
will feature quite prominently in the whole process of peacekeeping in Africa.
Modern peacekeeping has developed beyond the mere monitoring of a cease-fire. Fifty years
of UN peacekeeping thus bring various disciplines – humanitarian relief, human rights
monitoring and education, the protection of refugees, peacemaking, peace-building, and so on –
together in one holistic mission plan. Modern multidimensional peacekeeping thus includes
elements such as support for voluntary disarmament and demobilisation; programmes to rehabilitate
child soldiers and re-introduce ex-combatants into civil life; de-mining; support for national
reconciliation; rebuilding the judicial system; repatriation of refugees; re-introducing civilian
administration; training a new police force, and so on.

The involvement of South Africa

Until recently South Africa resisted considerable international pressure to contribute to
peacekeeping operations in Africa. Instead, it focused on consolidating the transformation
process in the SANDF. But, at the same time, the government realised that South Africa needed
to prepare for a peacekeeping role in Africa. It therefore sent several officers and diplomats all
over the world on peacekeeping courses; introduced peacekeeping training in its staff courses
and at several other levels; and prepared two infantry battalions and other specialised units for
peacekeeping operations.
Simultaneously the government, in close consultation with interest groups in civil society,
developed a White Paper that would guide South Africa’s participation in international
peacekeeping missions. These and various other efforts culminated in Exercise Blue Crane, a
SADC brigade-size peacekeeping exercise that took place in April 1999. This exercise reinforced
South Africa’s confidence in its ability to take up a peace-missions role. However, it remained
caught in the dilemma of what the term peacekeeping means at both the operational and the
contextual level.
This was due to the fact that the concept of contemporary peacekeeping is replete with
doctrinal ambiguities and defies a straightforward definition. The term in its present form has
become synonymous with any number of international activities designed to resolve or attenuate
a conflict. The end of the cold war has seen the tenets of traditional peacekeeping eroded and the
scope of peacekeeping and its activities expanded significantly. In practice, the temporal
boundaries between peacekeeping and peace-building are not always apparent. The once-clear
distinction between peacekeeping operations and enforcement actions has also become blurred.
This has also led to difficulties as the UN is no longer the only actor in this regard.
Efforts to clarify the terminology have not kept up with the rapid pace of development on the
ground. In an effort to make sense of the changing security environment, the then Secretary
General of the United Nations, Boutros Boutros-Ghali, tried to provide a definition for the
integrally related concepts of preventive diplomacy, peacekeeping, peacemaking, and peace-building.
This definition gained wide currency, but recently its value has declined with the expansion of
peacekeeping following the end of the cold war. Currently, most commentators speak of
successive generations of United Nations operations. In some circles, the terms peace operations
and peace support operations are used interchangeably with the term peacekeeping operations to
encompass a broad spectrum of conflict management and resolution techniques. The South
African Department of Defence, for example, recently identified and defined nine overlapping
terms: peace missions, peace support operations, preventive diplomacy,
peacemaking, peacekeeping operations, peace enforcement, peace-building and humanitarian
assistance.
To further complicate matters, different countries and organisations ascribe different
meanings to the same terms. Many scholars on Defence broadly use the term peacekeeping to
denote a military or police force deployed at the request of a Government or a representative
group of political and military actors that enjoys wide international recognition. This process
places much greater restraints on the use of force than do pure enforcement actions.

The Psychological Implications

Two studies were conducted by the researcher in the Democratic Republic of the Congo and
Burundi, where our troops are currently deployed for peace missions. The following
psychological concepts were found to be the most critical in deployments: communication,
pre-deployment preparation, and the role of the government and the organisation.

Communication

Gibson, Ivancevich and Donnelly (1994) define communication as transmission of
information and understanding through the use of common symbols, verbal and/or nonverbal.
However, the South African troops experience communication problems on three distinct levels.
These are a) the member and his/her family in the Republic of South Africa, b) the member and
the commanding staff within the mission area and in the Republic, c) the member and other
members from other countries or the same country within the deployment area.

a) The member and his/her family in the Republic of South Africa.

The results revealed that 40% of the deployed South African soldiers find it difficult to
communicate with their families at home. They expressed this as a very real need. In the absence
of continual communication with family members, deploying troops stated that they realised that
they are “far away from home”. This realisation makes them feel “emotionally paralysed”. At the
same time, their families are concerned about how “safe our loved one is within the
deployment”. Deployed members tend to become psychologically dysfunctional if they are
deprived of communication with family members. They aspire to return home and, as a result,
“lame” excuses or faked diseases or ailments begin to surface. The group loses its cohesion
because members want to go back home. On average, the results revealed that this tends to be the
pattern after two to three months of deployment. On the other hand, participants were also of the
opinion that communication must be restricted or filtered by family members. Ten percent
indicated that reasons for doing so include situations where one gets told about a death,
sickness or financial difficulties at home.
The functions of communication include a controlling and a motivating function, which clarify
what needs to be done, and how and when a task is to be done. Communication also acts in the
formulation of goals, in feedback on progress and in the reinforcement of desired behaviour. It
serves as a release mechanism for emotional expression, enabling individuals to show their
frustrations.
In the absence of adequate communication, and if they are not given relief from this
situation, participants tend to view themselves as worthless and are therefore
more prone to danger. Leisure time utilisation then becomes critical. However, in Burundi,
members cannot utilise their leisure time as or when they want to. They need to comply with set
rules aimed at ensuring their safety. Consequently, high levels of alcohol consumption and
unprotected sex are reported. This would confirm the UN report on peace missions that “at any
given moment around 5% of the preselected peacekeeping force may be experiencing increased
psychological problems and up to 50% report increased high risk behaviour” (cited in Burgess, 2001).

b) The member and the commanding staff within the mission area and in the Republic.

The information-giving function of communication is also critical within this context.
Participants expressed dissatisfaction with the fact that the commanding staff receives a great
deal of information that is not relayed to them. This ranges from intelligence reports to
information from their home units in South Africa. This information is censored to such an
extent that it becomes relatively “worthless” to subordinates. Furthermore, the precise nature and
role of the deployed members is not clearly understood by everyone. Interpretation seems to be
quite varied. The history, origin and necessity of their deployment, are not clearly communicated
to all levels. As a result, miscommunication and misinterpretation occur which add to the level of
stress and frustration experienced by members.
These barriers to effective communication, which include filtering (the sender manipulates information so that it will be seen favourably), are perceived as being deliberate (Makgati, 2001). They are interpreted not as safety measures but as “ethnocentric”. The argument supporting this interpretation is based on the perception that management and subordinates are delineated along racial lines: management does not relay information sufficiently, and this is considered to be racially motivated. Subordinates view this as an attempt to make one group (Whites) feel superior to the other (Blacks) (Makgati, Mokhoka & Naele, 2002). This ultimately impacts negatively on group functioning and cohesion.
These dynamics elicit defensive behaviour in members, which may be detrimental to the effective execution of military objectives. Ashforth and Lee (1990) noted that when employees feel threatened, they tend to react in ways that reduce their ability to achieve mutual understanding. Also, when individuals interpret others’ messages as threatening, they often respond in ways that retard effective communication. As a result, the commanding staff is perceived as being oppressive and as not having the interests of its members at heart. The latter view is supported by Johnson and Johnson (2000), who state that the higher the level of bureaucracy, implying more vertical levels, the greater the opportunities for filtering. Whether or not members’ perceptions and interpretations are correct, they have a negative impact on the effectiveness of their operational functioning.

c) The member and other members from other countries or the same country within the
deployment area.

It is a well-established fact that cultural differences may lead to uncertainty about human behaviour (Cox, 1993). Individuals also selectively see and hear based on their needs, motivation, experiences and other personal characteristics. However, within an international deployment context, language differences act as the greatest barrier to effective communication (Ursin and Olff, 1995). Some members feel substantially depressed by their inability to communicate with their counterparts from other countries. This inability leads to a sense of alienation, stress and anxiety, and increases the need to “go home” (Makgati, 2001).

Pre-deployment preparation

Pre-deployment preparation does not begin when the deploying member actually reports at the mobilisation area to receive briefings and to complete the administration required to deploy. It commences from the moment the member is informed that he or she has been selected for deployment. Conflict and confirmation become the two primary issues that members face when they have to inform their loved ones of the news.
At this time, members begin to build defences and coping mechanisms that enable them to deal better with the deployment. Ursin and Olff (1995) theorise that, within information processing theory, defence can be treated as distorted stimulus expectancies. From the time members hear the news, they begin coding and analysing, and they attempt to make projections into the future. In support of the latter, Lazarus and Folkman (1984) postulate that defence should be regarded as part of coping strategies.
South African deploying members argue that they have been neither fully prepared nor helped to create the defences and coping mechanisms that are essential for their deployment. They argue that the deployment is treated as an ordinary daily work experience at the home unit. For instance, members may be concerned about “who will take care of the house” while they are deployed. Those responsible are often individuals with different interests, and this may cause disputes upon the member’s return.

The Role of the Government and the Organisation

The SANDF is mandated and tasked by the government. However, the organisation is also responsible for acting in the interest of its members. As a result, these three parties (government, organisation and members) are co-dependent on one another. If one of them fails to deliver, it is the image of the government, through the deployed member, that suffers. Both the organisation and the government play a role in the planning and organisation of all deploying forces and equipment, and agreements are entered into long before the actual deployment takes place (Makgati, 2000).
Many deployed members of the SANDF tend to blame the government for not ensuring sufficient racial representivity during deployments. The organisation, on the other hand, is blamed for not considering the needs of deploying members. This includes ensuring that there are adequate telephone lines for members to maintain contact with their families. The organisation is also blamed for not ensuring that appropriate and serviceable equipment is available to members. Other complaints centre on the calculation of allowances, members not being told the exact dates of rotations, and the absence of health personnel, such as psychologists and social workers, on the ground. All these discomforts and irritants tend to lead to disputes that not only affect members and the organisation, but also have repercussions for the government.

CONCLUSION

The discussion in the present paper highlights critical psychological aspects that South African deployed soldiers tend to experience. These aspects are not necessarily unique to South African soldiers; they are likely to apply to military deployments internationally. Although often ignored, they remain common and demand to be addressed. It is evident that the neglect of these aspects results from a lack of effective communication; to many this may not be explicit, but it remains a reality.
Lastly, Africa bears enormous pressure from the economic, social and political problems it faces in the current epoch, and its history shows that these problems are longstanding. In this regard, the SANDF needs to continue to play a meaningful role in effecting political stability on the African continent. Interventions need to be well planned and coordinated in order to have a positive and decisive impact on the realisation of the African rebirth as envisioned by the president of the Republic of South Africa, Mr Thabo Mbeki. Ensuring effective communication with deploying soldiers, thereby empowering them to help realise this vision, is therefore critically important.

REFERENCES
ASHFORTH, B.E. & LEE, R.T. (1990) Defensive behaviors in organizations: A preliminary model. Human Relations, 43(7), 621-648.
BARTONE, P.T. & ADLER, A.B. (1994) A model for soldier adaptation in peacekeeping operations. Paper presented at the 36th Annual Conference of the International Military Testing Association, Rotterdam, The Netherlands, October 1994.
BARTONE, P.T. (1996) American IFOR experience: Psychological stressors in the early deployment period. Proceedings of the 32nd International Applied Military Psychology Symposium, Brussels, Belgium, May 1996.
BURGESS, W.B.H. (2001) Second psychological report on Operation Mistral. Unpublished report, South African Military Health Services, Military Psychological Institute, Pretoria, South Africa.
COX, T. (1993) Cultural Diversity in Organizations: Theory, Research & Practice. San Francisco: Berrett-Koehler Publishers.
DE CONING, C. & MNQIBISA, K. (2000) Lessons Learned from Exercise Blue Crane. Accord, KwaZulu-Natal.
DEPARTMENT OF DEFENCE (1996) White paper on national defence for the Republic of South Africa. Pretoria: DOD Policy Publication Database, South African National Defence Force, South Africa.
DEPARTMENT OF FOREIGN AFFAIRS (1999) White paper on South African participation in international peace missions. Pretoria: DOD Policy Publication Database, South African National Defence Force, South Africa.
DU PLESSIS, L. (1997) Historical roles of Sub-Saharan armed forces. Paper presented at the Congress of the South African Political Studies Association, Mmabatho, South Africa, pp. 2-20.
GAL, R. & MANGELSDORFF, A.D. (1991) Handbook of Military Psychology. Wiley and Sons Ltd.
GIBSON, J.L., IVANCEVICH, J.M. & DONNELLY, J.H. (1994) Organizations: Behavior, Structure, Process (8th ed.). Boston: Irwin.
JOHNSON, D.W. & JOHNSON, F.P. (2000) Joining Together: Group Theory and Group Skills (7th ed.). Boston: Allyn and Bacon.
LAZARUS, R.S. & FOLKMAN, S. (1984) Stress, Appraisal and Coping. New York: Springer.
MAKGATI, C.K.M. (2001) Pilot report on the deploying members of the South African National Defence Force to the Democratic Republic of the Congo. Unpublished report, South African Military Health Services, Military Psychological Institute, Pretoria, South Africa.
MAKGATI, C.K.M. (2001) On-site report on the deploying members of the South African National Defence Force to the Democratic Republic of the Congo. Unpublished report, South African Military Health Services, Military Psychological Institute, Pretoria, South Africa.
MAKGATI, C.K.M., MOKHOKA, M.D. & NAELE, A. (2002) Staff paper to the GOC SAPSD Burundi on the psychological stressors experienced by deployed members of the South African contingent. South African Military Health Services Head Office, Pretoria, South Africa.
NOY, S. (1991) Handbook of Military Psychology. Wiley and Sons Ltd.
URSIN, H. & OLFF, M. (1995) Aggression, defense, and coping in humans. Aggressive Behavior, 21, 13-19.


The Psychological Impact of Deployments

Colonel A.J. Cotton,


Director of Mental Health
Australian Defence Force, Canberra, Australia
Anthony.Cotton@defence.gov.au

Abstract

Johnston (2000) [63] reported on initial data taken from Australian Defence Force (ADF) troops returning from deployment in East Timor. This paper presents data from subsequent East Timor deployments and links these to available strategic personnel indicators. It also examines the strategic personnel management issues related to the impacts of deployments and how these are starting to be addressed in the ADF.

INTRODUCTION

The Australian Defence Force (ADF) has had a program of screening service personnel on their return from operations since the early 1990s; this has been previously documented (Cotton, 2002) [64]. The development (or selection) of appropriate instruments to support this program has been an ongoing activity since its commencement.

The current screening process is based on conducting screening immediately prior to, or immediately on, return to Australia (the Return to Australia Psychological Screen; RtAPS), followed by a subsequent screen three to six months after the member returns to Australia (the Post Operational Psychological Screen; POPS). The process for both is similar, involving some education, the completion of a number of screening tools, and an individual screening interview.

The selection and development of the screening instruments have been documented by Deans (2002) [65]. The development of this battery is ongoing and it has changed a little since Deans’ (2002) report; however, the core elements of the screens have remained, meaning that it is possible to make some comparisons with these earlier results. Deans (2002) made a number of recommendations; these were:

a. Future RtA forms need to include an area for personnel to indicate whether they
are reservists on full-time service, or full-time members.
63. Johnston, I. (2000). The psychological impact of peacekeeping deployment. Presentation to the 42nd Annual Conference of the International Military Testing Association.
64. Cotton, A.J. (2002). Screening for adjustment difficulties after peacekeeping operations. Presentation to the 44th Annual Conference of the International Military Testing Association.
65. Deans, C. (2002). The psychological impact of peacekeeping deployments: Analysis of questionnaire data 1999-2001. Research Report 6/2002, Psychology Technology and Research Group, Canberra, Australia.

b. Screening and monitoring of personnel deployed overseas continue to occur.

c. The Defence Force Psychology Organisation (DFPO), together with the Defence
Health Service (DHS), establish an appropriate process for the coordination and
effective utilisation of mental health data, and that ADF members be informed
that data from mental health questionnaires will be collected for research
purposes.

d. Norming of the mental health of ADF personnel, both non-deployed and deployed, is recommended. Development of appropriate norms should be followed by benchmarking research.

e. The modified version of the GHQ12 in the MHS be replaced by an original version of the GHQ.

f. A more systematic approach to the use of psychological screening instruments within the ADF should occur. In determining the appropriate instruments, all relevant stakeholders (Navy, Army, and RAAF representatives, 1 Psychology Unit, DFPO, and DHS) should be involved.

g. Future end-of-deployment paperwork include reference to the main location of deployment within the country of deployment.

Of these, recommendations a. and b. have been adopted, and a more comprehensive approach to the use of mental health data is now being considered in both a research and a surveillance sense (recommendation c). While the norming of the instruments has yet to be conducted (recommendation d), that too is being considered. The modified GHQ12 has been replaced (recommendation e), although not with the original GHQ; this will be addressed later in this paper. Policy providing a more systematic approach to screening instruments has been produced (Health Bulletins 9/2003 and 11/2003 [66, 67]; recommendation f), but the inclusion of details on the main location within country has proved impractical to include on the screening paperwork (recommendation g).

Given the progress that has been made on Deans’ (2002) recommendations, it seems timely to review the impact of these changes on the data available from these deployments.

AIM

The aim of this paper is to conduct an analysis of the psychological impacts of deployments on ADF members returning from operations in East Timor to determine the impact of changes made to the screening process since Deans’ (2002) analysis.

66. Defence Health Service, Health Bulletin 9/2003, Australian Defence Force Mental Health Screen, Canberra, Australia.
67. Defence Health Service, Health Bulletin 11/2003, Mental Health Support to Operationally Deployed Forces, Canberra, Australia.


METHOD

Instruments

Deans (2002) recommended the replacement of the modified GHQ12 with the original form. However, other concerns about the GHQ12 had been expressed from other quarters within the ADF, and a broader search for a screening instrument occurred. This search identified the Kessler Psychological Distress Scale-10 (K10) as a suitable replacement. The K10 has been widely used in epidemiological studies both overseas and in Australia, has been shown to correlate well with a Composite International Diagnostic Interview (CIDI) diagnosis of anxiety or affective disorder, and has better discriminatory power than the GHQ12 (HB 9/2003).

Two other changes to the RtAPS instruments also occurred: both the Acute Stress Disorder Scale (ASDS) [68] and the Alcohol Use Disorders Identification Test (AUDIT) [69] were removed from the screen. The ASDS was removed because it was designed to measure the impact of a specific traumatic incident on the individual. This made it difficult to employ in the RtAPS context, where the bulk of respondents had not experienced a single overwhelming traumatic incident but had more likely been involved in a number of potentially stressful events outside their normal range of experience. Similarly, the AUDIT is inappropriate in the RtAPS context, where many ADF personnel will have had either no access to alcohol, only very limited access, or unrestricted access. In all of these cases the behaviour of the individual will be atypical, making the AUDIT of limited value.

As a result, the RtAPS consists of the following:

a. Personal Details,

b. Deployment Details,

c. K10,

d. Traumatic Stress Exposure Scale – Revised (TSES-R),

e. Posttraumatic Stress Disorder Checklist (PCL), and

f. Major Stressors Scale.

68. Bryant, R.A., Moulds, M.L., & Guthrie, R.M. (2000). Acute Stress Disorder Scale: A self-report measure of acute stress disorder. Psychological Assessment, 12, 61-68.
69. Saunders, J.B., Aasland, O.G., Babor, T.F., de la Fuente, J.R., & Grant, M. (1993). Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO collaborative project on early detection of persons with harmful alcohol consumption: II. Addiction, 88, 791-804.


The TSES-R is a 13-item scale of potentially traumatic stressors that have been found to be commonly experienced by ADF personnel serving on operations. The Major Stressors Scale is a 36-item scale that covers a range of more general stressors that have also been found to be commonly experienced by ADF members on operations.

Sample

All ADF personnel who had been through the RtAPS process in 2002 and 2003
(the latest entry was September 2003) and whose data had been entered into the database
were included in the sample. This resulted in a total sample of 1,657 cases. Basic
demographic data about the sample are:

a. Gender – male 93.4%, female 6.6%

b. Age – mean 27.44, median 26

c. Marital status – married 49%, partnered 9.6%, separated or divorced 3.7%, single
37.6%

d. Previous deployments – none 55.9%, one 30.4%, more than one 13.7%

e. Average length of service – 7.64 years

Analyses

Analyses were conducted on a number of levels:

a. Descriptive analyses of K10 and PCL, in particular a consideration of the numbers of individuals meeting clinical cutoffs.

b. Rank order of stressors from both the TSES-R and Major Stressors scale.

c. Comparison of career intentions pre- and post-deployment.

d. Comparisons of clinical scores across key stressors (from TSES-R and Major
Stressors Scale) and key personal details (e.g., number of previous deployments).

RESULTS

Clinical Scales

The K10 offers three clinical score bands: 10-15, low risk (78% of the population); 16-29, a medium level of psychological distress; and 30-50, a high level of psychological distress [70]. Cutoffs for the PCL are less clear, but a score of 50 has been shown to be a good predictor of PTSD diagnosis in a population of Vietnam combat veterans [71].

Analysis of the K10 scores for this sample produced the following results: 73.3% low risk, 14.9% medium risk, and 1.8% (19 cases) high risk. Analysis of the PCL showed only five cases reaching the clinical cutoff score.
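To make the banding concrete, the short Python sketch below classifies K10 totals into the three bands and tallies band proportions for a sample. The band boundaries come from the text above; the function name and the example scores are hypothetical.

# Minimal sketch of the K10 risk banding described above.
# Band boundaries (10-15 low, 16-29 medium, 30-50 high) are from the text;
# the function name and example scores are illustrative only.
from collections import Counter

def k10_band(score: int) -> str:
    """Classify a K10 total score (range 10-50) into a risk band."""
    if not 10 <= score <= 50:
        raise ValueError("K10 total scores range from 10 to 50")
    if score <= 15:
        return "low"
    if score <= 29:
        return "medium"
    return "high"

scores = [11, 14, 22, 31, 12, 18]  # hypothetical sample data
bands = Counter(k10_band(s) for s in scores)
proportions = {band: n / len(scores) for band, n in bands.items()}
print(proportions)  # e.g., {'low': 0.5, 'medium': 0.33..., 'high': 0.17...}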

A total of 43 personnel were referred for follow-up as a result of their RtAPS; in 17 cases this was recorded as being related to the deployment, while the remainder were recorded as being for other reasons. There was no relationship between the reason for referral and the referral source (i.e., interviewer, self, unit).

Stressors

Rank ordering the TSES-R resulted in the following being rated as the five most prevalent stressors (a computational sketch of this rank-ordering step follows the two lists below):
a. Being in danger of being injured.
b. Being in danger of being killed.
c. Witnessing human degradation on a large scale.
d. Seeing dead bodies.
e. Fearing that one had been exposed to a contagious disease or toxic agent.
Of these, the threats of being killed or injured were the most stressful at the time, yet none were causing significant distress when the screen was completed.
Rank ordering the Major Stressors scale resulted in the following stressors being
rated the five most stressful:
a. Double standards.
b. Leadership.
c. The military hierarchy.
d. Risk of vehicle accidents.
e. Separation from family and friends.
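As a computational illustration of the rank-ordering step referred to above, the following sketch takes a respondents-by-items matrix of stressor ratings and orders the items by mean rating; the item names and values are invented, not the study's data.

# Illustrative rank ordering of stressor items by mean rating.
# Item names and ratings are hypothetical placeholders.
import pandas as pd

ratings = pd.DataFrame({
    "danger_of_injury":   [4, 5, 3, 4],
    "danger_of_death":    [4, 4, 3, 5],
    "seeing_dead_bodies": [2, 3, 1, 2],
})  # rows = respondents, columns = stressor items

prevalence = ratings.mean().sort_values(ascending=False)
print(prevalence.head(5))  # the five most prevalent stressors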

Career Intentions
Respondents were asked to record their career intentions prior to deployment and
currently. The proportions of these responses are given in Table 1 below.

70. Defence Health Service, Health Bulletin 9/2003, Australian Defence Force Mental Health Screen.
71. Op. cit.


Table 1 - Career Intentions (percentage of respondents)

Career Intention                Prior to Deployment    Current
Long-term service career               61.4              53.7
Serve out current engagement           14.7              13.9
Seek Corps/Branch transfer             12.8              13.8
Discharge within one year               9.6              10.3
Discharge immediately                   1.5               8.3

Initial examination of this table suggests a shift in career intentions towards a change in career. Further examination showed that 130 respondents (7.8%) changed their career intentions, and that the bulk of these (112) changed in a “negative” direction, i.e., towards a change in career. When the direction of career change was compared with current career intention, this change proved to be significant (chi-square = 77.3; df = 4).
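As a sketch of the kind of test reported above, the following computes a chi-square statistic on a contingency table of change direction by current intention using scipy; the cell counts are invented for illustration and are not the study's data.

# Illustrative chi-square test of independence, analogous to the
# comparison described above. The cell counts are invented.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: direction of change (negative, positive);
# columns: the five current career intention categories.
table = np.array([
    [40, 30, 20, 15, 7],
    [ 3,  5,  4,  4, 2],
])
chi2, p, df, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, df = {df}, p = {p:.4f}")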

Comparisons

Comparisons of K10 and PCL scores across categories of the five most prevalent traumatic events (as measured by the TSES-R) all showed significant differences, with K10 or PCL scores increasing with the frequency of occurrence of the event. Similar comparisons across the most stressful general stressors (from the Major Stressors Scale) also showed significant differences, with scores on the clinical scales increasing as the reported stress of the event increased.

Comparison of K10, PCL, and composite TSES-R scores across current career intentions showed significant results for both the K10 and the PCL, but not for the TSES-R. For both the K10 and PCL, scores increased as the category of career intention became more oriented towards changing career.

Finally, comparison of K10 and PCL scores across the number of deployments prior to the current deployment showed no significant differences. When current career intentions were compared across the number of prior deployments, however, a significant difference did emerge. Examination of cell residuals showed that those with several prior deployments were less likely to be seeking a change of occupation within the military, and those with one prior deployment were less likely to serve out their current engagement.

DISCUSSION

Analysis of scores on the clinical scales of the RtAPS showed that rates of psychological distress are slightly elevated compared to the general population and that reported levels of PTSD symptoms are low. The level of psychological distress is understandable given the recency of the operational experience, and could be expected to diminish over time. The level of PTSD symptoms, on the other hand, might be expected to increase over time and therefore needs to be monitored. These are important markers for the future health of the individual and, hence, for the personnel component of ADF capability.


The relationship of both K10 and PCL scores with career intentions certainly needs further investigation. In particular, the possibility of causality between psychological symptoms and career intentions should be investigated. The relationship between deployment stressors (traumatic or otherwise) and K10 and PCL scores is as expected, and provides some guidance on events that may require more immediate clinical attention when they occur. Finally, the lack of a relationship between the number of previous deployments and K10 or PCL scores requires further investigation, particularly in terms of the inoculation effects of previous deployments.

The effect of deployment on members’ career intentions supports earlier findings and certainly requires more investigation. There are many theories about the effect of deployments on a member’s career intentions; certainly, the results here suggest that deployments do have an impact on the long-term stability of careers. Given the current level of operational tempo and the challenges that many nations are facing in recruiting, this is a cause for considerable concern and a clear argument for further research into the organisational impacts of deployment.

These organisational impacts are of particular importance and need further discussion. The purpose of RtAPS is to provide mental health screening for individuals who have been involved in a military operation, because we believe that such involvement may have an adverse impact on the individual concerned. Screening helps command to exercise its duty of care in an effective and efficient manner by attempting to identify those who might be in more need of assistance. It is in effect a “triage” system, in that it allows command to allocate resources more effectively. This is an individual intervention.

The results presented here, although very preliminary, suggest that there might also be an organisational “triage” that can be effected through this process, by identifying adverse effects on the organisation that occur as a result of participation in a military operation. We use many objective measures of the costs of an operation and have tended to assume away the subjective (or people) cost. Perhaps there is a more objective people cost that is in fact measurable: the number of individuals who leave the military, or opt for a reduced career in the military, as a direct result of their involvement in an operation.

Limitations to the Study

There are several clear limitations to this study that need to be considered before any strong statements are made:

a. The analysis presented is very superficial and needs to be conducted in significantly more depth.
b. The measures used here reflect stated intention rather than actual behaviour.
c. The measures used do not provide any indication of causality.
d. The sample is based on peace operations and so should be broadened to incorporate warlike operations [72].

Having identified the limitations of the study, there is certainly the capacity within the current data set to address some of them, and there is scope within the RtAPS process to address the remainder.

CONCLUSION

The RtAPS process has been modified to take into account a number of the recommendations made by Deans (2002) [73]; others have yet to be incorporated, and some are simply too difficult to implement. However, the process clearly provides very useful information for the organisation and is a useful part of the provision of individual mental health support to the ADF.

However, there is also broader organisational information available from the RtAPS process that can contribute to the wellbeing of the organisation, as opposed to just the individual. In particular, the ability to link mental health data to career intentions, and to gain some indication of the impact of an operation on an individual’s career intentions, could provide a useful organisational “triage”. To do this, the items in the current RtAPS should be retained, a comparison should be made across warlike and non-warlike operations, and attempts should be made to examine the causes of changes in members’ career intentions.

72. Anecdotal evidence would suggest that the psychological impacts of warlike service are different.
73. Op. cit.


THE LEADERS CALIBRATION SCALE


Karen J. Brown, Captain
Canadian Forces, Directorate Human Resources Research and Evaluation,
Ottawa, Canada, Brown.KJ3@forces.gc.ca

“Under a good General, there are no bad soldiers.” Chinese Proverb

Leaders’ ability to directly and indirectly influence all dimensions of climate, such as morale and cohesion, has been well established. Further, the enhancement of these important dimensions of climate is well recognized as a key method of improving group performance and combat effectiveness. Thus, as leaders influence all dimensions of climate, it is vital that leaders be able to assess climate accurately in order to maximize group effectiveness. In day-to-day operations, leaders at all levels attempt to informally gauge organizational climate by judging subordinates’ attitudes. However, previous research has demonstrated that leaders have a tendency to be overly optimistic in their assessments of climate (e.g., Stouffer et al., 1949).
The “Leadership Calibration Scale” (LCS), previously named the Officer Calibration Scale, was developed to assess the degree to which Canadian Army officers are capable of accurately judging their subordinates’ perceptions of morale, cohesion, and confidence in leadership (Brown & Johnston, 2003). The goals of this instrument also included measuring leaders’ confidence in their assessments and assisting leaders in re-calibrating any perceptual discrepancies. The aim of this paper is to present the LCS and to test hypotheses relevant to its success at measuring and reducing discrepancies between leaders and subordinates. The psychometric properties of the LCS are also presented.
PREVIOUS RESEARCH
Briefly, research in a number of militaries across many years has demonstrated discrepancies between leaders’ ratings of subordinates’ attitudes toward climate and subordinates’ actual perceptions of climate. For a more thorough review of the related literature, refer to Brown and Johnston (2002). Stouffer et al. (1949) initiated this avenue of research with their seminal work on “The American Soldier”, where they found that “officers tended to believe that their men were more favourably disposed on any given point” (p. 392) than the men actually were. This disparity between officers and enlisted men was noted in officers’ overestimation of levels of job satisfaction, confidence in leadership, and pride as soldiers, and underestimation of aggression towards the military (Stouffer et al., 1949). Again in the US, in 1985, Gabriel (as cited in Eyres, 1998) reported that soldiers perceived “that officers are of poor quality ... in sharp contrast to the perceptions of the officers themselves, who, in general, believe that they are doing an adequate job of establishing a bond with their men” (p. 22).
Korpi (1965), working with Swedish conscripts, found that leaders tended to overestimate their subordinates’ responses on morale-related questions with a fairly substantial error rate (22-25%). Interestingly, the degree of positive bias in perceptions increased with leaders’ rank. Eyres (1998) concluded that leaders in the Canadian Army were “not having the positive leadership effect on their subordinates that they think they do” (p. 21) when: (1) non-commissioned members rated junior officers’ leadership and management skills significantly lower than officers rated them, and (2) senior officers rated their own leadership ability significantly higher than did their subordinates.
Considerable divergence has also been found when reviewing leaders’ and subordinates’ ratings of leadership behaviour. Baril, Ayman, and Palmiter’s (1994) review of articles comparing self- and subordinate descriptions reported that correlations ranged between zero and .23 and were often non-significant. A similar study found that up to 85% of subordinates did not agree with their leaders’ self-appraisal (Karlins & Hargis, 1988). Although leaders’ self-ratings are only one dimension of climate measured with the LCS, these results support the lack of congruence between leaders’ and subordinates’ perceptions.
LEADER CALIBRATION SCALE
In view of leaders’ tendency to over-estimate climate, the Leader Calibration Scale (LCS) was developed to assess the extent of discrepancy between leader and subordinate perceptions, to measure confidence in assessments, and, ultimately, to assist officers in re-calibrating any perceptual discrepancies they have in that regard (Brown & Johnston, 2002). The organizational climate dimensions to be measured were based on those measured within the Unit Climate Profile (UCP), a 47-item attitudinal scale administered to members of the Canadian Army holding the rank of Sergeant and below. Both instruments measure the following 11 climate dimensions: morale/social cohesion, task cohesion, military ethos, professional morale, perceptions of immediate supervisor, and confidence in six different levels of leadership. Definitions of each climate dimension were developed based on the items used to measure each construct on the UCP, thereby increasing the likelihood that leaders and subordinates would be responding to similar constructs. Within the LCS, each climate dimension definition preceded a question on that dimension. Leaders were first asked to respond to the instruction “Estimate how the majority of the soldiers under your command would respond to the following statement” for each climate dimension (e.g., “Morale is very high in my unit”) using a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). The following hypothesis was posited:
Hypothesis 1. Leaders’ estimates of their subordinates’ attitudes toward climate
dimensions would be significantly higher than subordinates’ actual ratings.
Kozlowski and Doherty (1989) stressed the need to assess perceptions of organizational
climate and leadership at a unit level rather than a global level because it is believed that the
direct and mediating effects of local leaders are likely to have large impacts on the processes and
events within the unit. As such, the following hypotheses were tested:
Hypothesis 2a. Perceptions of climate would vary significantly as a function of company and leader/subordinate membership.
Hypothesis 2b. Perceptions of climate would vary significantly as a function of platoon and leader/subordinate membership.
The inability to judge soldiers’ attitudes has been attributed to officers’ overconfidence in their judgments (Korpi, 1965; Farley, 2002). Cognitive and sensory judgment research has established a relationship between the accuracy and the confidence of judgments (Baranski & Petrusic, 1999): individuals are often overconfident in their judgments, especially when the judgments in question are difficult to make. Overconfidence in judgments of sensory tasks has also been found in cognitive judgment and intellectual knowledge tasks (Baranski & Petrusic, 1995; Baranski & Petrusic, 1999). Based on these observations, confidence items were developed and included in the LCS. Thus, immediately following each dimension rating, leaders were asked to respond to the instruction “Indicate how confident you are in the accuracy of your rating” using a 4-point Likert-type scale ranging from 1 (not at all confident) to 4 (highly confident). It was further hypothesized that:
Hypothesis 3. Leaders’ confidence ratings will be negatively correlated with the accuracy of their ratings on each climate dimension.


Research has shown that making a leader aware of these discrepancies can facilitate the leader’s success (e.g., Becker, Ayman, & Korabik, 2002). Examining the agreement between subordinate ratings and self-ratings of leaders in an actual upward feedback session, London and Wohlers (1991) found that leaders’ ability to accurately judge subordinates’ attitudes improved over time; one year after the initial feedback session, which included results from subordinates, agreement in ratings increased significantly, although not dramatically, from Time 1 (r2 = .28) to Time 2 (r2 = .32). This supports the premise that discrepancies identified with the LCS could decrease over time if leaders were provided feedback on their accuracy. As the ultimate goal is to facilitate leaders’ ability to accurately judge the attitudes of their soldiers, the following hypotheses are offered:
Hypothesis 4a. Discrepancies between leaders’ ratings of subordinates’ attitudes toward climate dimensions and subordinates’ ratings would reduce significantly across the four phases of administration.
Hypothesis 4b. Confidence levels would vary significantly over the phases of administration: after an initial assessment, confidence levels would be lowered (i.e., Phase 2) until leaders re-calibrate their assessments of climate, after which confidence levels would rise (e.g., Phase 4).
RESULTS
The Human Dimensions of Operations (HDO) survey was administered throughout the course of an operational tour: a predeployment phase and three in-theatre phases. The HDO-W version, including the LCS, was administered to 552 leaders (warrant officers and above) and, concurrently, the HDO-S version, including the UCP, was administered to 2,064 subordinates (sergeants and below). Results of the UCP were averaged for each climate dimension (i.e., the appropriate items were summed and averaged for each construct). Participants’ resultant scores for each climate dimension were merged with the LCS results. Demographic information such as company, platoon, and rank was available for both data sets.
Hypothesis 1. Previous findings that leaders overrated soldiers’ perceptions of climate (Brown & Johnston, 2002) were confirmed with eleven separate one-way ANOVAs (Tabachnick & Fidell, 2001). Results revealed significant differences between leaders and subordinates on all of the climate dimensions (refer to Figure 1), thereby supporting Hypothesis 1.
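As a sketch of the comparison behind this analysis, a one-way ANOVA can be run for each climate dimension with leaders and subordinates as the two groups; the scipy call is standard, but the ratings below are invented for illustration.

# Illustrative one-way ANOVA comparing leader and subordinate ratings
# on a single climate dimension; the ratings are invented.
from scipy.stats import f_oneway

leader_ratings      = [4.2, 4.0, 4.5, 3.9, 4.1]
subordinate_ratings = [3.1, 3.4, 2.9, 3.6, 3.2]

f_stat, p_value = f_oneway(leader_ratings, subordinate_ratings)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# In the study, one such ANOVA was run for each of the 11 dimensions.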


[Figure 1 - Mean Perception Differences on Climate Dimensions: leaders’ versus subordinates’ mean ratings on each climate dimension; chart not reproduced.]

Hypothesis 2a. Significant interactions were found between company and leader/subordinate membership for the following dimensions of climate: Task Cohesion, Morale/Social Cohesion, Leadership Skills, and Confidence in the CO, Coy Comd, Pl Comd, and Sec Comd. Significant differences were observed between leaders’ and subordinates’ perceptions of climate for most companies.
Hypothesis 2b. Similar significant interactions were found between platoon and leader/subordinate membership for Task Cohesion, Morale/Social Cohesion, Military Ethos, and Confidence in the CO, Pl Comd, and Pl WO. Here, too, the nature of the discrepancies differed across the various platoons.
Hypothesis 3. A review of the confidence ratings indicated that the means ranged from 3.39 to 3.59 on the 4-point Likert-type scale, indicating confidence in assessments. Correlations between leaders’ perceptions of climate and their confidence in their ratings revealed significant, positive relationships (refer to Table 2). To further test this hypothesis, a difference score was calculated for each leader by subtracting the mean of subordinates’ ratings on each dimension within the leader’s company from the leader’s rating for each phase. Thus, a positive difference score indicated an over-estimation of that dimension. Significant, positive correlations were found between the difference scores and confidence in ratings for all climate dimensions with the exception of Professional Morale. As all but one dimension of climate demonstrated increased confidence as leaders’ accuracy decreased, Hypothesis 3 was supported.
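The difference-score computation described above can be sketched as follows; the company labels, ratings, and confidence values are invented, and only the subtraction and correlation steps mirror the text.

# Illustrative difference scores and their correlation with confidence.
# All data and column names are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

leaders = pd.DataFrame({
    "company":    ["A", "A", "B", "B"],
    "rating":     [4.5, 4.0, 3.8, 4.2],  # leader's estimate of subordinates
    "confidence": [4,   3,   3,   4],    # 1-4 confidence in that estimate
})
subordinate_means = {"A": 3.2, "B": 3.5}  # mean subordinate rating per company

# Positive difference = leader over-estimates subordinates' attitude.
leaders["difference"] = leaders["rating"] - leaders["company"].map(subordinate_means)

r, p = pearsonr(leaders["difference"], leaders["confidence"])
print(f"r = {r:.2f}, p = {p:.3f}")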


Table 2 - Summary of Correlations Between Perceptions, Confidence, and Difference Scores

Topic                      Perceptions/Confidence   Difference Score/Confidence
Military Ethos                     .24**                     .24**
Task Cohesion                      .32**                     .29**
Morale/Social Cohesion             .22**                     .17**
Professional Morale                .27**                     .25**
Leadership Skills                  .14**                     .11*
Sec Comd                           .29**                     .22**
Pl WO                              .29**                     .26**
Pl Comd                            .11*                      .01 NS
Coy Comd                           .18**                     .22**
CSM                                .27**                     .24**
CO                                 .22**                     .23**
Note: * p < .05 and ** p < .01

Hypothesis 4a. Eleven one-way ANOVAs revealed that leaders’ perceptions of climate differed significantly across the phases of the tour for only two climate dimensions: Task Cohesion, F(3, 532) = 3.21, p < .05, and Morale/Social Cohesion, F(3, 536) = 4.89, p < .01. Post hoc assessments revealed that Task Cohesion at the Predeployment Phase was significantly lower than at Phases 1 and 2. Reported levels of Morale/Social Cohesion were significantly lower at the Predeployment Phase than at Phases 1, 2, and 3.
To assess whether discrepancies between leaders’ and subordinates’ perceptions altered over the course of the tour, individual ANOVAs were conducted on the difference scores for each climate dimension to determine whether they varied as a function of the phase of administration. Significant differences across phases were found for Task Cohesion, F(3, 487) = 6.51, p < .001; Morale/Social Cohesion, F(3, 501) = 7.93, p < .001; Professional Morale, F(3, 488) = 3.05, p < .05; and Pl Comd, F(3, 360) = 3.28, p < .05 (refer to Figure 2). Task Cohesion and Professional Morale discrepancies were significantly lower at Predeployment than at all three in-theatre phases. Morale/Social Cohesion discrepancies were significantly lower at Predeployment than at Phase 3, and lower at Phase 1 than at Phases 2 and 3. The difference scores for Pl Comd were significantly higher at Phase 3 than at Phase 1. Contrary to expectations, leaders’ assessments actually underrated soldiers’ attitudes about Morale/Social Cohesion at Phase 1 and Professional Morale at Predeployment. Therefore, as the discrepancies in climate dimensions appeared to increase over the course of the tour rather than fluctuate or decrease, Hypothesis 4a was not supported.


[Figure 2 - Difference Score Means Across the Phases (Predeployment, Phase 1, Phase 2, Phase 3) for Task Cohesion, Morale/Social Cohesion, Professional Morale, and Pl Comd; chart not reproduced.]

Hypothesis 4b. Similarly, another eleven one-way ANOVAs were conducted to assess changes in reported levels of confidence in assessments (refer to Figure 3). Significant differences across phases were found for Professional Morale, F(3, 536) = 2.76, p < .05; Leadership Skills, F(3, 532) = 4.13, p < .01; Pl Comd, F(3, 398) = 3.09, p < .05; and Pl WO, F(3, 388) = 3.81, p < .01. Confidence in rating perceptions of Professional Morale, Leadership Skills, and Pl WO was significantly higher at Phase 2 than at Predeployment. Confidence in rating Pl Comd was significantly lower at the Predeployment Phase than at Phases 2 and 3. These results did not support Hypothesis 4b: the predicted drop in confidence after an initial assessment was not found; instead, confidence levels increased or did not change across administrations.


[Figure 3 - Confidence Differences Across the Phases (Predeployment, Phase 1, Phase 2, Phase 3) for Professional Morale, Leadership Skills, Pl Comd, and Pl WO; chart not reproduced.]

Psychometric Analyses
Several psychometric properties of the LCS were examined. Initially, a Principal Components Analysis (PCA) was conducted on all 22 items in the LCS. A varimax rotation yielded a five-component solution that accounted for 73.7% of the variance. Table 3 reveals the five factors as: (1) Perceptions of Direct Leaders and their Confidence Ratings, including Sec Comd, Pl WO, and Pl Comd; (2) Perceptions of Indirect Leaders and the respective Confidence Ratings, including Coy Comd and CSM; (3) Perceptions of Climate; (4) Confidence in Climate Perceptions; and (5) Perceptions of the CO and Confidence Rating. The results of these analyses will be used in future administrations of the HDO.
The internal consistency, or reliability, of the LCS was tested with Cronbach’s coefficient alpha. Initially, reliability was tested for the original item structure of all items measuring soldiers’ perceptions of climate (α = .79) and confidence ratings (α = .79); both sub-scales demonstrated good internal consistency. High reliability was found for the dimensions based on the PCA results: Direct Leadership (α = .97), Indirect Leadership (α = .93), Climate (α = .76), Confidence in Climate Ratings (α = .80), and CO (α = .99).
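The two psychometric procedures named above can be sketched in a few lines of Python; the data below are random placeholders, and only the procedures (Cronbach's alpha, PCA on the correlation matrix, varimax rotation) mirror the text.

# Illustrative Cronbach's alpha and PCA with varimax on placeholder data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scale scores."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

def varimax(loadings: np.ndarray, n_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Standard varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    explained = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        if s.sum() < explained * (1 + tol):
            break
        explained = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 22))  # placeholder for 22 LCS items

print("alpha =", round(cronbach_alpha(data), 2))

# PCA on the correlation matrix; retain five components as in the paper.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
top = np.argsort(eigvals)[::-1][:5]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
rotated = varimax(loadings)  # rotated loadings, one column per component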
DISCUSSION
Significant discrepancies between leaders’ and subordinates’ ratings suggest that leaders in operational roles may not be accurately assessing their subordinates’ perceptions of unit climate. A number of factors complicate the explanation of this phenomenon. First, fluctuating group membership across the phases, a result of voluntary participation, likely decreased the number of significant differences between climate dimensions and some groups (e.g., company and platoon). Second, group representation varies from administration to administration, adding another variable of individual differences that cannot be controlled, due to the confidential nature of the study.
Consistent with the literature, inaccurate judgments were positively related to confidence in assessments. This study, however, provided no evidence that leaders re-calibrate their assessments of subordinates’ attitudes. Once again, this could be due to the data collection method, which makes it difficult to ensure that the same leaders participate at each phase. Moreover, participation in the survey is anonymous, to provide a secure environment for candid responses. It is therefore not possible to identify leaders in order to directly compare each leader’s responses across phases and determine the nature of changes (e.g., reduced confidence, increased accuracy) at an individual level. Consequently, it cannot be determined whether leaders are (a) receiving feedback on the discrepancies in perceptions, and (b) altering their confidence and climate ratings, i.e., whether re-calibration has actually occurred.
Further, this study provides general evidence that confidence in assessments altered across the phases of the tour. Although this finding was not supported across all climate dimensions, it does indicate that leaders may be adjusting their confidence levels. As discrepancies did not decrease over the phases, it is unknown whether the changes in confidence ratings were due to individual differences or to an incongruent re-calibration of confidence. Regardless of the cause, leaders’ changes in confidence levels did not result in higher levels of accuracy in judging subordinates’ attitudes about climate.
Finally, the survey results may be influenced by the survey itself. Leaders may not have been receptive to negative feedback about their ability to accurately judge subordinates’ attitudes. They may also be sceptical about the utility of the results or their potential application to performance appraisal. “Survey fatigue” and frustration with the repetitive nature of the queries in the HDO may also have taken their toll on respondents. Moreover, many participants in this survey, especially subordinates, believe that the results will “fall on deaf ears”. As a result, responses on perceptions of leaders and climate may be inaccurate, either under- or over-rated.
FUTURE DIRECTIONS
Research. Ideally, a tightly controlled experiment across several groups would permit a direct comparison of individual leaders’ ratings of climate and confidence across administrations and would ensure that feedback is provided. Additionally, research should be conducted to determine how leaders react to, interpret, and apply the feedback they receive from LCS results (London & Wohlers, 1991). Further, the responses of subordinates warrant deeper exploration; subordinates might respond differently (e.g., more frankly) if the purpose of the survey varied (e.g., performance, developmental, or research). London and Wohlers (1991), for example, found that a substantial percentage of subordinates (34%) stated they would have responded differently had they known the results were for performance appraisal.
Professional Development. There are many possible explanations for the discrepancy between leaders’ and subordinates’ perceptions of climate, such as: power and status differentials, physical distance, size of the unit, and personal contact (Korpi, 1965); self-protection mechanisms (e.g., denial, self-promotion, and defense mechanisms; Korpi, 1965; London & Wohlers, 1991); self-awareness and situational factors (Becker et al., 2002); different frames of reference for assessment (Baril et al., 1994); and projection of one’s own attitudes, negative or positive, onto the soldiers (Stouffer et al., 1949).
Regardless of the cause of the discrepancies, it is likely that different forms of education and/or training can improve leaders’ accuracy. Professional development programs can be developed to enhance self-awareness of factors such as self-monitoring and defensive mechanisms. Furthermore, merely providing leaders with the results of the LCS may increase self-awareness and result in re-calibration of their assessments and confidence, which could greatly reduce these attitudinal discrepancies. This upward form of feedback has been linked with reduced divergence in perceptions between leaders and subordinates (London & Wohlers, 1991). In addition, formal leadership training programs could focus on the importance of having an accurate appraisal of one’s group (e.g., platoon), and on assessment training, practice, and evaluation of leaders’ ability to accurately assess the climate dimensions of their troops. Naturally, research should precede any alterations to training or professional development programs.
The ability to accurately judge unit climate will provide officers with an additional skill with which to maintain and/or improve morale, cohesion, confidence in leadership, and military ethos, which in turn will improve combat effectiveness. In addition, results from this study can be used to develop pre-deployment and leadership training that nurtures this ability. The ultimate goal of the LCS is to improve leadership effectiveness and mission success.


REFERENCES
Baranski, J.V., & Petrusic, W.M. (1995). On the calibration of knowledge and perception. Canadian Journal of Experimental Psychology, 49(3), 397-407.
Baranski, J.V., & Petrusic, W.M. (1999). Realism of confidence in sensory discrimination. Perception and Psychophysics, 61(7), 1369-1383.
Baril, G.I., Ayman, R., & Palmiter, D.J. (1994). Measuring leadership behavior: Moderators of discrepant self and subordinate descriptions. Journal of Social Psychology, 24(1), 82-94.
Becker, J., Ayman, R., & Korabik, K. (2002). Discrepancies in self/subordinates’ perceptions of leadership behaviour. Group and Organization Management, 27(2), 226-244.
Brown, K.J., & Johnston, B.F. (2002). The Officer Calibration Scale: Towards improving officers’ ability to judge unit climate in the Canadian Army. Presented at the 39th International Applied Military Psychology Symposium, Brussels, Belgium.
Eyres, S.A.T. (1998). Measures to Assess Perceptions of Leadership and Military Justice in the Canadian Army: Results from the 1997 Personnel Survey. Sponsor Research Report 98-5. Director Human Resources Research and Evaluation, National Defence Headquarters, Ottawa, Ontario, Canada.
Farley, K.M.J. (2002). A Model of Unit Climate and Stress for Canadian Soldiers on Operations. Unpublished dissertation, Department of Psychology, Carleton University, Ottawa, Ontario.
Karlins, M., & Hargis, E. (1988). Inaccurate self-perceptions as a limiting factor in managerial effectiveness. Perceptual and Motor Skills, 66, 665-666.
Korpi, W. (1965). A note on the ability of military leaders to assess opinions in their units. Acta Sociologica, 8, 293-303.
Kozlowski, S.W.J., & Doherty, M.L. (1989). Integration of climate and leadership: Examination of a neglected issue. Journal of Applied Psychology, 74(4), 546-553.
London, M., & Wohlers, A.J. (1991). Agreement between subordinate and self-ratings in upward feedback. Personnel Psychology, 44, 375-390.
Stouffer, S.A., Lumsdaine, A.A., Lumsdaine, M.H., Williams, R.M. Jr., Smith, M.B., Janis, I.L., Star, S.A., & Cottrell, L.S. Jr. (1949). Studies in Social Psychology in World War II. The American Soldier: Combat and its Aftermath. Princeton, NJ: Princeton University Press.
Tabachnick, B.G., & Fidell, L.S. (2001). Using Multivariate Statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.


Table 3 - Principal Components Analysis

Item                            Communality   Loading   Component
Sec Comd                            .92          .93    Direct Leadership
CR [74] of Sec Comd                 .92          .93    Direct Leadership
CR of Pl WO                         .81          .89    Direct Leadership
Pl WO                               .81          .89    Direct Leadership
CR of Pl Comd                       .86          .84    Direct Leadership
Pl Comd                             .86          .84    Direct Leadership
CR of CSM                           .90          .84    Indirect Leadership
CSM                                 .90          .83    Indirect Leadership
CR of Coy Comd                      .89          .76    Indirect Leadership
Coy Comd                            .88          .75    Indirect Leadership
CR of Task Cohesion                 .63          .79    Confidence in Climate Ratings
CR of Prof Morale                   .60          .75    Confidence in Climate Ratings
CR of Morale/Social Cohesion        .55          .74    Confidence in Climate Ratings
CR of Leadership Skills             .51          .68    Confidence in Climate Ratings
CR of Military Ethos                .44          .65    Confidence in Climate Ratings
Morale/Social Cohesion              .63          .78    Perceptions of Climate
Prof Morale                         .60          .74    Perceptions of Climate
Task Cohesion                       .56          .69    Perceptions of Climate
Military Ethos                      .52          .69    Perceptions of Climate
Leadership Skills                   .53          .68    Perceptions of Climate
CO                                  .97          .97    CO
CR of CO                            .97          .97    CO

Component                        Eigenvalue   Percent Variance
Direct Leadership                   7.16          32.56
Indirect Leadership                 3.67          16.70
Confidence in Climate Ratings       2.31          10.52
Perceptions of Climate              1.81           8.22
CO                                  1.26           5.71

74. CR is the Confidence Rating the leader gave of his assessment of soldiers’ perceptions of that dimension.


Leadership Competencies: Are We All Saying the Same Thing?

Jeffrey D. Horey
Caliber Associates
49 Yawl Dr.
Cocoa Beach, FL 32931
horeyj@calib.com

Jon J. Fallesen, Ph.D.


Army Research Institute
Ft. Leavenworth, KS
jon.fallesen@leavenworth.army.mil

In the course of developing an Army leadership competency framework focused on the Future Force (up to year 2025), the authors examined several existing U.S. military and civilian leadership competency frameworks. We attempt to link the core constructs across the frameworks and identify similarities and differences in terms of their content and structures. We conclude that leadership competency modeling is an inexact science and that many frameworks present competencies that mix functions and characteristics, have structural inconsistencies, and may be confusing to potential end users. Recommendations are provided to improve the methods and outcomes of leadership modeling for the future.

Table 1 represents many of the traits and characteristics commonly found in leadership
competency frameworks. At first glance it may appear to be a comprehensive framework for
leaders. It includes values (principled, integrity), cognitive skills (inquiring, thinking),
interpersonal skills (caring, enthusiastic, communicating), diversity components (tolerance,
respect, empathetic), and change orientation (open-minded, risk taking).

Table 1
Sample Leadership Competencies

Inquiring Thinking Communicating Risk Taking Principled


Caring Open-Minded Well Balanced Reflective Committed
Confident Cooperative Creative Curious Empathetic
Enthusiastic Independent Integrity Respect Tolerance

Surprisingly, this is not an established leadership framework but rather a list taken from a 4th-grade student profile guide. While a simplistic example, it illustrates both the universality of
the competency concept and the potential confusion when associating a simple list of traits and
processes with leadership.


WHAT IS LEADERSHIP?

This, of course, is the $64,000 question (maybe it’s now the Who Wants to be a
Millionaire question?). As the Armed Forces face a rapidly evolving and complex future threat
environment, it is crucial that leadership in these organizations be well defined, described and
inculcated. Part of this challenge includes establishing a common language for discussing
leadership concepts and ensuring consistent assessment, development, reinforcement, and
feedback processes are in place for maintaining leadership across our forces.

So, again, what is leadership? Apparently, decades of research, dozens of theories, and
countless dollars haven't completely answered this question. If they had, then we wouldn't have
vastly different visions of leadership and leadership competency across similar organizations. Or
would we?

An acceptable definition of leadership might be 'influencing, motivating, and inspiring others through direct and indirect means to accomplish organizational objectives.' Defining
leadership is an important first step toward establishing how it should be conducted within an
organization. However, a simple definition is insufficient for describing the nature, boundaries,
contexts, and desirable manifestations of leadership. Enter the evolution of competencies.

WHAT IS THE PURPOSE OF COMPETENCIES?

Behavioral scientists and organizational development professionals seek to improve individual and group work processes through the application of systematic procedures and research-based principles. Job analysis techniques, and to a lesser extent competency modeling, have long been used to establish the requirements of jobs and positions throughout organizations and have provided input to selection, training, and management practices. Knowledges, skills,
abilities, other characteristics (KSAOs), tasks and functions, and more recently competencies
have become the building blocks of leadership selection and development processes.
Competencies have become a more prevalent method of identifying the requirements of
supervisory, managerial, and leadership positions, rather than job or task analysis techniques,
because they provide a more general description of responsibilities associated across these
positions (Briscoe and Hall, 1999).

Employees want information about what they are required to do (or confirmation of what
they think they are supposed to do) in their jobs or positions. The operative word here is ‘do’.
They typically do not want to know what they are supposed to ‘be’. This simple representation
of leadership requirements helps us establish a context for evaluating leadership competencies
and frameworks/models. Those that are stated only as traits, characteristics, or in attribute terms
are, in our estimation, less valuable than those that are stated in task, function, and behavioral
terms. However, models that address both aspects of leadership may prove to be more valuable
to more individuals.


The purpose in establishing competencies for leaders should be to better define what
functions leaders must perform to make themselves and others in their organizations effective.
Many competency definitions include reference to clusters of knowledges, skills, abilities, and
traits that lead to successful performance (Newsome, Catano, & Day, 2003). Yet competency
labels are typically expressed in either process or functional terms. This can lead to confusion as
to what competencies actually represent for leadership and organizations. Competency
frameworks or models should serve as the roadmap to individual and organizational leader
success. The value of competencies is in providing specific or at least sample actions and
behaviors that demonstrate what leaders do that makes them successful. Therefore the end goal
of all frameworks or models should be to provide measurable actions and behaviors associated
with leadership functions. Functions are a step removed from this goal, while KSAOs, traits, and
attributes are yet another step removed.

Leadership competency modeling has been in vogue for several decades but the methods
for developing these models and the content are as varied as the organizations for which they
have been developed. Briscoe and Hall (1999) identify four principal methods for developing
competencies and Newsome, Catano, and Day (2003) present summaries of competency
definitions and the factors affecting their outcomes.

COMPONENTS OF COMPETENCIES

The components of competency frameworks are seemingly as varied as the competencies themselves. Competencies are generally no more than labels that require additional detail to
communicate how they relate to leadership and behavior. This detail may come in the form of
definitions, elements or subcomponents of the competencies, and behaviors, actions or other
indicators of manifesting the competency or elements. More detailed frameworks may include
hierarchies of competencies or elements based on levels of leadership or other distinctions. In
some cases, it’s unclear what the higher order labels (e.g., Leading Change, Performance) should
be called.

We must also preface our discussion by admitting it is not completely fair to judge any framework by a high-level, surface comparison of the labels and definitions/descriptions of the competencies and components. We did use as much of the definitions and descriptions of the framework components as possible in making our comparisons. A more accurate analysis of
these frameworks would involve an elemental analysis of each framework construct that is
beyond the scope of this paper. However, it is this high level aspect of the framework that, in
some sense, sets the stage for the acceptance and comprehension of the framework by the
intended audience.

NOW, ON TO THE LEADERSHIP FRAMEWORKS

We wish to thank the Center for Strategic Leadership Studies at the Air War College for
inspiring this paper with their extensive presentation of military and civilian leadership issues. If you are not familiar with their website (http://leadership.au.af.mil/index.htm), we encourage you to explore it.

We chose to review leadership frameworks from the four major services, the Coast
Guard, and the Executive Core Qualifications that apply to senior civilian leaders within the
federal government. Table 2 presents overview information for the frameworks that includes the
service entity, sources for the frameworks, and components that we investigated. Initially, we
sought to determine the similarity of constructs across the frameworks. In the course of this
comparison we also recognized variation in the types of constructs represented within a
particular framework, overlap among the components, and different levels of detail across the
frameworks. We discuss each of these as well.

Table 2
Overview of Competency Frameworks

Coast Guard. Source: COMDTINST 5351.1. Components: 3 Categories, 21 Competencies.

Army. Source: Field Manual 22-100. Components: Be, Know, Do framework: 7 Values, 3 Attributes, 4 Skills, and 12 Different Actions at 3 Levels of Leadership (Direct, Organizational, Strategic), with Performance Indicators.

Marine Corps. Source: USMC Proving Grounds. Components: 11 Principles, 14 Traits.

Air Force. Source: AF Senior Level Management Office. Components: 3 Main Areas, 24 Competencies at 3 Levels of Leadership (Tactical, Operational, Strategic).

Executive Core Qualifications. Source: Office of Personnel Management. Components: 5 Areas, 27 Competencies.

Navy*. Source: ereservist.net; Naval Leadership Training Unit. Components: 4 Guiding Principles, 5 Areas, 25 Competencies.

* The Navy leadership competency framework is currently in revision and a copy of the most recent version was not available at the time of publication. Four guiding principles are highlighted, two of which are also considered main areas.

Definitions of leadership or leadership competency for the frameworks we investigated are as follows:

Coast Guard – leadership competencies are measurable patterns of behavior essential to leading.
The Coast Guard has identified 21 competencies consistent with our missions, work force, and
core values of Honor, Respect, and Devotion to Duty. (COMDTINST 5351.1)


Army – influencing people – by providing purpose, direction, and motivation – while operating
to accomplish the mission and improving the organization. Leaders of character and competence
act to achieve excellence by developing a force that can fight and win the nation’s wars and
serve the common defense of the United States. (FM 22-100, 1999).

Marine Corps – no definition found, seemingly defined by the principles and traits.

Air Force – leadership is the art of influencing and directing people to accomplish the mission.
(AFP 35-49, 1 Sep 85).

Navy – no definition found, can be inferred from four guiding principles: professionalism,
integrity, creativity, and effectiveness.

Civilians – no definition of leadership found for the ECQs. All core qualifications have
definitions.

At the most basic level, the frameworks can be compared on the sheer number of components and structures that comprise them. Though hardly an exact or enlightening comparison, the counts nonetheless vary from the 24 components of the Coast Guard framework to the 34 components of the Navy framework. The Coast Guard, Air Force, ECQ, and Navy frameworks present essentially two levels of framework components, although the Navy seems also to be considering 4 guiding principles in its conceptualization. The Army and Marine Corps presentations are not technically competency-based frameworks, but are still appropriate for comparison with the others. The Army and Air Force frameworks also provide specific guidance related to level of leadership and application of components.

In Table 3 we attempt to link similar constructs across the 6 frameworks. This table
presents a more detailed treatment of similarities and differences across the services. Again, we
used the definitions and descriptions in making our links but in many cases the complexity of the
definition or description made it difficult to completely represent how the component is related
to others or distinguished from others in this table. We reiterate that the goal of this comparison is to show, at a relatively broad level of abstraction, how these frameworks compare to one another.

Bold text in Table 3 represents the main competencies or the highest level of each
framework for those that clearly included such a distinction (Coast Guard, Air Force, Navy, and
ECQs). Across rows, we attempt to group similar constructs among the frameworks for
comparison. In several cells within the same framework, we have grouped constructs that we
feel are also similar enough to consider them part of the same construct. The most prevalent
example of this is related to the value construct. Therefore, while there are 41 rows in our table,
this doesn’t necessarily equate to 41 unique constructs of leadership across the six models.

The constructs that appear to have the greatest concurrence across the six models (represented in 4 or more frameworks) are performing/executing/accomplishing mission; vision/planning/preparing; problem solving/decision making; human resource management; process/continuous improvement; motivating/leading people; influencing/negotiating; communicating; team work/building; building/developing partnerships; interpersonal skills;
accountability/service motivation; values; learning (including components of adaptability,
flexibility, awareness); and technical proficiency. Other constructs that are common across 3 of
the frameworks are driving transformation/leading change; strategic thinking; diversity
management; mentoring/developing people (distinct from team building); and physical/health/
endurance.

There were fifteen additional constructs that were represented in two of the frameworks, but the authors caution that much of the agreement between these constructs is due to the extreme similarities in the Navy and ECQ models (overlap on 6/14). These constructs are external awareness; political savvy/working across boundaries; customer service/focus; conflict management; resource stewardship; financial management; tactical/translating strategy (same construct?); leveraging technology/technology management; looking out for others; developing responsibility/inspiring/empowering/exercising authority; leading courageously/combat/crisis leadership; assessing/assessing self; personal conduct/responsibility; demonstrating tenacity/resilience; and creativity and innovation. Unique constructs, at least on the surface of the models, appear to be entrepreneurship (defined in terms of risk taking); integrating systems (akin to systems thinking); emotional (attribute); inspiring trust; enthusiasm; and followership.


Table 3
Leadership Competency Components Compared
Coast Guard Army Marine Corps Air Force Navy ECQ
Performance Executing; Operating Ensure assigned tasks are Leading the Institution; Accomplishing Mission; Results Driven
understood, supervised, Driving Execution Effectiveness
and accomplished
Vision Development and Planning/Preparing Creating and Demonstrating Vision Vision
Implementation Vision
External Awareness; Political External Awareness
Awareness
Thinking/Working Across Political Savvy
Boundaries
Customer Focus Customer Service
Driving Transformation Leading Change Leading Change
Decision-Making and Mental; Decision Making; Make sound and timely Commanding; Exercising Decisiveness/Risk Management; Problem Solving;
Problem-Solving Conceptual decisions; Decisiveness; Sound Judgment Problem Solving Decisiveness
Judgment
Conflict Management Conflict Management
Applying Resource Resource Stewardship
Stewardship
Financial Management Financial Management
Workforce Management Attracting, Developing, and Human Resource Management Human Resource
Systems; Performance Retaining Talent Management
Appraisal
Shaping Strategy Strategic Thinking Strategic Thinking
Tactical Translating Strategy
Management and Improving Initiative Driving Continuous Continuous Improvement
Process Improvement Improvement

Entrepreneurship (Risk
Taking)
Leveraging Technology Technology Management
Integrating Systems
Working with Others Motivating Employ your command in Leading People and Teams Leading People; Working with Leading People
accordance with its People
capabilities
Influencing Others Influencing Influencing and Negotiating Influencing and Negotiating Influencing and Negotiating

Respect for Others and Leveraging Diversity Leveraging Diversity


Diversity Management
Looking out for Others Know your Marines and
look out for their welfare
Effective Communication Communicating Keep your Marines Fostering Effective Oral Communication; Written Oral Communication;
informed Communications Communication Written Communication


Group Dynamics Train your Marines as a Fostering Teamwork and Team Building Team Building
team Collaboration
Develop a sense of Inspiring, Empowering, and
responsibility among your Exercising Authority
subordinates
Mentoring Mentoring Developing People
Leading Courageously Combat/Crisis Leadership
Building; Developing Building Relationships Partnering Building Coalitions/
Communication;
Partnering
Emotional
Self Interpersonal Tact Personal Leadership Professionalism Interpersonal Skills
Accountability and Dependability Responsibility, Accountability, Service Motivation;
Responsibility Authority; Service Motivation; Accountability
Aligning Values Loyalty; Respect, Duty, Bearing; Courage; Integrity; Leading by Example Integrity Integrity and Honesty
Selfless Service; Honor, Justice; Unselfishness;
Integrity, Personal Courage Loyalty; Set the example
Followership
Health and Well Being Physical Endurance
Personal Conduct Seek responsibility and
take responsibility for your
actions
Self Awareness and Learning Know yourself and seek Adapting Flexibility Flexibility; Continual
Learning; Leadership improvement Learning
Theory
Technical Proficiency Technical Be technically and tactically Technical Credibility Technical Credibility
proficient; Knowledge
Inspiring Trust
Demonstrating Tenacity Resilience

Enthusiasm
Assessing Assessing Self
Creativity and Innovation Creativity and Innovation


In answer to 'are we all saying the same thing?' we respond with a simple mathematical exercise. Among the 41 constructs represented in Table 3, 20 are included in three or more frameworks, 15 are included in two, and six are unique to a single framework. Too close to call? In about half the cases, the frameworks appear to be saying the same thing, but there are also significant differences in what is included, or at least in the level at which it is included in the leadership framework. There are also some very obvious differences in the labels of leadership constructs, as indicated by the within-row groupings in Table 3.
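The tally is easy to reproduce mechanically. Below is a minimal Python sketch of the counting; the construct names and coverage counts are invented placeholders standing in for a full encoding of Table 3:

# Minimal sketch of the tally described above. The names and counts below
# are invented placeholders, not a full encoding of Table 3; each value is
# the number of frameworks (out of six) in which a construct appears.
coverage = {
    "construct A": 5,
    "construct B": 4,
    "construct C": 3,
    "construct D": 2,
    "construct E": 2,
    "construct F": 1,
}

three_or_more = sum(1 for n in coverage.values() if n >= 3)
exactly_two = sum(1 for n in coverage.values() if n == 2)
unique = sum(1 for n in coverage.values() if n == 1)
print(three_or_more, exactly_two, unique)  # 3 2 1 for this toy encoding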

CRITIQUE OF THE FRAMEWORKS

The true value of our efforts is to point out aspects of each of the frameworks that could be improved. While each of the organizations included in this analysis is unique, we believe that the nature and purposes of these organizations are similar enough that there should be great similarities in how leadership is defined, described, and displayed within them.

The first test to which we submitted the frameworks was whether they used a consistent type of label across all of their components. Only the Air Force and Army models passed this test. The Coast Guard, Navy, and ECQ frameworks mix processes
(decision making, influencing and negotiating, problem solving), functions (mentoring,
management and process improvement, financial management), and characteristics (health and
well being, flexibility, integrity and honesty). The Marine Corps principles and traits were more
difficult to evaluate, but one could argue that several traits are actually KSAs (decisiveness,
judgment, knowledge).

The second test was one of independence of components within a framework. The Coast
Guard framework includes performance appraisal and workforce management systems –
certainly related; and self awareness/learning and leadership theory (defined in terms of learning
about leadership). The Army framework includes mental and conceptual aspects on the attribute
and skill dimensions, respectively. There also appears to be some overlap among the twelve skill
dimensions (developing/building/improving; executing/operating). The Air Force framework
may potentially overlap on commanding and exercising sound judgment, and many of the other
identified components seem closely related to other components (inspiring trust and
influencing/negotiating; building relationships/mentoring). The Navy and ECQ frameworks had
similar overlap within them (problem solving/decisiveness; leading people/working with people).
Several Marine Corps principles and traits overlap (make sound and timely
decisions/decisiveness; seek responsibility and take responsibility for actions/initiative).

The most common confounding in the frameworks is the mixing of processes or techniques to perform work and the functional areas of that work. For example, all organizations
include decision making, problem solving, or judgment at some level in their frameworks. With
the exception of the Army and Marine Corps, they also include functional areas such as
workforce management, financial management, and conflict management that obviously require
these processes or techniques to perform them.

Next we examined the extent to which each of the frameworks provides behavioral examples or actions associated with the competency or components. As an illustration of the variety of definition and behavior content and detail, we provide information from each competency framework relevant to the construct of decision making/decisiveness/sound judgment in Table 4. The results indicate the different ways the services say the same thing.

Table 4
Competency Framework Detail for the Construct of Decision Making/Decisiveness/Sound Judgment

Air Force
Competency label: Exercising Sound Judgment.
Definition/description: Developing and applying broad knowledge and expertise in a disciplined manner, when addressing complex issues; identifying interrelationships among issues and implications for other parts of the Air Force; and taking all critical information into account when making decisions.
Behaviors: None found.

Army
Competency label: Decision Making.
Definition/description: Involves selecting the line of action intended to be followed as the one most favorable to the successful accomplishment of the mission. This involves using sound judgment, reasoning logically, and managing resources wisely.
Behaviors (partial list of performance indicators): Employ sound judgment and logical reasoning. Gather and analyze relevant information about changing situations to recognize and define emerging problems. Make logical assumptions in the absence of facts. Uncover critical issues to use as a guide in both making decisions and taking advantage of opportunities. Keep informed about developments and policy changes inside and outside the organization. Recognize and generate innovative solutions.

Coast Guard
Competency label: Decision Making and Problem Solving.
Definition/description: None found.
Behaviors: Learn to identify and analyze problems under normal and extreme conditions. Learn to consider and assess risks and alternatives. Use facts, input from systems, input from others, and sound judgment to reach conclusions. Learn to lead effectively in crisis, keeping focus on key information and decision points. Commit to action; be as decisive as a situation demands. Involve others in decisions that affect them. Evaluate the impact of your decisions.

ECQ
Competency label: Decisiveness.
Definition/description: Exercises good judgment by making sound and well-informed decisions; perceives the impact and implications of decisions; makes effective and timely decisions, even when data is limited or solutions produce unpleasant consequences; is proactive and achievement oriented.
Behaviors: Embedded in example qualification and capability narratives.

Marine Corps
Competency label: Decisiveness.
Definition/description: Decisiveness means that you are able to make good decisions without delay. Get all the facts and weigh them against each other. By acting calmly and quickly, you should arrive at a sound decision. You announce your decisions in a clear, firm, professional manner.
Behaviors (suggestion for improvement): Practice being positive in your actions instead of acting half-heartedly or changing your mind on an issue.

Navy
Competency label: Decisiveness/Risk Management.
Definition/description: Exercises good judgment by making sound and well-informed decisions; perceives the impact and implications of decisions; makes effective and timely decisions, even when data are limited or solutions produce unpleasant consequences; is proactive and achievement oriented. (Identical to ECQ.)
Behaviors: None found.

Competency models/frameworks are intended to establish what leaders should be or do to achieve organizational goals. Decisiveness means little to leaders without accompanying
information about what decisiveness accomplishes, how it is enacted, and why it leads to
organizational goals. Most of the frameworks provide definitions of competencies and
components to further understanding. Simply defining decisiveness, much like defining
leadership, does little other than to provide an alternative set of words for the label. What is truly
valuable is the description of how decisiveness is manifested in the organization. The more
concrete and concise the description of actions and behavior associated with competencies, the
more likely these competencies will be accepted, understood, and demonstrated.

FINAL WORDS

The most important consideration in developing and establishing leadership competencies should be how they will be used to influence leadership assessment, selection, development, and performance management processes. Even the best framework of leadership has no value if it is not used productively by the organization. Redundancy, missing components, buzzwords, and inaccurate descriptions of effective behavior in doctrine are insignificant if the frameworks themselves go unused. Well developed, comprehensive, prescriptive models of organizational leadership will be wasted unless leaders understand, embrace, and apply the features of the framework/model and organizations integrate them into succession planning, training and development, and multi-rater feedback systems.

Shippmann et al. (2000) conducted a review of competency modeling procedures compared with job analysis procedures. In general, competency modeling procedures were rated
as less rigorous than job analysis procedures. However, competency modeling was felt to
provide more direct information related to business goals and strategies. Competencies may also
be more appropriate for describing successful leadership behaviors in future terms. This could
be a critical factor for the organizations studied as future threats and environments remain
dynamic and uncertain. These strengths should be exploited by these organizations and not lost
on confusing framework structures, unexplained redundancy in components, and incomplete
examples of how competencies are manifested for success.

There are many sources for recommendations on how to implement or improve sound
competency modeling procedures (Cooper, 2000; Lucia and Lepsinger, 1999). We would like to
highlight a few of their suggestions based on our findings.


1. Define leadership and establish the boundaries on what is and isn't considered in your organization's leadership framework.
2. Use a consistent representation of tasks, functions, actions and behaviors that
leaders perform.
3. Seek to eliminate redundancy in competencies and elements and clearly indicate
how actions and behaviors are linked to competencies or elements.
4. Involve behavioral scientists as well as leaders at all levels of the organization in
development and vetting of the model/framework.
5. Seek to validate competencies through organizational results.

We would also like to point out that some of the frameworks that we investigated are
undergoing change. We were not able to gather the pertinent information related to where each
service is in refining, updating, or extending their framework but we do know there are efforts
underway in the Army and Navy to modify their leadership frameworks and models.

Looking back to our elementary school student profile, perhaps we can take solace in the
recognition that our current students are our future leaders. Providing them with a roadmap for
student success serves to assist them in their development and gives us a method for tracking
their progress. Communicating the meaning of those competencies labeled in Table 1 will help
them determine how they should behave, and help the rest of us assess, develop, and reinforce
those behaviors. Reducing the redundancy, improving the detail, and providing behavioral
examples of the competencies will assist in this effort.

REFERENCES

Air Force Leadership Development Model. Retrieved October 6, 2003, from http://leadership.au.af.mil/af/afldm.htm.

Army Leadership: Be, Know, Do. (1999). Field Manual 22-100. Headquarters, Department of
the Army, Washington, DC.

Briscoe, J., & Hall, D. (1999). Grooming and picking leaders using competency frameworks: Do
they work? An alternative approach and new guidelines for practice. Organizational Dynamics,
28, 37-52.

Coast Guard Leadership Development Program. (1997). Commandant Instruction 5351.1. United States Coast Guard, Washington, DC.

Cooper, K. (2000). Effective competency modeling and reporting: A step by step guide for
improving individual and organizational performance. AMACOM.

Executive Core Qualifications. Retrieved October 6, 2003, from http://www.opm.gov/ses/handbook.htm.

Lucia, A., & Lepsinger, R. (1999). The art and science of competency models: Pinpointing critical success factors in organizations, Vol. 1. John Wiley & Sons.


Marine Corps Leadership Principles and Traits. Retrieved October 6, 2003 from
http://www.infantryment.net/MarineCorpsLeadership.htm.

Navy Leadership Competencies. Retrieved October 6, 2003, from http://www.e-reservist.net/SPRAG/Leadership_Competencies.htm.

Newsome, S., Catano, V., & Day, A. (2003). Leader competencies: Proposing a research framework. Research paper prepared for the Canadian Forces Leadership Institute. Available online at http://www.cda-acd.forces.gc.ca/cffi/engraph/research/pdf/50.pdf

Shippmann, J., Ash, R., Battista, M., Carr, L., Eyde, L., Hesketh, B., Kehoe, J., Pearlman, K., & Prien, E. (2000). The practice of competency modeling. Personnel Psychology, 53, 703-740.


NEW DIRECTIONS IN FOREIGN LANGUAGE APTITUDE TESTING


Dr. John Lett, Mr. John Thain, Dr. Ward Keesling, Ms. Marzenna Krol
Defense Language Institute Foreign Language Center
597 Lawton Road, Suite 17
Monterey, CA 93944-5006
john.lett@monterey.army.mil

Military personnel are selected for foreign language training at the Defense Language
Institute Foreign Language Center (DLIFLC) via a two-tiered system.75 Those who pass
their service’s ASVAB composite for a language-requiring career field are permitted to take
the Defense Language Aptitude Battery (DLAB). Over the past thirty years, the DLAB has
proven to be a valid and reliable tool, contributing predictive variance over and above the
Armed Services Vocational Aptitude Battery (ASVAB). However, several factors converge
to stimulate efforts to reexamine the DLAB and possibly create a new one to replace it.
Factors include test security concerns (there is only one form), datedness (instructional methods have changed since the test was developed in the early 1970s, and the DLAB may not predict as well as it could if tailored to the new instructional environment), and customer demand (we need more, and more highly proficient, language specialists than ever before). In
response to these factors, DLIFLC has launched several initiatives. First, we created two
scrambled versions of the original test to protect against possible compromise of the original
version. Second, we have programmed the test for computer-based administration and have
initiated dialog with the accessions community regarding implementation thereof. Third, we
have launched a contractor-supported project to obtain the opinions of leading applied
linguists and cognitive psychologists regarding the advisability of exploring new approaches
and item types in the development of a new DLAB. In this paper we will discuss the
background factors alluded to above, describe the initiatives completed and presently under
way, and present the preliminary recommendations that have emerged from the most recent
project.

BACKGROUND

The Defense Language Aptitude Battery (DLAB) is an instrument primarily used, in conjunction with the Armed Services Vocational Aptitude Battery (ASVAB), for selecting
and assigning military recruits to careers within military intelligence that require foreign
language skills. In order for individuals to enter a military intelligence career field, they must
first attain acceptable scores on the ASVAB. If that career field requires that a recruit be
trained in a foreign language, the individual must also attain an acceptable score on the
DLAB.

75 The views expressed in this document are those of the authors and do not necessarily reflect the views of the Defense Language Institute Foreign Language Center or the Department of the Army.


All potential military recruits must take the multi-aptitude test battery known as the Armed
Services Vocational Aptitude Battery (ASVAB), either in a high school setting, at a Military
Entrance Processing Station (MEPS), or at a Military Entrance Testing Site (METS). The
ASVAB contains eight individual tests: General Science, Arithmetic Reasoning, Word
Knowledge, Paragraph Comprehension, Mathematics Knowledge, Electronics Information,
Auto and Shop Information, and Mechanical Comprehension. Each ASVAB subtest is
timed, and the entire battery takes about three hours. Scores are reported for individual
subtests and for various combinations of subtests, known as composites.
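As a toy illustration of what a composite is, the Python snippet below sums a selection of subtest standard scores; the subtest selection and scores are invented for illustration and do not correspond to any actual service composite definition.

# Hypothetical illustration of a composite: a sum of selected subtest
# standard scores. The subtest selection and the scores are invented
# placeholders, not an actual service composite definition.
subtest_scores = {"GS": 58, "AR": 54, "WK": 60, "PC": 57,
                  "MK": 55, "EI": 49, "AS": 47, "MC": 52}

composite = sum(subtest_scores[s] for s in ("GS", "AR", "MK", "MC"))
print(composite)  # one recruit's illustrative composite value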

ASVAB data are used for both accession and classification purposes, and each service has its
own preferred composite of ASVAB subtests that qualify recruits for further testing to
determine whether they can become military linguists. Recruits who qualify on their
service’s ASVAB composite may then take the DLAB, which is one of several special tests
administered in the MEPS. Whether the DLAB is actually administered to any particular
recruit who has the appropriate ASVAB composite score depends on several factors. First,
the potential recruit must be willing to take the DLAB. Second, classifiers in the MEPS may
offer the recruit a different post-ASVAB instrument if other jobs are more pressing at the
time. Third, time or other circumstances may prevent the DLAB from being administered to
the potential recruit.

When the DLAB is administered to a recruit, it, too, serves both selection and assignment
purposes. A recruit becomes eligible for assignment to language training at DLIFLC in
preparation for a career as a military linguist by scoring at or above his/her service’s DLAB
cut score. Eligibility to study a given language is determined by comparing the recruit’s
DLAB score to cut scores for the various language difficulty categories. Within those
constraints, the actual language assigned depends upon the needs of the service, and
sometimes upon the desires of the recruit. Regardless of the language studied, DLIFLC
students are expected to attain the Institute’s stated level of proficiency at the end of the basic
program.

The nature of the DLAB and the context in which it is used

The DLAB is a multiple-choice test that takes about two hours to administer. It consists of
four parts: a brief biographical inventory, a test of the ability to perceive and discriminate
among spoken stress patterns, a multi-part section which tests the ability to apply the
explicitly stated rules of an artificial language, and a final section which tests the ability to
infer linguistic patterns as illustrated by examples in an artificial language and to apply the
induced patterns to new artificial language samples.


The learning environment

The DLAB was developed in the 1970s following the then-current approach to aptitude
testing, which tended to emphasize the strength of the empirical relationships between
various predictors and a criterion. Thus, the DLAB is not primarily an attempt to flesh out a
theoretical model of language learning aptitude. Nevertheless, any foreign language aptitude
test such as the DLAB contains an implicit view that there is an underlying construct of
“aptitude to learn a foreign language” that is over and above general intelligence needed for
academic learning.76

It should be stressed that the foreign language learning that the DLAB is to predict takes
place in an intensive classroom-based language-teaching environment in the U.S. This type
of learning environment is likely to be unfamiliar to most students because of its intensive
nature and its duration. Students are in class for six hours a day, five days a week, for up to
63 weeks. Authentic materials are used in the classroom as soon as possible and throughout
the whole course.

At the policy level, the DLIFLC espouses a communicative, proficiency-oriented approach to language teaching and learning; students must be able to use their language to perform
specific kinds of tasks when they reach their post-DLIFLC job stations. In contrast, the
DLAB was developed in the immediate aftermath of the “audio-lingual” era, one in which
language teaching was heavily influenced by the habit-formation theories of behavioral
psychologists such as B. F. Skinner.

Criteria for success

The optimal definition of success at DLIFLC is that the student completes the course on time
and demonstrates the language proficiency required to receive a graduation diploma.
Proficiency is demonstrated via scores on the Defense Language Proficiency Test (DLPT).
The DLPT is administered at the end of the course of study at the DLIFLC and also is taken
annually by military linguists throughout their careers. It uses a multiple-choice format to
assess foreign language proficiency in listening and reading, and a face-to-face performance-
based interview to assess speaking proficiency. Scores on the DLPT are interpreted in terms
of the proficiency levels of the government’s Interagency Language Roundtable (ILR) Skill
Level Descriptions,77 which range from 0 to 5, where ‘5’ represents the proficiency of the
educated native speaker.78 To graduate, a DLIFLC student must demonstrate ILR
proficiency levels of at least 2 in listening and reading and at least 1+ in speaking, regardless
of the difficulty category of the language studied. It should be noted that the DLAB is not
intended to predict success in the job after language training. Its purpose is to predict successful completion of basic language training, where success is defined in terms of satisfactory performance on the DLPT.

76 The development of the DLAB is described in Petersen & Al-Haik (1976).

77 The complete text of the Descriptions and a synopsis of their history are available at http://govtilr.org/.

78 The DLPT listening and reading tests measure only to level 3; the oral proficiency interview can measure the full range of the scale.

Why a new DLAB is needed

In general, data support the position that the current DLAB “works,” and that this has been
true for some time.

• Data from the Language Skill Change Project (LSCP), conducted in the mid-1980s to
early 1990s by the Army Research Institute (ARI) and DLIFLC in coordination with the
US Army Intelligence Center and School (USAICS), indicated that DLAB scores added
meaningfully to the prediction of language learning outcomes over and above that
contributed by ASVAB scores.79

• In a separate study conducted in the late 1980s (White & Park, 1987), ARI analyzed over 5,000 cases of ASVAB and DLAB data to explore the relationship between them. The study showed that the ASVAB Scientific and Technical (ST) composite (the composite used by the Army as a gateway to military intelligence career fields) and the DLAB were positively correlated (r = .51), and pointed out that raising the ST cut score would reduce the number of DLAB testing hours required to obtain a given number of qualifying DLAB scores. For example, among those with ST scores of 104 and below, only 1.3% scored 100 or more on the DLAB, compared with 46% of recruits scoring 130 or better on the ST. However, the data also showed that high STs did not guarantee high DLABs: even among those scoring 130 or more on ST, over half failed to reach DLAB 100, and thus would not qualify for the more difficult languages (see the simulation sketch after this list).

• At the DLIFLC’s Annual Program Review, the Institute reports how many students reach
success as related to their DLAB scores; generally, the higher the DLAB score, the more
probable a successful outcome. Of course, high aptitude scores do not guarantee success
for any given student; failures can and do occur. However, the likelihood of failure is
greater among low-aptitude students than among high-aptitude students.
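The sketch referenced in the White and Park (1987) bullet follows. It is a rough simulation, not a reanalysis of their data: it assumes ST and DLAB scores are bivariate normal with r = .51, and the means and standard deviations are invented placeholders, so only the qualitative pattern (raising the ST cut raises the DLAB qualification rate without guaranteeing it) should be read from the output.

import numpy as np

# Rough simulation sketch, not a reanalysis of White & Park (1987): ST and
# DLAB are assumed bivariate normal with r = .51; the means and standard
# deviations below are invented placeholders.
rng = np.random.default_rng(0)
n = 1_000_000
st_mean, st_sd = 100.0, 20.0      # hypothetical ST composite parameters
dlab_mean, dlab_sd = 85.0, 20.0   # hypothetical DLAB parameters
r = 0.51

cov = [[st_sd**2, r * st_sd * dlab_sd],
       [r * st_sd * dlab_sd, dlab_sd**2]]
st, dlab = rng.multivariate_normal([st_mean, dlab_mean], cov, size=n).T

for st_cut in (104, 130):
    selected = st >= st_cut
    rate = (dlab[selected] >= 100).mean()
    print(f"P(DLAB >= 100 | ST >= {st_cut}) = {rate:.2f}")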

So why is the Institute contemplating the development of a new DLAB? There are several
reasons.

• There is only one form of the DLAB. Its complete compromise would be catastrophic,
and its age leads to speculation that it may have been partially compromised many times
already.
• There have been substantial changes in the philosophy and practice of foreign language
education between the 1960s-70s and the present time.

79 The LSCP, conducted by ARI and DLIFLC, tracked an entering pool of 1903 US Army students of Spanish, German, Russian, and Korean from their arrival at DLIFLC until approximately the end of their first enlistment period. The data referred to here are described in Lett & O'Mara (1990).


• The current standardized criterion measures (i.e., the DLPT) were not available when the
DLAB was developed.
• There are always desires to improve the efficiency of large training systems, and the need
to produce language specialists of higher proficiency than ever leads to greater
expectations for selection and assignment systems.
• Issues of face validity and flexibility lobby for a modernized, computer-based test.

RECENT AND CURRENT INITIATIVES

DLAB-scramble

In an effort to guard against one form of test compromise, DLIFLC produced two versions of
the existing test in which the order of items was carefully altered within test sections so as to
make the original answer key invalid. Care was taken to ensure that relevant parameters
remained constant, such as the relative difficulty of adjacent items and the time allowed
between adjacent items. All test materials were integrated into modern media; e.g., graphics
were scanned, text was retyped into Word for Windows files, and original recordings were
transferred to compact discs (CD). The result was two parallel forms, each containing the
original items but with answer keys that differed from each other and from the original.
These materials were made available to Army Personnel Testing (APT) in [date] and the
original DLAB was withdrawn from service.
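The published description above is all we have of the scrambling procedure. As one hypothetical way such a constrained scramble could be implemented (item IDs and difficulty values are invented), items might be permuted only within small blocks of near-equal difficulty, so the difficulty profile across adjacent positions stays approximately constant:

import random

# Illustrative sketch only: the actual scrambling procedure is not
# published beyond the description above. Items are reordered only within
# small blocks of near-equal difficulty, approximately preserving the
# relative difficulty of adjacent items. Item data are invented.
def scramble_section(items, block_size=3, seed=42):
    """items: list of (item_id, difficulty) tuples in original order,
    arranged so that adjacent items have similar difficulty."""
    rng = random.Random(seed)
    scrambled = []
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        rng.shuffle(block)               # permute only within the block
        scrambled.extend(block)
    return scrambled

section = [(f"item{i:02d}", d) for i, d in
           enumerate([.20, .25, .30, .40, .45, .50, .60, .65, .70])]
print(scramble_section(section))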

Computer-delivered DLAB

One way to get more high-aptitude language students into the training pipeline would be to
administer the DLAB to larger numbers of students. One approach that has been proposed to
test more recruits is to use a computer-based DLAB. To address the issues that approach raises, work has proceeded along two fronts. One step was to establish liaison with MEPCOM to ask
whether the MEPS would be able to make use of a programmed DLAB if they had one.
Through the courtesy of the neighboring Defense Manpower Data Center (DMDC), we were
able to meet with MEPCOM representatives and others at the October 2002 meeting of the
Military Accessions Policy Working Group (MAPWG). Discussions led to an agreement
that the new computers being procured to move the computer-adaptive ASVAB to a
Windows platform would be allowed to have audio capability to enhance the possibility that
they could be used to administer a computerized DLAB when not being used for their
primary purpose. At the present time a study is being conducted by SY Coleman, Inc., to
identify in some detail the technological and infrastructure issues which must be addressed in
order for MEPS and Post-MEPS locations to administer an automated DLAB.

Meanwhile, we have taken steps to position ourselves for feasibility studies regarding the use
of computer-based DLABs in the MEPS. We took the electronic item files that had been
developed for the scrambled DLAB project and a DLIFLC programmer produced a working
model of a computer-based DLAB. This product has undergone extensive beta testing within
DLIFLC and is being revised per user feedback. It is now being used by SY Coleman to investigate whether it or a similar product could be administered on the Win-CAT-ASVAB workstations.

The “new mousetrap” project

Making use of FY 2002 funding provided by the US Navy, we launched the first portion of a
multi-phase effort to design, develop, validate, and field a new test to provide a replacement
for or an enhancement to the existing DLAB. The first step of the design phase was to
review recent research on the subject of language aptitude and to query experts in this field to
obtain insights and recommendations to inform decisions for revising or replacing the
DLAB. To this end, and with contractor assistance,80 we contacted several expert
consultants81 who familiarized themselves with our issues, prepared position papers, and
participated in a two-day workshop at DLIFLC in October 2003. The three overarching
general questions which we posed to the experts were these:

• What do theoretical developments in educational and cognitive psychology with regard to classroom-based, second language learning imply for the improvement of the DLAB? Are there new constructs that relate to acquisition of a second language that can be used in a predictive battery?

• What do current theories and practices in assessing aptitude for foreign language learning among adults, or for adult learning in general, imply for the improvement of the DLAB? Have there been significant developments in conceptualizing the assessment of aptitude for foreign language learning since the era when the DLAB or the Modern Language Aptitude Test (MLAT) were first developed?

80 The services of Perot Systems Government Services, then Soza & Co., Ltd., were obtained through the auspices of the OPM Training Management Assistance program. Perot Systems engaged the Center for Applied Linguistics (CAL) as the project's subcontractor.

81 The consultants were William J. Strickland, Vice President, Human Resources Research Organization (HumRRO); Peter J. Robinson, Professor of Linguistics, Department of English, Aoyama Gakuin University, Tokyo, Japan; and Daniel J. Reed, Language Assessment Specialist, Program in TESOL and Applied Linguistics, Indiana University. An expert participant from within the US Government was Madeline E. Ehrman, Director, Research, Evaluation and Development, Foreign Service Institute (FSI) and Subject Matter Expert, Center for the Advanced Study of Language (CASL). The workshop was facilitated by three persons from the Center for Applied Linguistics, Washington, DC: Dorry Kenyon, Director, Language Testing Division; David MacGregor, Research Assistant; and Paula M. Winke, Test Development Coordinator, Language Testing Division and Ph.D. candidate, Applied Linguistics, Georgetown University. Some of the material in this paper is based on materials developed during this project.

• Could personnel selection and classification for language specialist career fields be
improved by relying more heavily on measures of general aptitude for learning rather
than seeking to refine measures of aptitude for learning specific kinds of things, such as
languages in general, or language families, or specific languages or language skills?
What mix of these approaches would yield the best predictions of success at DLIFLC?

At the conclusion of two days of presentations and small-group working sessions, the
participants were asked to synthesize their opinions and to indicate what kinds of item types
should be involved in a revised test, what existing tests or test parts should be retained, etc.
After the workshop, the expert consultants provided written summaries of their
recommendations, with justifications.

Preliminary synthesis of recommendations

The synthesis presented here is only preliminary because we want to have all of the materials
generated in the workshop reviewed by a prominent cognitive psychologist before we
consolidate them into a final set of recommendations. With that caveat, several key findings
can be stated.

• Most or all of the existing DLAB should be retained.


• Certain DLAB parts should be expanded or replaced by items like those on other existing
language aptitude tests.
• New subtests should be developed to measure constructs that are not now being
measured. Among others, these should include tests of perceptual speed, working
memory, phonological discrimination, and the ability to listen (to one’s native language)
under less than ideal acoustic conditions.
• Consideration should be given to a two-tiered approach to language aptitude assessment:
one to be given before arrival at DLIFLC and another to be given post-arrival. The
former would serve as a gatekeeper for language training in general, and the latter would
be used to make more informed assignments of recruits to languages or even to specific
kinds of instructional environments.
• We should investigate the validity of alternative scoring strategies for both the current and the proposed system, e.g., by exploiting scores on DLAB parts in a manner similar to the way ASVAB subtests are grouped into composites for particular screening purposes. Similarly, we might consider a compensatory model for selection rather than today's "multiple hurdles" approach. Such a system might allow the minimum DLAB score to vary based on appropriate ASVAB scores, or waive the DLAB altogether for extremely high scorers on certain ASVAB subtests or composites.
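To make the compensatory idea in the final recommendation concrete, here is a hypothetical sketch of such a rule; every threshold in it is an invented placeholder, not an actual service cut score.

from typing import Optional

# Hypothetical compensatory selection rule of the kind suggested above.
# All cut scores are invented placeholders, not actual service standards.
def qualifies(asvab_st: int, dlab: Optional[int]) -> bool:
    if asvab_st >= 140:                      # hypothetical DLAB waiver
        return True
    # hypothetical sliding minimum: each ST point above 100 buys back
    # half a DLAB point, floored at 85
    dlab_min = max(85, 100 - (asvab_st - 100) // 2)
    return dlab is not None and dlab >= dlab_min

print(qualifies(110, 96))    # True: sliding DLAB minimum is 95
print(qualifies(142, None))  # True: DLAB waived at this ST level
print(qualifies(100, 96))    # False: minimum stays at 100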


NEXT STEPS

As a first priority, we will complete the “New Mousetrap” project currently under way and
will continue to explore infrastructure issues regarding the possible implementation of a
computer-based DLAB in the MEPS. Simultaneously, we will be designing proposals for
studies which should be done in the short term to address selected aspects of the various
recommendations that are synthesized above. We will also be developing a longer-range
research agenda and submitting proposals to appropriate research centers within the
Government.

REFERENCES

Lett, J. A., & O’Mara, F. E. (1990). Predictors of Success in an Intensive Foreign Language
Learning Context: Correlates of Language Learning at the Defense Language Institute
Foreign Language Center. In Stansfield, C. and Thomas A. Parry (Eds.), Language
Aptitude Reconsidered. Englewood Cliffs, NJ: Prentice Hall Regents.

Petersen, C.R. & Al-Haik, A.R. (1976). The development of the Defense Language
Aptitude Battery (DLAB). Educational and Psychological Measurement, 36, 369-380.

White, L. A., & Park, K. (1987). An examination of relationships between the Defense Language Aptitude Battery and the Armed Services Vocational Aptitude Battery (Selection and Classification Technical Area Working Paper RS-WP-87-09). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.


THE STRUCTURE & ANTECEDENTS OF ORGANISATIONAL COMMITMENT IN THE SINGAPORE ARMY

Maj Don Willis

Applied Behavioural Sciences Department


Ministry of Defence, Singapore

Correspondence:
Applied Behavioral Sciences Department,
Tower B, #16-01,
Depot Road,
Singapore 109681.
Tel: 65-3731574
Fax: 65-3731577

Authors’ note:
The opinions expressed in this paper are those of the author and not the official position of the
Singapore Armed Forces or the Ministry of Defence, Singapore.

ABSTRACT

Using Meyer and Allen's (1991) 3-Component Model of Organisational Commitment (Affective, Continuance and Normative) as the theoretical framework, this study showed that commitment in the Singapore Army could be described in the 3 dimensions postulated by Meyer and Allen (1990). Using both Exploratory and Confirmatory Factor Analysis of the 18-item scale developed by Meyer et al. (1993), this study increases the evidence for the applicability of this model to an Asian context. In addition, 'Job Satisfaction/Meaningfulness' was found to be the most important predictor of all 3 approaches to commitment. 'Relationship with peers and superiors' was also a significant predictor of Affective Commitment, while 'Promotion opportunities' and 'Support provided by the organisation' predicted Normative Commitment. The above-mentioned findings, while providing cross-cultural support, also extend the model's applicability to a military culture, in spite of the latter's collectivistic nature (Kagitçibasi, 1997; Triandis, 1995), where hierarchy, control, orderliness, teamwork, group loyalty and collective goals are valued over equality, individual expression and self-interest (Soh, 2000).


INTRODUCTION
This study investigated the structure and antecedents of employee commitment in the Singapore Army. In the modern battlefield, technology has been a proven force multiplier. However, even the best weapon systems can only be optimized in the hands of talented and committed individuals. An understanding of the structure and antecedents of Organizational Commitment in the Army would provide a source of insight for the formulation of strategic career development as well as recruitment and retention policies. Not only will this provide the Army with a potential edge in the highly competitive Singapore labour market, but it will also assist in developing processes that imbue commitment, something paramount to an organization that has been entrusted with the sacred responsibility of the country's defence.
METHOD
A cross-sectional survey design was employed. Data were collected via self-administered questionnaires. Fifteen battalions were randomly selected for the survey. A total of 621 regular Army personnel completed and returned the questionnaire. Organizational Commitment was measured using the revised 18-item scale developed by Meyer et al. (1993). This scale has been used extensively, and a review by Allen and Meyer (1996) of the evidence relevant to the reliability and construct validity of the scale provided strong support for its use in substantive research. A second section comprised 48 items pertaining to employees' perceptions of various aspects of working in an organization, e.g. work relations, work environment, rewards, etc. These items were derived and used by Lim (2001) in his study of organizational commitment in a Singapore Public Sector undertaking. The generic nature of the items made them applicable to the Army, and hence they were used in this study as possible antecedents of Organizational Commitment as postulated by Meyer et al. (1993).

ANALYSES & RESULTS


Exploratory Factor Structure of the 18-item Scale

To ascertain the construct validity of the Meyer and Allen scale, Exploratory Factor Analysis using SPSS 10.0 for Windows was conducted on the 18 items. Principal axis factoring extraction and Varimax rotation were used to mirror the approach adopted by Meyer et al. (1993). An examination of the scree-plot suggested that 3 factors might be retained for further analysis. Consistent with the scree-plot, the un-rotated factor solution produced 3 factors with eigenvalues above 1 that accounted for almost 54% of the variance. Eventually, following the scale reliability analysis, a 3-factor structure was obtained, as shown in Table 1 below:

Table 1. 3-Factor Solution Based on Exploratory Factor Analysis of the Revised 18-item
Scale (Principal Axis Factor Extraction with Varimax Rotation) following Scale Reliability
Analysis (Cronbach Alpha)
Factor I (Alpha = .8376)   Factor II (Alpha = .8527)   Factor III (Alpha = .7280)
AC1                        NC10                        CC2
AC3                        NC11                        CC3
AC4                        NC12                        CC5
AC5                        NC13                        CC6
AC7                        NC14                        CC7
AC8

It can be seen that the first factor comprised all 6 of the Affective Commitment (AC) items as postulated by Meyer et al. (1993). Factor II comprised 5 of the 6 Normative Commitment (NC) items, while Factor III comprised 5 of the 6 Continuance Commitment (CC) items. By virtue of the orthogonal rotation, the results therefore indicated the presence of 3 distinguishable factors, generally replicating the factor structure of the 18-item scale. Given this, it became pertinent to carry out Confirmatory Factor Analysis to assess whether the data did indeed fit the 3-factor model as depicted by Meyer et al. (1993) in the revised 18-item scale.
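For readers who wish to see this step expressed outside SPSS, the sketch below shows how a comparable principal-axis/Varimax EFA of the 18 items might be run in Python with the open-source factor_analyzer package. The data file and item-column names are hypothetical, and the code is an illustration of the procedure described above rather than the authors' original SPSS 10.0 run.

import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical input: one row per respondent, one column per commitment item.
items = pd.read_csv("commitment_items.csv")

# Unrotated extraction for the scree/eigenvalue check (3 eigenvalues > 1 expected).
scree = FactorAnalyzer(rotation=None)
scree.fit(items)
print(scree.get_eigenvalues()[0][:5])   # leading eigenvalues of the correlation matrix

# Principal-axis factoring with Varimax rotation, retaining 3 factors.
fa = FactorAnalyzer(n_factors=3, rotation="varimax", method="principal")
fa.fit(items)
print(pd.DataFrame(fa.loadings_, index=items.columns).round(3))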
Confirmatory Factor Analysis of the 18-item Scales

The covariance matrices derived from the data were used as the inputs, and a maximum likelihood solution was obtained using LISREL 8.30 (Jöreskog & Sörbom, 1993). The factor on which each item was proposed to load was based on the loadings reported by Meyer et al. (1993). Following the recommendations of Bollen and Long (1993), multiple indices of fit were used to evaluate the model’s fit to the data. Table 2 below summarises the findings.


Table 2. Summary Goodness-of-Fit Indices and χ2 Test of the 18-item Commitment Scale

Statistic                                          Desired Fit Indices                     Obtained Fit Indices
χ2                                                 -                                       362.18 (df = 132), p = 0.00
Root Mean Square Error of Approximation (RMSEA)    < 0.05 (Good); 0.05-0.08 (Reasonable)   0.066
Standardised Root Mean Square Residual (RMR)       < 0.05 (Acceptable)                     0.065
Goodness of Fit Index (GFI)                        > 0.90 (Acceptable)                     0.91
Comparative Fit Index (CFI)                        > 0.90 (Acceptable)                     0.92

The above-mentioned indices were selected, as recommended by Diamantopoulos and Siguaw (2000), to allow an informed decision. It is noted that the χ2 value was significant, thus suggesting a poor fit. However, this was to be expected, as χ2 is sensitive to sample sizes that exceed 200. Based on the other indices, it can be concluded that the data from the 18 items produced an acceptable fit to the model.
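As an aside, the same three-factor measurement model can be fitted by maximum likelihood in Python, assuming the open-source semopy package; the sketch below is an illustration (the original analysis used LISREL 8.30), with lower-case item names taken from Table 1 as hypothetical placeholders. semopy's calc_stats reports the χ2, RMSEA, GFI and CFI used above.

import pandas as pd
from semopy import Model, calc_stats

# Three correlated factors, indicators as retained in Table 1 (names hypothetical).
desc = """
AC =~ ac1 + ac3 + ac4 + ac5 + ac7 + ac8
NC =~ nc10 + nc11 + nc12 + nc13 + nc14
CC =~ cc2 + cc3 + cc5 + cc6 + cc7
"""

data = pd.read_csv("commitment_items.csv")  # hypothetical data file
model = Model(desc)
model.fit(data)                 # maximum likelihood estimation by default
print(calc_stats(model).T)      # chi2, RMSEA, GFI, CFI, among others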

Exploratory Factor Structure of the 48-item Work Variables

To facilitate the meaningful interpretation of the work variables surveyed, an exploratory factor analysis was conducted on the 48 items. The work items retained after the EFA and Scale Reliability Analysis were reduced from 48 to 45 items, which loaded onto 8 factors. These, together with the summary statistics obtained for each factor, are described in Table 3 below. Given their acceptable Cronbach Alpha coefficients, they were subsequently used as the predictor variables for the 3 Commitment scales.

Table 3. 8-Factor Solution based on Exploratory Factor Analysis (Principal Axis Factor Extraction, Varimax Rotation) of 48-item Work Variables following Scale Reliability Analysis

Factor        Alpha    Description                                               Mean*     SD*
Factor I      .8829    Relationship with peers and superiors                     5.0879    1.2634
Factor II     .8896    Satisfaction with financial remuneration                  3.6566    1.3628
Factor III    .8687    Career and personal development opportunities             4.2537    1.2534
Factor IV     .8836    Perception and satisfaction with issues pertaining
                       to promotions and advancements                            3.6719    1.2574
Factor V      .8813    Meaningfulness of and satisfaction with the job           5.2620    1.3425
Factor VI     .8876    Support provided by the organisation                      4.1999    1.5351
Factor VII    .7720    Workload                                                  4.9660    1.2022
Factor VIII   .7334    Comparisons with the Private Sector                       4.1574    1.2515
* Factor scores were computed based on the arithmetic means of the items that loaded on the respective factor.

Relationship Between Commitment & Work Variables

The next step involved using Stepwise Multiple Regression to analyse the relationship between the work factors and each Organisational Commitment dimension. All 8 work factors were regressed onto each of the Organisational Commitment dimensions in turn. The order of the variables entered into the regression equation each time was determined by the strength of the correlation between each variable and the Organisational Commitment dimension concerned.
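As an illustration of this logic (though not the exact SPSS stepwise algorithm, which also tests entered variables for removal), a forward-selection analogue can be sketched in Python with statsmodels: predictors are entered in descending order of their bivariate correlation with the commitment dimension and kept only if significant. The data frame and column names below are hypothetical.

import pandas as pd
import statsmodels.api as sm

def forward_stepwise(y: pd.Series, X: pd.DataFrame, alpha: float = 0.05):
    """Enter predictors by descending |r| with y; keep each only if p < alpha."""
    order = X.corrwith(y).abs().sort_values(ascending=False).index
    kept = []
    for var in order:
        fit = sm.OLS(y, sm.add_constant(X[kept + [var]])).fit()
        if fit.pvalues[var] < alpha:
            kept.append(var)
    return sm.OLS(y, sm.add_constant(X[kept])).fit()

# Hypothetical usage: the 8 work-factor scores predicting Affective Commitment.
# result = forward_stepwise(df["AC"], df[[f"F{i}" for i in range(1, 9)]])
# print(result.params, result.rsquared)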

‘Meaningfulness/Satisfaction with job’ and ‘Relationship with peers and superiors’ were found to predict AC significantly, accounting for about 48% of the variance. On the other hand, only ‘Meaningfulness/Satisfaction with job’ predicted CC, accounting for only 7% of the variance. There were 3 predictors of NC: ‘Meaningfulness/Satisfaction with the job’, ‘Promotion opportunities’ and ‘Support provided by the organisation’. Together, these accounted for almost 52% of the variance.


DISCUSSION

Analysis of the 18-item Scale

The factor structure obtained via EFA generally replicated the factor structure of the 18-item scale. Due to the orthogonal nature of the rotation, the results therefore indicated the presence of 3 distinguishable factors. The Cronbach Alpha coefficients obtained were also consistent with those reported by Meyer et al. (1993). Using CFA, the reasonable fit of the data to the 18-item model confirms the findings of Meyer et al. (1993) and Dunham et al. (1994). Given this, the results provide support not only for the 3-dimensional construct definition of Organisational Commitment, but also for its cross-cultural applicability to an Asian as well as a military sample.

Organisational Variables & Work Variables

Only ‘Meaningfulness/Satisfaction with Job’ and ‘Relationship with peers and superiors’ were found to predict AC significantly, accounting for about 48% of the variance. A comparison of the standardised betas showed that the former is a more important factor (by about 4.5 times) than the latter in predicting Affective Commitment. This finding is consistent with those of Allen and Meyer (1993, 1997) and Cramer (1996), who found that employees with a high level of emotional attachment to the organisation are also more likely to find their work in the organisation meaningful and relevant, while at the same time enjoying a positive relationship with their superiors and peers.

Only ‘Meaningfulness/Satisfaction with Job’ predicted Continuance Commitment (CC), accounting for only 7% of the variance, an indication that there are other factors not tested by the model. This finding is consistent with Meyer and Allen’s (1996) postulation that the antecedents of CC include job status and benefits accruing from long years in service, retirement benefits, opportunities for employment elsewhere, as well as the perceived transferability of work skills; factors that were not part of our work variables.
‘Meaningfulness/Satisfaction with the job’, ‘Promotion opportunities’ and ‘Support provided by the organisation’ predicted NC, accounting for almost 52% of the variance. The standardised betas showed that ‘Meaningfulness/Satisfaction with the job’ is a more important predictor than the other 2 variables. This finding is consistent with Allen and Meyer’s (1996) postulation that NC may arise from experiences which make employees feel that their organisation is providing them with more than they can reciprocate, thus obliging them to continue their membership of the organisation.

CONCLUSION
The present research, while providing cross-cultural support for the 3-factor model,
also extended the applicability of the scale to a military context, in spite of the latter’s
collectivistic culture (Kagitçibasi, 1997; Triandis, 1995), where hierarchy, control,
orderliness, teamwork, group loyalty and collective goals are valued over equality, individual
expression and self-interests (Soh, 2000), thus providing strong evidence for the model’s
generalisability across occupation groups. The findings, although exploratory in nature, are encouraging. Future work will focus on determining other antecedents, especially for CC, as
well as the nature and components of these antecedents. These will provide the necessary
framework to serve as a platform for modeling and more robust confirmatory testing so as to
better help the Army appreciate and develop processes that imbue commitment.
REFERENCES

Allen, N.J., & Meyer, J.P. (1996). Affective, continuance, and normative commitment to the organization: An examination of construct validity. Journal of Vocational Behavior, 49, 252-276.
Bollen, K., & Long, J.S. (1993). Testing structural equation models. Newbury Park, CA: Sage.
Cramer, D. (1996). Job satisfaction and organizational continuance commitment: A
two-wave panel study. Journal of Organizational Behaviour, 17, 389-400.
Diamantopoulos, A. & Siguaw, J.A. (2000). Introducing LISREL. London: Sage
Publications.
Jöreskog, K., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the
SIMPLIS command language. Chicago, IL: Scientific Software.
Kagitçibasi, C. (1997). Individualism and collectivism. In J. W. Berry, M. H. Segall, & C. Kagitçibasi (Eds.), Handbook of cross-cultural psychology: Vol. 3. Social behavior and applications (2nd ed., pp. 1-49). Needham Heights, MA: Allyn & Bacon.


Lim, B. (2001). The structure and nature of organisational commitment in a Singaporean work context. Unpublished doctoral dissertation, University of Manchester Institute of Science & Technology.
Meyer, J.P., & Allen, N.J. (1991). A three-component conceptualization of organizational commitment. Human Resource Management Review, 1, 61-98.
Meyer, J.P., Allen, N.J., & Smith, C.A. (1993). Commitment to organizations and occupations: Extension and test of a three-component conceptualization. Journal of Applied Psychology, 78(4), 538-551.
Soh, S. (2000). Organizational Socialization of Newcomers: A Longitudinal Study of
Organizational Enculturation Processes And Outcomes. Manuscript submitted for
publication.
Triandis, H. C. (1995). Individualism and collectivism. Boulder, CO: Westview Press.


FURTHER UNDERSTANDING OF ATTITUDES TOWARDS NATIONAL


DEFENCE AND MILITARY SERVICE IN SINGAPORE

Charissa Tan, Lt-Col Star Soh, Ph.D, and Major Beng Chong, Lim, Ph.D
Applied Behavioural Sciences Department, Ministry of Defence
Defence Technology Tower B, 5 Depot Road,
#16-01, Singapore 109681
charissa@starnet.gov.sg

Authors’ note:
The opinions expressed in this paper are those of the authors and are not the official
position of the Singapore Armed Forces or the Ministry of Defence, Singapore.
______________________________________________________________________________

ABSTRACT

As the Singapore Armed Forces is a citizen’s army, it is important to understand the attitudes of active and reserve servicemen towards national defence and military service. The present study examines a model of inter-relationships among six constructs and across three samples: full-time national servicemen, Regulars, and Reservists. The constructs of interest were Support for National Service, Commitment to Defend the Country, Sense of Belonging, Perceived Security of the Country, Defensibility of the Country, and Confidence in the Armed Forces. Face-to-face interviews were conducted from February to April 2003 with a random selection of about 400 NSFs, 400 Regulars, and 400 Reservists. The hypothesized relationships among the constructs were tested using structural equation modeling. The findings are presented and discussed.

INTRODUCTION

In countries with conscript armed forces, citizens’ support for military service is crucial. Without adequate support, the armed forces cannot build and develop into a formidable force that can be entrusted to fulfill its functions. This study focuses on understanding attitudes toward national defence and support for military service among full-time conscripts, Regular servicemen, and reserves (National Servicemen).

When Singapore gained independence from Malaysia in 1965, the Singapore government
needed to quickly build a strong military force to provide Singapore with the foundation for
nation building (Huxley, 2000). In March 1967, Parliament passed the National Service
(Amendment) Bill to require every male citizen of Singapore to perform full-time military
service. Today, most Singaporean males enter full-time national service (NSF) between the ages of eighteen and twenty, for two and a half years. Upon completion of NSF service, the males become operationally-ready national servicemen (NSmen), who form the reserve force. Most NSmen’s training cycles last for thirteen years. NSmen are usually called up for military service
up to a maximum of 40 days per year. The NSFs and Regulars together form a standing Army of
about 50,000, with the ability to mobilize approximately 300,000 NSmen reserves.

National service is, therefore, a necessary part of every Singapore male’s life, as well as
an integral part of daily life for all citizens in Singapore. Given that the sovereignty and progress
of the nation depend on the security provided by the system of national service, the study of
military servicemen’s attitudes towards national service and willingness to fight for the country
becomes imperative.

In this study, the key constructs of Support for National Service and Commitment to National Defence and their inter-relationships with various key antecedent variables were examined in a six-factor model. This model is based on a similar study by Soh, Tan, and Ong (2002), which found that, among a sample of military servicemen, Support for National Service and Commitment to National Defence were strongly related to Sense of Belonging to the country, and that the model fit the data from another sample of the public in a cross-validation analysis. However, that sample of military servicemen combined NSFs, Regulars and NSmen, and could therefore have missed important differences attributable to the different types of service. The paper also recognized that one of the constructs had been poorly operationalised.

The present study uses similar constructs to the 2002 study. The definitions of the
constructs are as follows:

Support for National Service (SPNS): The favorable or positive attitude toward military
conscription as a policy as well as a worthwhile investment of personal time and
resources.

Commitment to National Defence (CMND): The willingness to risk one’s life to fight
for the country in times of war.

Sense of Belonging (SOB): The willingness to stay in the country under general contexts
(i.e., war scenario not specified) and the feelings of national pride and belonging to the
country.

Perceived Security (PS): The confidence that the country will enjoy peace and stability
and will prosper over the short term (i.e., next five years).

Defensibility of the Country (DS): The belief and confidence that the country can be
defended.

Confidence in the Armed Forces (SAF): The confidence that the Armed Forces has the
capability to defend the country.


The following relationships, based largely on the 2002 study, were hypothesized (see
Figure 1):

H1: Commitment to National Defence will be strongly and positively related to Support
for National Service. Should one be willing to fight for the country, one would support
national service as a means to be involved in or show one’s support for national defence.

H2: Sense of Belonging will be strongly and positively related to Commitment to National Defence. One would be committed to fight for a country towards which one feels strong national pride and a sense of belonging.

H3: Perceived Security of Singapore and Defensibility of Singapore will be weakly and
positively related to one’s Sense of Belonging. One’s sense of belonging is not purely
affect-based, but also has a cognitive evaluative component. It is proposed that knowing
the country is safe and that the country can be defended enhances one’s sense of
belonging to that country.

H4: Defensibility of Singapore will be weakly and positively related to Support for
National Service. One would be more willing to support national service if one believes
and is confident that the country can be defended and that one’s efforts are worthwhile
towards this end.

H5: Confidence in the SAF will be weakly and positively related to the Perceived future Security of the Country, and moderately and positively related to the Defensibility of Singapore. Confidence in a strong armed forces should lead to a greater sense of security and to the perception and belief that Singapore can be defended. However, Confidence in the Armed Forces is expected to have a stronger relationship with the Defensibility of Singapore, as the contexts of invasion and defence are more salient compared to Perceived Security, which is a more general outlook of peace and prosperity over the next five years.

Figure 1. Hypothesized Model of Inter-relationships

[Path diagram: Confidence in Armed Forces → Perceived Security of Singapore (+) and Defensibility of the Country (++); Perceived Security (+) and Defensibility (+) → Sense of Belonging; Sense of Belonging → Commitment to National Defence (++); Commitment to National Defence → Support for National Service (++); Defensibility → Support for National Service (+). Legend: + = weak, positive relationship; ++ = moderate-strong relationship.]

To test the robustness of the hypothesized relationships, the findings from the sample of
Full-time National Servicemen (NSFs) were cross-validated across a sample of Regulars and a
third separate sample of NSmen (the reserves).

METHOD

Sample
Data were obtained from three samples: 392 NSFs, 402 Regulars, and 494 NSmen from
the Singapore Armed Forces. Proportionate random sampling was used to select the participants for the survey so as to ensure that the sample was representative by type of service (Regulars, NSFs, NSmen), rank, and service (Army, Navy, Air Force). The sample sizes yield a precision that meets the statistical criterion of a ±5% error margin at the 95% confidence level.
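As a quick check on that claim, the worst-case margin of error for a proportion at 95% confidence can be computed directly; this is a back-of-envelope verification rather than part of the authors' analysis.

import math

Z95 = 1.96  # standard normal critical value for 95% confidence
for n in (392, 402, 494):
    moe = Z95 * math.sqrt(0.5 * 0.5 / n)   # worst case, p = 0.5
    print(f"n = {n}: margin of error = +/-{100 * moe:.1f}%")
# prints roughly 4.9%, 4.9% and 4.4%, all within the +/-5% criterion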

Procedure
The data for this study were extracted from an annual survey of the Regulars, NSFs, and
NSmen from the Singapore Armed Forces on their perceptions and attitudes toward various
defence-related issues. Data were collected using face-to-face interviews from February to April
2003.

Measures
All the items in the survey instrument were self-developed. There were at least 2 items
measuring each construct, resulting in a total of 21 items for analysis. The constructs, example
survey items, and response options are presented in Table 1 below:

Table 1. List of constructs, example questions and response options. All items were answered on the four options Strongly Disagree, Disagree, Agree, Strongly Agree.

1. Support for NS
   1. National Service is necessary for the defence of Singapore.
   2. NS provides the security needed for Singapore to develop and prosper.
2. Commitment to National Defence
   1. If war should come, I would risk my life to fight for Singapore.
   2. (R) If war should come to Singapore, I would try to leave the country.
3. Sense of Belonging
   1. I am proud to be a Singaporean.
   2. Singapore is where I belong.
4. Perceived Security
   1. I am confident that Singapore will enjoy peace and stability over the next five years.
   2. I am confident that Singapore will prosper over the next five years.
5. Defensibility of Singapore
   1. Singapore has enough resources to defend itself.
   2. Singapore can be defended even if no country helps us.
6. Confidence in Armed Forces
   1. If there is a war now, I am confident that the SAF will have a quick and decisive victory.

Note. (R) indicates that the responses have been reverse-scored.
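Assuming the four options are coded 1 through 4, reverse-scoring an (R) item amounts to subtracting the raw response from 5; a minimal sketch with a hypothetical pandas column:

import pandas as pd

df = pd.DataFrame({"cmnd2": [1, 2, 3, 4]})  # hypothetical raw responses to the (R) item
df["cmnd2_r"] = 5 - df["cmnd2"]             # reverse-scored values: 4, 3, 2, 1
print(df)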

RESULTS

Scale reliabilities and inter-correlations. The scale reliabilities (internal consistency Cronbach
Alpha) and inter-correlations for the scales are presented in Tables 2.1 to 2.3 below:

Table 2.1. Scale Reliabilities and Inter-Correlations for the NSF Sample (N=392)

Scales (No. of items in parentheses)     Cronbach Alpha   1     2     3     4     5     6
1. Support for NS (5)                    .80              1.00
2. Commitment to National Defence (4)    .86              .61   1.00
3. Sense of Belonging (4)                .74              .58   .61   1.00
4. Perceived Security (2)                .67              .25   .23   .34   1.00
5. Defensibility of the Country (3)      .73              .40   .37   .34   .22   1.00
6. Confidence in the Armed Forces (3)    .74              .48   .44   .44   .36   .70   1.00
Note. p < .05 for all correlations.

Table 2.2. Scale Reliabilities and Inter-Correlations for the Regulars Sample (N=402)

Scales                                   Cronbach Alpha   1     2     3     4     5     6
1. Support for NS                        .80              1.00
2. Commitment to National Defence        .83              .66   1.00
3. Sense of Belonging                    .68              .57   .62   1.00
4. Perceived Security                    .71              .30   .36   .37   1.00
5. Defensibility of the Country          .74              .45   .43   .37   .40   1.00
6. Confidence in the Armed Forces        .75              .56   .48   .46   .44   .74   1.00
Note. p < .05 for all correlations.


Table 2.3. Scale Reliabilities and Inter-Correlations for the NSmen Sample (N=494)

Scales                                   Cronbach Alpha   1     2     3     4     5     6
1. Support for NS                        .80              1.00
2. Commitment to National Defence        .86              .66   1.00
3. Sense of Belonging                    .79              .62   .64   1.00
4. Perceived Security                    .67              .33   .30   .41   1.00
5. Defensibility of the Country          .72              .38   .35   .34   .31   1.00
6. Confidence in the Armed Forces        .76              .49   .45   .51   .40   .68   1.00
Note. p < .05 for all correlations.

Models’ goodness of fit.

To test the hypothesized model of relationships, the data were analysed using covariance structure modeling with LISREL 8.3 (Jöreskog & Sörbom, 1993), with maximum likelihood estimation. To test the model’s robustness, a series of nested model tests was carried out with increasing constraints on the parameters (Meredith, 1993; Reise, Widaman, & Pugh, 1993) across the NSF, Regulars, and NSmen samples. First, we tested for configural structure invariance of the six scales across the 3 samples. Second, we tested for measurement equivalence across the three samples. Measurement equivalence is generally considered to exist when factor loadings are invariant across samples; this is a necessary and important step before comparing the relationships among constructs across samples. Once measurement equivalence was achieved, the third step was to examine whether the hypothesized pattern of relationships holds across the three samples. Last, we examined whether the strengths of the relationships were equivalent across the samples (strong factorial invariance). The goodness-of-fit indices of the various models, as well as the corresponding changes in model fit, are presented in Table 3 below:

Table 3. Goodness of Fit Indices and χ2 Values for Various Models

Models                                                    df    χ2        ∆χ2        RMSEA   SRMR    GFI    NNFI   CFI
M1: CFA: Configural invariance (6 factors)                522   1289.28   -          0.059   0.049   0.92   0.92   0.93
M2: CFA: Measurement equivalence (6 factors, LX=IN)       552   1321.14   -          0.057   0.049   0.91   0.92   0.93
M2-M1                                                     30    -         31.86 ns
M3: Models with configural invariance                     546   1413.21   -          0.061   0.055   0.91   0.91   0.92
M4: Models with measurement invariance (LY=IN, LX=IN)     576   1448.27   -          0.060   0.056   0.91   0.92   0.92
M4-M3                                                     30    -         35.06 ns
M5: Strong factorial invariance (all LY, LX, BE & GA
    paths invariant)                                      590   1474.50   -          0.060   0.062   0.90   0.92   0.92
M5-M4                                                     14    -         26.23 *
M6: All LY, LX, GA & BE paths invariant, except
    BE(2,3) kept free                                     588   1467.05   -          0.060   0.066   0.90   0.92   0.92
M6-M4                                                     12    -         18.78 ns

Note. * ∆χ2 is significant at p=0.05. ns ∆χ2 is not significant at p=0.05.

As shown in Table 3 above, full measurement equivalence was obtained across the three samples (∆χ2(30) = 31.86 for M2-M1 was not significant). The results also indicate that the basic structural model was applicable across the three samples (∆χ2(30) = 35.06 for M4-M3 was not significant), hence suggesting weak factorial equivalence among the three samples. However, it was too restrictive to constrain all the paths to be equal (∆χ2(14) = 26.23 for M5-M4 was significant), and hence one path (β(2,3)) had to be freely estimated across the three samples so that the structural model was applicable across all three (∆χ2(12) = 18.78 for M6-M4 was not significant). The final models for the three samples are illustrated in Figures 2, 3, and 4 below, with their respective fit indices presented in Table 4. Fit indices of root mean square error of approximation (RMSEA=.060), standardized root mean square residual (SRMR=.066), goodness of fit index (GFI=.90), non-normed fit index (NNFI=.92), and comparative fit index (CFI=.94) all indicate acceptable fit by conventional standards.
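The ∆χ2 values in Table 3 are ordinary likelihood-ratio (chi-square difference) tests between nested models. A minimal sketch with scipy, reproducing the M5-M4 and M6-M4 comparisons from the table values:

from scipy.stats import chi2

def delta_chi2(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test for nested covariance-structure models."""
    d_stat = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_stat, d_df, chi2.sf(d_stat, d_df)   # statistic, df, p-value

print(delta_chi2(1474.50, 590, 1448.27, 576))  # M5-M4: 26.23 on 14 df, p ~ .02 (significant)
print(delta_chi2(1467.05, 588, 1448.27, 576))  # M6-M4: 18.78 on 12 df, p ~ .09 (ns)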

Figure 2: Model for NSFs

[Path diagram with standardized estimates. Sense of Belonging → Commitment to National Defence = 0.76 (the path freely estimated in each sample); squared multiple correlations: Perceived Security of Singapore R2 = 0.16, Sense of Belonging R2 = 0.34, Commitment to National Defence R2 = 0.58, Support for National Service R2 = 0.62, Defensibility of the Country R2 = 0.91.]

Note. * Asterisks indicate paths that could be constrained to be equal across the models.


Figure 3: Model for Regulars

[Path diagram with standardized estimates. Sense of Belonging → Commitment to National Defence = 0.86 (freely estimated); squared multiple correlations: Perceived Security of Singapore R2 = 0.33, Sense of Belonging R2 = 0.45, Commitment to National Defence R2 = 0.74, Support for National Service R2 = 0.74, Defensibility of the Country R2 = 1.00.]

Figure 4: Model for NSmen

[Path diagram with standardized estimates. Sense of Belonging → Commitment to National Defence = 0.80 (freely estimated); squared multiple correlations: Perceived Security of Singapore R2 = 0.26, Sense of Belonging R2 = 0.42, Commitment to National Defence R2 = 0.65, Support for National Service R2 = 0.69, Defensibility of the Country R2 = 0.96.]

* Asterisks indicate paths that could be constrained to be equal across the models.


Table 4. Fit indices of the 3 models

Models df χ2 RMSEA SRMR GFI NNFI CFI


Model for NSF 182 475.23 0.061 0.063 0.90 0.90 0.92
Model for Regulars 182 414.36 0.059 0.050 0.91 0.92 0.93
Model for NSmen 182 535.48 0.063 0.055 0.91 0.91 0.92

DISCUSSION

The findings obtained supported the hypothesized relationships. In each of the groups (NSFs, Regulars, and NSmen), the respondents’ commitment to defend Singapore was strongly and positively related to their support for National Service. This suggests that the respondents could have perceived National Service as a tangible means to express their commitment to the country’s defence. Commitment to defend Singapore in times of war was in turn strongly and positively related to one’s sense of belonging, indicating that one who is emotionally attached to the country would be committed to defending it in times of war.

In all groups, sense of belonging was also found to be moderately and positively related to perceived security and to the defensibility of the country, with defensibility being the more strongly related of the two. This reinforces the importance of a strong perception that the country is defensible, as sense of belonging appears to be not only affect-based but also cognitively based. As hypothesized, there was a weak and positive relationship between defensibility of the country and support for National Service. In terms of Vroom’s (1964) expectancy theory of motivation, these findings suggest that support for military conscription tends to be pragmatic: support for national service is positively influenced by the perception that the country can be defended and that one’s efforts would not be in vain.

In all groups, confidence in the SAF was directly related to the perceived security of the
country and the defensibility of the country. The stronger correlation between confidence in the
armed forces and defensibility of the country as compared to perceived security of the country
suggests that, given their personal involvement in the defence of the country, the servicemen
could have been more likely to perceive the armed forces as having a greater role in the security
and defence of Singapore.

In conclusion, this study reinforces the importance of building on one’s sense of belonging to the country and perception of security and defensibility, so as to achieve the population’s commitment to national defence and support for national service.


REFERENCES

Huxley, T. (2000). Defending the lion city: The Armed Forces of Singapore. NSW, Australia: Allen & Unwin.

Jöreskog, K., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago, IL: Scientific Software.

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.

Reise, S.P., Widaman, K.R., & Pugh, R.H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.

Soh, S., Tan, C., & Ong, K.C. (2002). Attitudes towards national defence and military service in Singapore. Paper presented at the 44th International Military Testing Association Conference, Ottawa, Canada.

Vroom, V.H. (1964). Work and motivation. New York: Wiley.


MEASURING MILITARY PROFESSIONALISM


Lieutenant-Colonel Peter Bradley, Dr Danielle Charbonneau,
Lieutenant (Navy) Sarah Campbell
Military Psychology and Leadership Department, Royal Military College of Canada,
P.O. Box 17000 Station Forces, Kingston, Ontario, Canada K7K 7B4
Email: charbonneau-d@rmc.ca

This paper reports our progress in developing a measure of military professionalism. We begin with the Huntington (1957) model, the traditionally accepted model of military professionalism in North America. The essence of Huntington’s model is that the military officer corps is a profession like the medical or legal profession because it embodies the three professional criteria of expertise, responsibility and corporateness. Where Huntington focuses on the organizational level, “analyzing the character of the modern officer corps” (p. 24), we investigate professionalism at the individual level of analysis, by examining the professional attitudes of individual officers, noncommissioned officers and soldiers.

Expertise. Huntington viewed expertise as specialized knowledge held by the professional practitioner and gained through extensive study of the profession. We expand on the Huntington definition of expertise in the present study to also include the continuous upgrading of this specialized knowledge.

Responsibility. Huntington conceptualized responsibility as social responsibility, reflecting the extent to which the professional organization provides a service essential to society. Also included in Huntington’s definition of responsibility is the requirement for the profession to regulate its members by enforcing professional codes of ethics, and the need for the individual professional to be intrinsically motivated by “love of his craft” and committed to the state by a “sense of social obligation to utilize this craft for the benefit of society” (p. 31).

Corporateness. Central to Huntington’s definition of corporateness is the “sense of organic unity and consciousness of themselves [i.e., the professionals] as a group apart from laymen” (p. 26). Huntington refers to corporate structures like schools, associations and journals which serve to develop and regulate the conduct of military professionals. Here it seems that Huntington permits conceptual overlap between responsibility and corporateness, as his definition of each concept makes reference to professional standards and ethics.

The Canadian Forces doctrinal manual on military professionalism (entitled Duty with Honour) emphasizes the pride of the military professional in serving Canada’s citizens and institutions. A further central element outlined in Canadian Army doctrine is the obligation of soldiers “to carry out duties and tasks without regard to fear or danger, and ultimately, to be willing to risk their lives” (Canada’s Army, p. 33). For these reasons, we included measures of national pride and risk acceptance in our professionalism measure.
_____________________________________________________________________________

The opinions in this paper are those of the authors and not the Department of National Defence.


Another important influence on our study of professionalism is the work of Hall (1968), who developed a measure of professional attitudes encompassing 5 dimensions. His first dimension, use of the professional organization as a major referent, reflects the extent to which the individual is influenced by the values, beliefs and identity of the organization. This dimension may not have a counterpart in the Huntington model. Hall’s second dimension, belief in public service, reflects a commitment to benefit society and is therefore similar to Huntington’s responsibility. His third dimension, belief in self-regulation, reflects endorsement of the idea that only other professionals are qualified to evaluate one’s performance. This is similar to aspects of Huntington’s corporateness. Hall’s fourth dimension, sense of calling, reflects a strong level of intrinsic motivation and is akin to Huntington’s responsibility. Hall’s fifth dimension, autonomy, refers to the professional’s freedom to make decisions about his/her work without external pressures, and is unlike any of Huntington’s professional criteria.

An important part of developing a measure of professional attitudes is to examine the relations between professionalism dimensions and important organizational outcomes. We hypothesized that our professionalism scales would be related to attitudinal outcome measures such as organizational citizenship behaviour (OCB), satisfaction, and commitment to the Army.

Method

Overview
We developed 5 professionalism scales of 43 items by adapting items from Hall’s (1968)
professional inventory and Snizek’s (1972) evaluation of the Hall measure and by writing
additional items for the dimensions described below. All items and scales were measured on a 5-
point scale and the survey was administered to Canadian Army personnel at 4 installations.

Participants
The research sample included 333 personnel from the rank of private to major. All ranks were represented in the sample, along with most of the occupations in the army. Females comprised 16% of the sample, and 12% of the sample were officers.

Measures
Expertise. The expertise scale contained 8 items measuring two dimensions. The first
reflects the extent to which respondents possess unique knowledge that provides an important
contribution to society (Item 1: I think that most members of the Army have unique skills and
knowledge that make an important contribution to the Canadian Forces and to society). The
second dimension reflects the extent to which they strive to keep this knowledge up to date (Item
5: I keep up-to-date with new developments in the profession of Arms).

Responsibility. Measured by an 11-item scale, responsibility is conceptualized as having three dimensions. First, the profession must perform a service to society (Item 10: I always use my skills and knowledge in the best interest of Canadians). Second, individual members of the profession have the obligation to adhere to professional standards in their daily work (Item 15: I would comply with unethical assignments or rules if I were ordered to do so (reverse scored)).


Third, the profession is a "calling" rather than a job (Item 17: People in the military have a real
"sense of calling" for their work).

Corporateness. This 13-item scale focuses on the regulatory practices within the
profession which ensure members' competence and ethical behaviour. There are three
dimensions to the corporateness construct. First, members must be familiar with and understand
the standards of competence and ethical conduct (Item 21: I am aware of the criteria that define
competence in the profession of arms). Second, a peer-regulatory system of competence and
conduct monitoring must be in place and must be effective (Item 26: It is my duty to take action
when I observe another unit member commit unprofessional actions). Lastly, members must be
given the autonomy to exercise their professional judgment (Item 30: I don't have much
opportunity to exercise my own professional judgment (reverse scored)).

National pride. Measured with a 3-item scale, national pride reflects the extent to which
military professionals are proud of their nation (Item 34: I am proud of Canadian society and
Canadian culture) and proud to be serving their nation (Item 35: I am proud to be a member of
Canada’s military).

Risk acceptance. Risk acceptance was measured by 8 items such as: Item 36: I am
prepared to put my life at risk to defend Canadian territory; Item 40: I am prepared to put my
life at risk in peace support operations (e.g., peacekeeping, peace making).

Outcome measures. We measured OCB (i.e., extra-role behaviours) with items adapted from Van Dyne, Graham, and Dienesch (1994) and Podsakoff, MacKenzie, Moorman, and Fetter (1990). We developed 2-item measures of satisfaction with the Army, the unit, and the occupation along the lines of the satisfaction measure employed by Cotton (1979). We measured commitment with a 6-item measure of Meyer and Allen’s (1991) affective commitment, the extent to which individuals identify with their organization because of emotional attachment to it. All outcome items and scales were measured on a 5-point rating scale.

Results and Discussion

Overview
Our analyses focused on two research questions: (a) To what extent are our rationally
derived scales supported by psychometric analyses (i.e., scale internal consistency indices and
principal components analyses)? (b) To what extent are the dimensions of professionalism
related to important attitudinal outcomes?

Reliability of Professionalism Scales


We calculated Cronbach Alpha coefficients for each of our professionalism scales and sub-scales. As shown in Table 1, some of the coefficients are low, indicating that the dimension is multidimensional or requires additional items. The responsibility scale seems to be most problematic in this regard. Responsibility has a relatively low Cronbach Alpha for a scale of 11 items (.28), but it also has three subordinate dimensions, suggesting that it is properly classified as a multidimensional construct.

Table 1

Internal Consistency of Professionalism Scales

Scale Sub-scale Items Cronbach Alpha


Expertise 1–8 (8 items) .61
Unique knowledge 1, 2 .67
Maintain knowledge 3, 4, 5, 6, 7, 8 .56
Responsibility 9 – 19 (11 items) .28
Service to society 9, 10, 11, 12, 13 .25
Adhere to professional standards 14, 15, 16 .22
Sense of calling 17, 18, 19 .57
Corporateness 20 – 32 (13 items) .67
Understand standards of conduct 20, 21, 24, 29 .59
System of monitoring conduct 22, 23, 25, 26, 27, 28 .66
Autonomy 30, 31, 32 .47
National Pride 33 – 35 (3 items) .56
Risk Acceptance 36 – 43 (8 Items) .87
Professionalism 1 – 43 (43 items) .78
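For reference, coefficient alpha for any of these scales can be computed directly from the respondent-by-item score matrix; the sketch below uses the standard formula with randomly generated placeholder data rather than the study's actual responses.

import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a respondents x items matrix."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of the scale totals
    return (k / (k - 1)) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
demo = rng.integers(1, 6, size=(333, 8)).astype(float)  # placeholder 5-point data, 8 items
print(round(cronbach_alpha(demo), 2))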

Structure of the Professionalism Measure


We conducted principal components analyses (PCA), with varimax rotation, on the 43
professionalism items and found that national pride and risk acceptance were the only
dimensions in which all items clustered as expected. We determined that a 3-component
solution, accounting for 28.7% of the variance, was the most interpretable solution. As the
results in Table 2 show, expertise and responsibility items loaded on three different components.
Most of the expertise items relating to maintenance of professional knowledge clustered on Component 2. Other expertise items loaded across all components, indicating that we should review our sub-dimensions of expertise and develop additional items for each of these sub-dimensions.
Responsibility items 17, 18 and 19, all relating to professional commitment (i.e., a sense of
calling), clustered on Component 2 and responsibility items 14 and 15 relating to professional
standards of ethical behaviour clustered on Component 3. Responsibility items 9 and 10, relating
to public service, loaded on several components suggesting that we should re-examine this
element of responsibility and possibly develop several additional items for this measure. In
addition, the sub-scales of professional standards and sense of calling sub-dimensions might also
benefit from several additional items. Corporateness items loaded on Components 2 and 3.
Except for Items 20 and 21, items relating to competence and ethical behaviour loaded on
Component 3, along with the above-mentioned responsibility “ethics” Items 14 and 15. Items 30
and 31 from the autonomy sub-dimension of corporateness clustered on Component 2, but the
sole remaining item of this sub-dimension, Item 32, did not.
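The retention logic behind this analysis (eigenvalues of the item correlation matrix, proportion of variance for the retained components) can be sketched as follows; X stands in for the 333 x 43 response matrix and is simulated here, so the printed values are placeholders rather than the 28.7% reported above.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(333, 43))                # placeholder for the 43 professionalism items

R = np.corrcoef(X, rowvar=False)              # 43 x 43 item correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

n_kaiser = int((eigvals > 1).sum())           # eigenvalue > 1 as one retention guide
pct_3 = eigvals[:3].sum() / eigvals.sum()     # variance accounted for by 3 components
print(n_kaiser, round(100 * pct_3, 1))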


Table 2
Professionalism Scales: 3-Component Solution

Item   Construct   Component loadings (Comp 1-3)
1      Exp         .496
2      Exp         .341
3      Exp         .398
4      Exp         .306
5      Exp         .367
6      Exp         -
7      Exp         .326
8      Exp         .437
9      Resp        .329  .364  .344
10     Resp        .345  .365
11     Resp        -
12     Resp        -
13     Resp        -
14     Resp        .307
15     Resp        .329
16     Resp        -
17     Resp        .622
18     Resp        .699
19     Resp        .371
20     Corp        .429
21     Corp        .347  .313
22     Corp        .599
23     Corp        .599
24     Corp        .481
25     Corp        .524
26     Corp        .566
27     Corp        .359
28     Corp        .507
29     Corp        -
30     Corp        .371
31     Corp        .486
32     Corp        -
33     Pride       .586
34     Pride       .489
35     Pride       .302  .405
36     Risk Acc    .646
37     Risk Acc    .651
38     Risk Acc    .776
39     Risk Acc    .730
40     Risk Acc    .748
41     Risk Acc    .729
42     Risk Acc    .723
43     Risk Acc    .710

Note. Exp = expertise, Resp = responsibility, Corp = corporateness, Pride = national pride, and Risk Acc = risk acceptance.

Overall, the results depicted in Tables 1 and 2 show that risk acceptance has the strongest
psychometric properties of all our professionalism scales and responsibility the weakest. The
conceptual definitions and items representing each of expertise, responsibility and corporateness
need to be reviewed and likely require additional items and further psychometric evaluation.

Professionalism-Outcome Relations
As shown in Table 3, we found many positive (statistically significant) correlations
between our professional scales and the other attitudinal outcome measures. For example,
professionalism (operationalized as the sum of expertise, responsibility, corporateness, national
pride, and risk acceptance) correlated .59 with OCB, .37 with overall satisfaction and .37 with
commitment. Risk acceptance had lower, albeit positive and statistically significant correlations
with the attitudinal outcome measures. The correlations between national pride and the outcome
measures were stronger.


Table 3

Intercorrelations Among Professionalism and Outcome Measures

1 2 3 4 5 6 7 8 9 10 11 12
1 Professionalism
2 Expertise .70
3 Responsibility .71 .37
4 Corporateness .78 .47 .45
5 Risk Acceptance .20 .28 .16 ns
6 National Pride .75 .28 .32 .48 ns
7 OCB .59 .59 .36 .53 .27 .33
8 Satisfaction .37 .28 .27 .29 .11 .28 .42
9 Sat Army .32 .30 .22 .22 .31 .21 .46 .67
10 Sat Unit .23 .15 .23 .15 ns .18 .23 .80 .28
11 Sat MOC .32 .24 .18 .31 ns .28 .33 .85 .38 .53
12 Commitment .33 .31 .19 .32 .26 .19 .51 .44 .58 .21 .29
Mean 3.52 3.61 3.40 3.52 4.01 3.57 3.69 3.48 3.73 3.33 3.37 3.41
Standard Dev. .35 .45 .44 .41 .71 .59 .40 .83 .92 1.11 1.15 .73
Note. ns = not statistically significant. All coefficients are significant at p < .001.

Conclusion

This paper presents the early stages of our work on measuring professional military attitudes. Some of our scales have sound psychometric properties, and all scales correlate with meaningful attitudinal outcome measures. Future research should focus on expanding the professionalism model to include other variables (such as Hall’s [1968] use of the professional organization as a major referent) and improving the psychometric quality of existing scales with additional items.
References

Canada’s Army. Canadian Forces Publication B-GL-300-000/FP-000. 1 April 1998.
Cotton, C.A. (1979). Military attitudes and values of the army in Canada (Report 79-5). Willowdale, Ontario: Canadian Forces Personnel Applied Research Unit.
Duty with Honour. Canadian Forces Publication A-PA-005-000/AP-001. 2003.
Hall, R.H. (1968). Professionalization and bureaucratization. American Sociological Review, 33, 92-104.
Huntington, S.P. (1957). Officership as a profession. In S.P. Huntington, The soldier and the state. Cambridge, MA: The Belknap Press of Harvard University Press.
Meyer, J.P., & Allen, N.J. (1991). A three-component conceptualization of organizational commitment. Human Resource Management Review, 1, 61-98.
Podsakoff, P.M., MacKenzie, S.B., Moorman, R.H., & Fetter, R. (1990). Transformational leader behaviours, and their effects on followers’ trust in leader, satisfaction, and organizational citizenship behaviours. Leadership Quarterly, 1, 107-142.
Snizek, W.E. (1972). Hall’s professionalism scale: An empirical reassessment. American Sociological Review, 37, 109-114.
Van Dyne, L., Graham, J.W., & Dienesch, R.M. (1994). Organisational citizenship behaviour: Construct redefinition, measurement and validation. Academy of Management Journal, 37, 765-802.


DEOCS: A New and Improved MEOCS


Stephen A. Truhon
Department of Social Sciences
Winston-Salem State University
Winston-Salem, NC 27110
truhons@wssu.edu

ABSTRACT
The Defense Equal Opportunity Management Institute (DEOMI) has studied equal opportunity through the use of the Military Equal Opportunity Climate Survey (MEOCS) for more than a decade. In the process of updating the MEOCS, a new version called the DEOCS (DEOMI Equal Opportunity Climate Survey) has been developed, which uses items from the MEOCS-EEO (Equal Employment Opportunity version) that have been neutralized (i.e., direct references to a majority and minority race and to male and female gender have been removed). A three-step process was used to compare the DEOCS items with their counterparts in the MEOCS-EEO. Item response theory (IRT) analysis using the MULTILOG program was performed to calculate difficulty and discrimination parameters. These parameters were matched to a common scale through the use of the EQUATE program. Differential item functioning (DIF) analysis was then performed using the DFIT and SIBTEST programs to discover any item bias. Results showed that few items displayed item bias (i.e., DIF), usually when those items had been extensively reworded for the DEOCS. In most cases where items displayed DIF, the DEOCS version had superior psychometric properties compared to the MEOCS-EEO version.

INTRODUCTION
A major research project for the Defense Equal Opportunity Management
Institute (DEOMI) has been the development and testing of the Military Equal
Opportunity Climate Survey (MEOCS; Landis, Dansby, & Faley, 1993). This project
includes revising the MEOCS and keeping it up to date.
Suggested revisions to the MEOCS have included shortening it and making its
items more neutral (i.e., replacing references to “majority,” “minority,” “men,” and
“women” with more general terms “race” and “gender” and then using demographic
information to determine the respondent’s specific race and gender). Various methods for
shortening the MEOCS have been examined including confirmatory factor analysis
(McIntyre, 1999), cluster analysis (Truhon, 1999), and item response theory (IRT;
Truhon, 2000, 2002).
Fifty-one items from the MEOCS-EEO have been rewritten to be more neutral in
the DEOCS. The purpose of the current study was to compare the revised items with their
original versions through the use of IRT and DIF.


METHOD

PARTICIPANTS
The DEOCS had been administered to 522 participants at the time of the current
study. A random sample of 522 respondents to the MEOCS-EEO was selected for
comparison.

MATERIALS
Items for 14 scales were taken from the MEOCS-EEO and revised for the DEOCS: Sexual Harassment and Discrimination, Differential Command Behavior toward Minorities and Women, Positive Equal Opportunity (EO) Behavior, Racist Behavior, Religious Discrimination, Disability Discrimination, Age Discrimination, Commitment, Trust in the Organization, Effectiveness, Work Group Cohesion, Leadership Cohesion, Satisfaction, and General EO Climate. Two to five items from each scale were chosen which previous research had shown to have good psychometric qualities (i.e., item-total correlations, reliability, and discriminability).

PROCEDURE
A three-step process was followed in these analyses. First, Thissen’s (1991, 2003) MULTILOG program was used to obtain difficulty and discrimination parameters (the b’s and a) for the MEOCS-EEO and the DEOCS. Because these parameters for the two versions were calculated separately, a common metric was needed, so Baker’s (1995) EQUATE program was used to link the two versions. For each of the scales, the parameters from the revised form of the MEOCS were equated to those of the MEOCS-EEO, and the transformation constants (A and K) were obtained. Finally, following the transformation, DIF analyses were performed using Raju, van der Linden, and Fleer’s (1995) DFIT program adapted for polytomous items (Flowers, Oshima, & Raju, 1999; Raju, 2001) and Shealy and Stout’s (1993) SIBTEST program adapted for polytomous items (Chang, Mazzeo, & Roussos, 1996).
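The EQUATE step amounts to a linear transformation of the item parameters onto the base metric: if θ* = Aθ + K, then each discrimination becomes a/A and each difficulty or threshold becomes Ab + K. A minimal sketch of that transformation, with hypothetical parameter values and linking constants:

from typing import List, Tuple

def to_base_metric(a: float, bs: List[float], A: float, K: float) -> Tuple[float, List[float]]:
    """Linear IRT metric linking: theta* = A * theta + K, so a* = a / A and b* = A * b + K."""
    return a / A, [A * b + K for b in bs]

# Hypothetical graded item: one discrimination, three category thresholds,
# with A and K as produced by a linking program such as EQUATE.
a_star, b_star = to_base_metric(a=1.40, bs=[-1.2, 0.3, 1.1], A=0.90, K=0.15)
print(a_star, b_star)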

RESULTS
A summary of the results can be seen in Table 1, which lists each scale, the number of items in the scale, whether any items were reworded for the DEOCS, and whether DIF was detected by DFIT or SIBTEST. Each of the scales is examined below.
SIBTEST appears to be more sensitive to possible DIF than DFIT. Examination of the BRFs in this report suggests that SIBTEST is overly sensitive, as also appeared to be the case in previous research (Truhon, 2002). Chang et al. (1996) have reported that their polytomous adaptation of SIBTEST is more likely to exhibit Type I error when there is nonuniform DIF, which occurs with many items. This would help to explain the seeming contradiction with Bolt’s (2002) finding that SIBTEST had less power than DFIT.
Whether one uses the stricter criteria for DIF in DFIT or the looser criteria in SIBTEST, it is noteworthy that there is greater DIF in the reworded items than in the items whose wording was left unchanged. These reworded items allow respondents a different interpretation compared to the original version of the items; for example, in the reworded version sexual harassment can involve women harassing men and racist behavior can involve nonwhites discriminating against whites. Overall this suggests that the DEOCS has kept the essential qualities of the MEOCS-EEO and in many cases improved upon them.

TABLE 1
Summary of Differential Item Functioning Results for 14 Scales

Scale                                      Number of Items   Reworded    DIF (DFIT)       DIF (SIBTEST)
Sexual Harassment and Discrimination       4                 All         None             DEOCS 13, 14, 15
Differential Command Behavior toward
  Minorities and Women                     4                 All         DEOCS 4          DEOCS 4, 6
Positive EO Behavior                       4                 All         None             DEOCS 2, 8
Racist Behavior                            3                 DEOCS 1     DEOCS 12         DEOCS 1, 11, 12
Religious Discrimination                   3                 None        None             DEOCS 16
Age Discrimination                         3                 None        None             None
Disability Discrimination                  3                 None        None             None
Commitment                                 5                 None        None             DEOCS 25, 28
Trust in the Organization                  3                 None        None             None
Perceived Work Group Effectiveness         4                 DEOCS 35    None             DEOCS 34
Work Group Cohesion                        4                 None        None             DEOCS 38, 39
Leadership Cohesion                        4                 None        None             None
Job Satisfaction                           5                 None        None             DEOCS 48
Overall EO Climate                         2                 None        Not determined   Not determined

DISCUSSION
While equating was used in the current study to complete the DIF analyses, equating can be done for tests as a whole. However, most of this work has been limited to dichotomous tests (Kolen & Brennan, 1995). These procedures should be applied to polytomous tests like the MEOCS. Equating tests would allow researchers to make use of the large database collected on earlier versions while developing newer versions such as the DEOCS.


There is a link between test construction and test equating. This is illustrated in a
statement by Mislevy (1992),
Test construction and equating are inseparable. When they are applied in concert,
equated scores from parallel test forms provide virtually exchangeable evidence about
students’ behavior on the same general domain of tasks, under the same specified
standardized conditions. When equating works, it is because of the way the tests are
constructed…(italics in original, p. 37; from Kolen & Brennan, 1995, p. 246).
Dodd, De Ayala, and Koch (1995) have described a procedure that uses CRFs to calculate item information functions (IIFs). These IIFs can be added together to produce test information functions (TIFs). Wu (2000) developed a model management system that uses TIFs to develop comparable tests. These equating procedures can then be used to develop comparable forms of the DEOCS. In this way IRT can be used to develop alternate forms of the MEOCS and DEOCS (see Cortina, 2001), rather than compare forms after the fact, as was done in this study.
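As a simplified illustration of that additivity, the sketch below computes dichotomous 2PL item information functions, I(θ) = a^2 P(θ)(1 - P(θ)), and sums them into a TIF; the graded items used in the DEOCS have a more involved information formula, and the parameters here are hypothetical.

import numpy as np

def iif_2pl(theta: np.ndarray, a: float, b: float) -> np.ndarray:
    """2PL item information: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 61)
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]      # hypothetical (a, b) pairs
tif = sum(iif_2pl(theta, a, b) for a, b in items)  # the TIF is the sum of the IIFs
print(float(tif.max()))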
The kinds of analyses described above set future directions for the DEOCS. They
are frequently used in computerized adaptive testing (CAT). Most CAT has been done in
the area of ability testing. However, CAT has been successfully applied to attitude testing
(Koch & Dodd, 1990). A CAT-version of the DEOCS could establish a person’s response
level on the different scales in the DEOCS with a minimal number of items.
Finally, the DEOCS has begun to go on-line. DIF analyses should be used to
compare on-line responses to the DEOCS with paper-and-pencil responses. While previous
DIF research suggests that administration format does not make a difference for attitude
scales (Donovan, Drasgow, & Probst, 2000) and evaluations (Penny, 2003), it would be
useful to verify this.
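Such a format comparison could reuse the DFIT machinery already applied here. Below is
a minimal sketch of the NCDIF index (Raju, van der Linden, & Fleer, 1995), reusing
boundary_probs from the earlier example and assuming separately calibrated, linked
parameter estimates for each administration format; it illustrates the index only, not
the DFITPS6 program.

    def expected_score(theta, a, b):
        # Expected item score under the GRM: sum over categories of k * P_k(theta).
        p_star = boundary_probs(theta, a, b)
        p_cat = p_star[:-1] - p_star[1:]
        k = np.arange(p_cat.shape[0])[:, None]
        return np.sum(k * p_cat, axis=0)

    def ncdif(theta_focal, params_ref, params_focal):
        # NCDIF: mean squared difference in expected item scores, evaluated
        # at the focal group's trait estimates.
        diff = (expected_score(theta_focal, *params_focal)
                - expected_score(theta_focal, *params_ref))
        return np.mean(diff**2)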

REFERENCES

Baker, F. B. (1995). EQUATE 2.1: A computer program for equating two metrics in item
response theory [Computer program]. Madison: University of Wisconsin,
Laboratory of Experimental Design.

Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric
polytomous DIF detection methods. Applied Measurement in Education, 15, 113-141.

Chang, H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored
items: An adaptation of the SIBTEST procedure. Journal of Educational
Measurement, 33, 333-353.

Cortina, L. M. (2001). Assessing sexual harassment among Latinas: Development of an
instrument. Cultural Diversity and Ethnic Minority Psychology, 7, 164-181.

Dodd, B. G., De Ayala, R. J., & Koch, W. R. (1995). Computerized adaptive testing with
polytomous items. Applied Psychological Measurement, 19, 5-22.


Donovan, M. A., Drasgow, F., & Probst, T. M. (2000). Does computerizing paper-and-
pencil job attitude scales make a difference? New IRT analyses offer insight.
Journal of Applied Psychology, 85, 305-313.

Flowers, C. P., Oshima, T. C., & Raju, N. S. (1999). A description and demonstration of
the polytomous-DFIT framework. Applied Psychological Measurement, 23, 309-326.

Koch, W. R., & Dodd, B. G. (1990). Computerized adaptive measurements of attitudes.
Measurement and Evaluation in Counseling and Development, 23, 20-30.

Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York:
Springer.

Landis, D., Dansby, M. R., & Faley, R. H. (1993). The Military Equal Opportunity
Climate Survey: An example of surveying in organizations. In P. Rosenfeld, J. E.
Edwards, & M. D. Thomas (Eds.), Improving organizational surveys: New
directions, methods, and applications (pp. 122-142). Newbury Park, CA: Sage.

McIntyre, R. M. (1999). A confirmatory factor analysis of the Military Equal Opportunity
Climate Survey, Version 2.3 (DEOMI Research Series Pamphlet 99-5). Patrick
AFB, FL: Defense Equal Opportunity Management Institute.

Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and
prospects. Princeton, NJ: ETS Policy Information Center.

Penny, J. A. (2003). Exploring differential item functioning in a 360-degree assessment:
Rater source and method of delivery. Organizational Research Methods, 6, 61-79.

Raju, N. S. (2001). DFITPS6: A Fortran program for calculating polytomous DIF/DTF
[Computer program]. Chicago: Illinois Institute of Technology.

Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of
differential functioning of items and tests. Applied Psychological Measurement,
19, 353-368.

Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates
true bias/DIF from group ability differences and detects test bias/DTF as well as
item bias/DIF. Psychometrika, 58, 159-194.

Thissen, D. (1991). MULTILOG user's guide (Version 6.0). Lincolnwood, IL: Scientific
Software.

Thissen, D. (2003). MULTILOG 7.03 for Windows [Computer program]. Lincolnwood,
IL: Scientific Software International.


Truhon, S. A. (1999). Updating the MEOCS using cluster analysis and reliability
(DEOMI Research Series Pamphlet 99-8). Patrick AFB, FL: Defense Equal
Opportunity Management Institute.

Truhon, S. A. (2000). Shortening the MEOCS using item response theory (DEOMI
Research Series Pamphlet 00-8). Patrick AFB, FL: Defense Equal Opportunity
Management Institute.

Truhon, S. A. (2002). Comparing two versions of the MEOCS using differential item
functioning (DEOMI Research Series Pamphlet 02-7). Patrick AFB, FL: Defense
Equal Opportunity Management Institute.

Wu, I-L. (2000). Model management system for IRT-based test construction decision
support system. Decision Support Systems, 27, 443-458.


ADDRESSING PSYCHOLOGICAL STATE AND WORKPLACE BEHAVIORS

OF DOWNSIZING SURVIVORS

Rita Shaw Rone


Naval Education and Training Command Headquarters
Pensacola, Florida, USA
rsrone1@mchsi.com

According to Burke and Nelson (1998), 10.8 million people lost their jobs between 1981
and 1988 in the United States. In the Department of Defense alone, thousands of civilian
employees left federal service during the 1990s due to base closures and realignment of
functions. During this period, a great deal of research focused on displaced
workers. Certainly their circumstance renders them deserving of attention. Just as important,
however, are those employees left behind, i.e., the organizational "survivors." If organizational
leaders expect the dramatic changes they have made to organizational structures to yield the kind
of efficiencies they hope for, they must ensure that they have adequately assessed the impact of
downsizing on these survivors and developed strategies for addressing it.

Organizational Reality

The organizational changes spawning recent downsizing have been characterized as
revolutionary in nature. Hamel (2000) maintains that organizations are on the threshold of a new
age and that such a "revolution" (p. 4) is bringing with it an anxiety clearly felt by those left
behind. Gowing, Kraft, and Quick (1998) assert that it is a "dramatic, systemic revolution"
similar to America's agricultural revolution, which began in the late 18th century and ended in the
early 20th century. These new realities reflect the new employment contract, also
known as the psychological contract, which comprises the "individual beliefs, shaped by the
organization, regarding terms of an exchange between individuals and their organizations"
(Rousseau, 1995, p. 9). Bunker (1997) indicates that the old contract, "founded on the exchange
of hard work and loyalty for lifetime employment, has been repeatedly violated and is probably
permanently undermined" (p. 122).

Impact on Survivors

In a comprehensive meta-analysis of the effects of downsizing on survivors, West (2000)
posits that there is a harmful impact on survivors. This includes the impact on the survivor's
psychological state as well as the ensuing effects of that state on his or her behavior on the
job. Layoffs, according to West, have the clear potential "to affect survivors' psychological
states, which, in turn, have the potential to influence a variety of work behaviors and attitudes"
(p. 8). The term "layoff survivor sickness," coined by Noer (1998), is used to describe feelings,
attitudes and behaviors observed and found in research on this subject. Noer indicates that
certain feelings are typical of the survivor. These include (a) fear, insecurity, and uncertainty;
(b) frustration, resentment, and anger; (c) sadness, depression, and guilt; and (d) unfairness,
betrayal, and distrust.

Another feeling often expressed by survivors is a perceived loss of control. According to
Spreitzer and Mishra (1997), survivors will seem helpless and alienated and may feel somewhat
disconnected from current management officials, identifying more closely
with co-workers and friends who have left the organization. Another concern some employees
face is the loss of identity. If, for example, one's status in the organization changes, this could
trigger extreme feelings of loss, as some employees feel defined by the work they do or the
position they hold (Stearns, 1995).

According to Noer (1998), in a downsized environment, survivors often cope via non-
productive behaviors on the job. These behaviors include: (a) aversion to risk and a more rigid
approach to work; (b) a drop in productivity, due to stress, with a longer-term effect continuing
due to a loss of "work spirit" (p. 213); (c) a demand for information; (d) a tendency to blame
others; and (e) denial that the change causing the downsizing is real. Evidence of these
behaviors is echoed in other research in this area. Parker, Chmiel, and Wall (1997) cite several
studies that support the notion that survivors become less committed, experience greater strain,
enjoy their jobs less, and are absent more often.

Challenges to Organizational Leaders

It is not surprising to discover high levels of job dissatisfaction and insecurity among
organizational survivors. According to a 1995 survey of 4,300 American workers, only 57
percent of those in downsizing companies indicated that they were satisfied with their jobs
(Woodruff, 1995). Downsizing has also been found to correlate negatively with morale and
productivity variables (Duron, 1994). There is good news, however. Research
indicates that specific actions of organizational leaders can make a difference. Wagner (1998)
asserts that "management methods in implementing downsizing may tend to positively or negatively
affect the impact of downsizing on productivity, morale, and organizational perception" (p. 3).

First of all, managers can involve employees in the decision-making process and attempt
to make the work as interesting as possible, actions that can help combat feelings of helplessness
and apathy. Secondly, managers should communicate expectations to employees. As simplistic
as it may sound, making sure employees know what to do and how to do it is imperative in
creating a feeling of competence. A third recommendation from Woodruff (1995) is that
managers reduce as many administrative irritants as possible. Anxious and mistrustful, survivors
may sometimes perceive seemingly unimportant situations as significant and threatening.
Managers can also encourage teamwork. The new, changed landscape of the workplace may
have broken up old alliances, as valued friends retired or were separated from employment.
Working groups, based on new alliances, may serve as catalysts to a healing process.


Organizational management also must treat the survivors as individuals. One-on-one interaction
between supervisory personnel and employees is essential to employees' feeling valued.

Parker, Chmiel, and Wall (1997), in a longitudinal study of strategic downsizing, found
that specific actions pertaining to improvement in work characteristics could have a long-term
positive effect on survivors, even when there is an increase in work demands. This apparently
results from increases in control, clarity, and participation, all states that are associated with
improved well-being. Managers should explore the kinds of initiatives that might yield such
improvements and implement them.

The importance of and need for better communication is a recurring theme in downsizing
literature; this is a critical responsibility of management. In such times, the need for information
becomes an almost frantic quest. There is often a tendency for organizational leadership to “hold
back” information during times of reorganization out of concern that telling employees too much
will somehow be detrimental. Such action simply creates more concern and fear. Managers
should also solicit feedback, which should be given legitimate consideration and acted upon
(Kotter, 1996). When organizational leaders do this, they foster a sense of well being in the
survivors and, as a bonus, they receive information that can be useful in the change efforts.

A less obvious, but still important, step managers can take is leading by example. Kotter
(1996) indicates that there is nothing that undermines the communication of a change vision so
thoroughly as key managers’ behavior inconsistent with the new organizational vision. They
must act responsibly, communicate both up and down, and foster effective teamwork, behaviors
for their workers to emulate. They must work toward a "positive emotional perspective toward
the work," endeavoring to keep people optimistic (Hiam, 2002). In doing so, they can enhance
many things – problem solving, listening, and conflict resolution. Bunker (1997) also indicates
that they must assess and accept their own emotional responses and vulnerability and model
healthy coping behaviors – actions that can enhance the organization’s and the survivor’s
recovery.

A key problem for organizational leadership in times of downsizing pertains to employee
motivation. There are actions that can be taken to move survivors toward a more positive
motivational climate (Woodruff, 1995). One key focus should be intrinsic motivation, i.e.,
motivation that, according to Richter (2001), occurs because an employee is passionate about his
or her task. Such motivation is brought about through fostering a climate in which the
employees' intrinsic motivation is enabled. Richter indicates that such an environment is one in
which employees feel competent, in control, and emotionally tied to others, all feelings that seem
in direct conflict with what survivors typically experience. Similar to intrinsic motivation is
"internal commitment," as discussed by Carter (1999, p. 105). This is a behavior evident when
employees are committed to a particular project, person, or program based on their own reasons.
Again, their motivation comes from within. According to Carter, this type of commitment is
closely tied to empowerment. To realize it, managers must try to involve employees in defining
work objectives and planning how to achieve them. Intrinsic motivation also has a strong
connection to empowerment. Spreitzer and Mishra (1997) indicate that empowerment is an
effective way to re-build survivors' trust, as it displays the organization's own trust in the
survivors. Such empowerment is extremely important, not only for this reason, but for others.
First, it is a prerequisite to risk-taking, a behavior typically abandoned by survivors and,
secondly, it "reflects a proactive orientation to one's work" as well as "a sense of meaning,
competence, self-determination, and impact" (Spreitzer & Mishra, 1997, p. 6). Managers must
commit themselves to other behaviors associated with intrinsic motivation. They must attempt
to provide employees a sense of choice, competence, meaningfulness, and progress (Richter,
2001). This could have a lasting effect, as it appears to
relate well to the aspects of control, clarity, and participation cited in the longitudinal study of
Parker, Chmiel & Wall (1997). According to the theory of self-determination, from Deci &
Ryan (1985), individuals have a universal need to be autonomous and competent. Any actions
managers can successfully accomplish to move employees toward these states should foster
individual intrinsic motivation and, eventually, should impact the goals of the organization.

A motivation focus should not totally exclude considerations of external motivation.
External motivation is typically viewed as that occurring when an employee performs a task in
relation to an external force, either positive, e.g., expectation of a bonus, or negative, e.g., fear of
censure. During the aftermath of a downsizing effort, managers can and should utilize reward
systems in recognizing outstanding performance, when appropriate. They should also provide
positive reinforcement in less formal ways. Such recognition, however small, may sometimes be
the "lift" the survivor needs to feel worthy and competent again (Woodruff, 1995).

Conclusion

Until organizational survivors' psychological needs are met, they may operate within
their organizations in states of ill health, disarray, and helplessness. Organizational leaders and
managers must assess and attend to the impact of downsizing on these survivors. Managers must
face the fact that survivors of organizational downsizing typically experience specific and
numerous psychological changes affecting their feelings and, thus, their behavior in the
workplace. Research describes the general patterns of behavior common to a downsizing
environment, patterns that can be addressed through management action. Such actions, e.g.,
empowerment, the re-building of trust, leading by example, and continuous, truthful two-way
communication, can often mitigate the psychological damage of the downsizing
experience, helping the survivors to move back into a healthy, productive state.

Managers will find challenges relating to the motivation of these employees quite
daunting, primarily because a post-downsizing environment typically produces thought patterns
opposite to those existing in an environment fostering intrinsic motivation. Nevertheless,
managers must rise to this challenge and do the difficult work of sharing power, learning what
motivates individual employees and acting on it, and listening better. They must also
embrace the concept of positive reinforcement and use it appropriately. Often the primary focus
of downsizing is the gaining of efficiencies through cost reduction. Those in charge may see
employees who have survived termination following downsizing as the fortunate ones. As such,
they may feel that initiatives addressing the way these particular employees "feel" should be
relegated to the background. In such instances, an approach more connected to organizational
goals may be necessary. They may be more receptive to suggestions for initiatives aimed at
alleviating survivors' pain and discontent if they perceive them as having an authentic connection
with the motivation, performance, and success of these survivors in helping the organization
realize its future goals. Continued research in this important area, and the communication of the
results of that research to organizational leaders, is essential to ensure that survivors do not
become the forgotten victims of downsizing.


Reference List

Bunker, K.A. (1997). The power of vulnerability in contemporary leadership.
Consulting Psychology Journal, 49, 122-136.
Burke, R.J. & Nelson, D. (1998). Mergers and acquisitions, downsizing, and
privatization: A North American perspective. In M.K. Gowing, J. D. Kraft &
J.C. Quick (Eds.). The new organizational reality: Downsizing, restructuring, and
revitalization. Washington, D.C.: American Psychological Association.
Carter, T. (1999). The aftermath of reengineering: Downsizing and corporate
performance. New York: The Haworth Press.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in
human behavior. New York: Plenum.
Duron, S.A. (1994). The reality of downsizing: What are the productivity outcomes?
(Doctoral dissertation, Golden State University, 1994). Dissertation Abstracts
International, 54, 4953.
Gowing, M.K., Kraft, J.D., & Quick, J.C. (1998). The new organizational reality.
Washington, D.C.: American Psychological Association.
Hamel, G. (2000). Leading the revolution. Boston: Harvard Business School Press.
Hiam, A. (1999). Motivating & rewarding employees: New and better ways to inspire
your people. Holbrook, MA: Adams Media Corporation.
Hiam, A. (2002, October). Motivation: Recognition tips. Transaction World, II.
Retrieved February 25, 2003 from
http://www.transactionworld.com/articles/2002/october/motivation1.asp
Heckscher, C. (1995). White-collar blues: Management loyalties in an age of corporate
restructuring. New York: Basic Books.
Kotter, J. P. (1996). Leading change. Boston: Harvard Business School Press.
Noer, D. (1998). Layoff survivor sickness: What it is and what to do about it. In M.K.
Gowing, J. D. Kraft & J.C. Quick (Eds.). The new organizational reality:
Downsizing, restructuring, and revitalization. Washington, D.C.: American
Psychological Association.
Noer, D. (1999). Helping organizations change: Coping with downsizing, mergers,
reengineering, and reorganizations. In A.I. Kraut & A. K. Korman (Eds.).
Evolving practices in human resource management. San Francisco: Jossey-
Bass Publishers.
Parker, S. K., Chmiel, N. & Wall, T.D. (1997). Work characteristics and employee well-
being within a context of strategic downsizing. Journal of Occupational Health
Psychology, 2, 289 – 303.
Peters, T. (1987). Thriving on chaos. New York: Harper & Row.
Richter, M.S. (2001). Creating intrinsically motivating environments: A motivation
system. StoryNet. Retrieved February 25, 2003, from
http://www.thestorynet.com/articles_essays/motivation.htm .
Rousseau, D.M. (1995). Psychological contracts in organizations: Understanding
written and unwritten agreements. Thousand Oaks, CA: Sage Publications.
Schweiger, D.M., Ivancevich, J.M. & Power, F.R. (1987). Executive actions for
managing human resources before and after acquisition. Academy of
Management Executive, 1, 127-138.
Stearns, A.K. (1995). Living through job loss. New York: Simon & Schuster.
Wagner, J. (1998). Downsizing effects on organizational development capabilities at an
electric utility. Journal of Industrial Technology, 15, 2 – 7. Retrieved February
20, 2003, from http://www.nait.org/jit/Articles/wagn1198.pdf
West, G. B. (2000). The effects of downsizing on survivors: A meta-analysis (Doctoral
dissertation, Virginia Polytechnic Institute and State University, 2000). Retrieved
January 25, 2003 from http://scholar.lib.vt.edu/theses/available/etd-04202000-
14520000/unrestricted/new-etd.pdf.
Woodruff, D.W. (1995). Motivation: After the downsizing. Hydrocarbon Processing,
74, 131 – 135.


VALIDATION OF THE BELGIAN MILITARY PILOT SELECTION


TEST BATTERY
Yves A. Devriendt, psychologist, and Cathérine A. Levaux, psychologist
Belgian Department of Defence
General Directorate Human Resources
Kwartier Koningin Astrid
Bruynstraat
1120 Neder-Over-Heembeek, Brussels, Belgium
yves.devriendt@mil.be

PROBLEM DEFINITION
Pilot selection is very important for several reasons: fast aircraft are difficult to
handle, resources such as fuel can be wasted, and human error can cause serious damage and
accidents. During the last decade, budgetary restrictions have made the avoidance of wasted
resources even more important.
In the Belgian Defence Forces most of the pilot tests are computer-based, and
according to Hilton and Dolgin (1991) the realism of the testing environment with regard to
the working environment could act as a catalyst for test predictivity.
Nevertheless, more recently, increasing failure rates have called into question the
effectiveness of the Belgian military pilot selection system. Psychologists were asked to
conduct validity studies in order to replace poorly performing tests. When training wastage is
observed, it is not unusual to ask specialists to conduct validation studies, as Caretta and Ree
(1997) report.
Little is known about the validity of the current Military Pilot Selection Battery (MPSB),
due to a lack of interest in applied research matters on the part of policy makers and a
shortage of large data sets. Until now there was no real psychotechnical tradition in the
domain of pilot selection, but due to recent reorganisations and the creation of a General
Directorate of Human Resources, including a Research & Technology Department, things may
change for the better.
In addition to the psychotechnical aspect, there are technical and practical
reasons to replace at least a part of the MPSB. The computer environment is outdated:
spare parts are becoming rare and expensive, and the programming
languages will have to be adapted to meet modern standards. Furthermore, the pilot test
battery will have to be relocated in 2006, due to the policy of centralising all of the selection
activities.
The authors will give an overview of the procedures used to validate the MPSB and
the results obtained. Procedures, results and areas of future research will be discussed.

METHOD
Participants
The participants in the studies were Flemish and Walloon auxiliary pilots, both male
and female, who took part in the selection procedures between 1996 and 2000. Auxiliary
pilots do not start an academic education like the candidate pilots entering the Royal
Military Academy; they receive an adapted theoretical and mainly practically oriented
training. Their contracts are limited in time, but there are possibilities to obtain a lifelong
contract by succeeding in selection procedures later on in their careers. Data were
available on applicants (n = 2044) and pilot trainees (n = 129). Applicants were administered
the MPSB; some of them passed the test battery, others did not. Pilot trainees were selected by
means of their scores on the MPSB, and all of them had passed the test battery.
Measures
Independent variables or predictors
The MPSB was composed of a Psychomotor Battery (PM), a Pilot Skills Battery (PS),
Academic tests, Physical tests, a psychological and a professional interview.
The PM consisted of three tests. First, there was a co-ordination test with two parts:
C_COORD and T_COORD. In the C_COORD, part 1 and part 2, the candidates must keep a
moving ball within the limits of a square. In the T_COORD, part 1 and part 2, a candidate
must either react or not when a triangle appears. Second, there was a discrimination test
producing three scores: G_DISCR (number of good responses, part 1 and part 2), TR_DISCR
(reaction time) and RI_DISCR (number of errors). During this test a candidate must
discriminate between coloured squares and react in an appropriate way. Third, there was the
number of good responses on the RS_506, a test measuring reasoning, spatial orientation and
visual representation.

Test name           Kind of test                   Description

Reaction time test  Reaction                       Number of good responses as an answer
                                                   to different stimuli
                    Reaction time                  Time needed to respond as an answer
                                                   to different stimuli
ABL 304             Visual memory, spatial         Number of good responses in
                    orientation, organisation of   memorising a geographic map
                    the learning process
ABL 17              Reasoning                      Number of good responses in
                                                   completing series of stimuli
ABL 152             Visual memory,                 Number of good responses in making
                    concentration, learning        associations between stimuli
                    method
Cubes test          Three-dimensional              Number of good responses in
                    visualisation and reasoning    representing three-dimensional figures
Digit recall        Short-term memory              Number of good responses in
                                                   reproducing series of digits
Arithmetic          Numerical reasoning            Number of good responses in detecting
                                                   principles in series of numbers
Manikin             Lateral orientation            Number of good responses in localising
                                                   positions
Spiral              Motor co-ordination            Time needed to track a wire without
                                                   touching the wire

Figure 1. Composition of the Pilot Skills Test Battery

The PS Battery contained nine subtests, described in Figure 1. It should be noted
that the ABL 152 was administered in four distinct time periods, each with the same
instructions, in order to observe the learning progression an applicant shows.
Digit recall consisted of two parts. The academic part of the MPSB was conceived to measure
skills in the domains of mathematics, physics, sciences and languages (mother tongue and
English as a second language). Furthermore, candidates had to pass physical tests (swimming,
shuttle run, pull-ups, sit-ups, jumping and throwing). The psychological interview was
conducted by a psychologist in order to get an impression of the personality characteristics of
the candidates. Finally, the professional interview was conducted by a pilot in order to assess
the candidate's job knowledge and motivation.
Dependent variable or criterion
The criterion for the regressions was the evaluation score at the end of the twelve
flights of the initial training stage, called the General Flight Double (GFD12). During
these flights the instructor flies with the trainee and can correct the trainee's mistakes. A flight
without mistakes is rewarded with a blue card (= excellent flight) or a green card (=
satisfactory flight). For a weak flight the trainee gets a yellow card, and for an unsatisfactory
flight a red card is given.
The criterion was operationalised in a continuous way (the summed number of
green and blue cards, with a minimum of zero and a maximum of twelve) and in a
dichotomous way (passed or failed the training).
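A short sketch of this coding (Python; the function and value names are hypothetical):

    def gfd12_criteria(cards, passed_training):
        # cards: the twelve card colours for one trainee, e.g. ["blue", "green", ...]
        # passed_training: True if the trainee passed the initial training stage
        continuous = sum(card in ("blue", "green") for card in cards)  # 0..12
        dichotomous = int(passed_training)                             # 1 = passed
        return continuous, dichotomous

    # Ten satisfactory-or-better flights out of twelve, training passed
    print(gfd12_criteria(["blue"] * 4 + ["green"] * 6 + ["yellow"] * 2, True))  # (10, 1)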

Procedures
A Principal Components Analysis (PCA) was conducted on the PM and PS data sets
using the applicant population.
Linear and logistic regressions were performed on the academic, physical, interview
and psychotechnical data subsets. Each of the described analyses was performed on the pilot
trainee population. There were violations of the normality assumptions for the interview data
and for some distributions of psychotechnical test scores. A multicollinearity problem arose in
the analysis of the PM test scores: there proved to be redundancy (r = -.97) between RM_DISCR
(number of omitted responses in the discrimination test) and G1_DISCR (number of good
responses on the first part of the discrimination test). The variable RM_DISCR was removed
from further analyses.
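The extraction-plus-rotation procedure can be sketched in a self-contained way as
follows (Python); gamma = 0 gives the Quartimax criterion used for Table 1, while
gamma = 1 would give Varimax. This illustrates the general technique only, not the
statistical package actually used in the study.

    import numpy as np

    def principal_components(X, n_comp):
        # Unrotated PCA loadings: eigenvectors of the correlation matrix,
        # scaled by the square roots of their eigenvalues.
        R = np.corrcoef(X, rowvar=False)
        vals, vecs = np.linalg.eigh(R)
        order = np.argsort(vals)[::-1][:n_comp]
        return vecs[:, order] * np.sqrt(vals[order])

    def orthomax(L, gamma=0.0, n_iter=500, tol=1e-8):
        # Orthomax rotation via the classical SVD-based algorithm.
        p, k = L.shape
        T = np.eye(k)
        obj = 0.0
        for _ in range(n_iter):
            Lr = L @ T
            G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
            U, s, Vt = np.linalg.svd(G)
            T = U @ Vt
            if s.sum() < obj * (1.0 + tol):
                break
            obj = s.sum()
        return L @ T

    # loadings = orthomax(principal_components(X, 5))  # X: applicants x test scores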

RESULTS
Principal Components Analysis
In Table 1 the outcome of the PCA of the PM and PS results in the applicant population
is shown. A good and interpretable solution was found with a normalized Quartimax rotation.
Marked loadings are > .70. The following five components could be distinguished: memory-
learning method (Component 1), discrimination (Component 2), co-ordination (Component
3), spatial-mathematical reasoning (Component 4) and reaction (Component 5). In the last
two rows the explained variance and the total proportion of explained variance are given.
Although the variables G1_DISCR and G2_DISCR, on the one hand, and the variables
TR_DISCR and RI_DISCR, on the other hand, belong to the same component, they are
negatively correlated. The reason is probably that they measure the same construct in
different ways: the former two count good responses (the more, the better), while the latter
two measure reaction time and errors (the fewer, the better).
Table 1. Principal Components Analysis on PM and PS test results in the applicant
population of auxiliary pilots

Variable             Comp. 1    Comp. 2    Comp. 3    Comp. 4    Comp. 5

C1_COORD                                   .914249*
T1_COORD                                                         .806199*
C2_COORD                                   .919167*
T2_COORD                                                         .811132*
G1_DISCR                        .770113*
G2_DISCR                        .742764*
TR_DISCR                       -.539144
RI_DISCR                       -.681685
RS_506                                                .580136
Reaction time                                                    .425515
  (responses)
Reaction time                                                    .289643
  (time)
ABL 304              .446364
ABL 17                                                .590505
ABL 152 Series 1     .869106*
ABL 152 Series 2     .943024*
ABL 152 Series 3     .913303*
ABL 152 Series 4     .865682*
Cubes test                                            .661133
Digit recall 1                                        .435885
Digit recall 2                                        .583022
Arithmetic                                            .547164
Manikin                                               .324255
Spiral time                                .161590

Expl. Var           3.443161   2.003158   1.787692   2.791848   1.574403
Prp. Total           .149703    .087094    .077726    .121385    .068452

Regressions
PM-battery and Physical predictors
No statistically significant results were found.
PS-battery
A multiple linear regression was conducted on the pilot trainee population, using the
continuous criterion, GFD12. The forward and backward stepwise procedures resulted in the
same subset of predictors: ABL152 Series 4, Arithmetic, Digit recall 1, Manikin, ABL 304
and ABL 152 Series 2 (R = .41; R² = .17; Adjusted R² = .12 and p < .00318).
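The stepwise procedure itself is generic; a sketch of the forward variant follows
(using statsmodels for the OLS fits; the alpha-to-enter rule is an assumption, since the
entry criterion used in the study is not reported):

    import numpy as np
    import statsmodels.api as sm

    def forward_stepwise_ols(X, y, names, alpha_enter=0.05):
        # Forward stepwise OLS: repeatedly add the predictor with the smallest
        # entry p-value, as long as that p-value stays below alpha_enter.
        chosen = []
        while True:
            remaining = [n for n in names if n not in chosen]
            if not remaining:
                break
            pvals = {}
            for n in remaining:
                cols = [names.index(c) for c in chosen + [n]]
                fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
                pvals[n] = fit.pvalues[-1]   # p-value of the candidate predictor
            best = min(pvals, key=pvals.get)
            if pvals[best] > alpha_enter:
                break
            chosen.append(best)
        return chosen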
Academic tests
GFD12 continuous was regressed on the academic test results (R = .61; R² = .38;
Adjusted R² = .34 and p < .00051). The following subset of predictors gave the best
predictions: Physics and first Language.

45th Annual Conference of the International Military Testing Association


Pensacola, Florida, 3-6 November 2003
783

Interviews
GFD dichotomous was regressed on the results of both the psychological and
professional interview scores, and on each of these variables separately. For the analyses four
logistic regression methods were used: the Quasi-Newton method, the Simplex procedure, the
Hooke-Jeeves Pattern Move and the Rosenbrock Pattern Search.
There was no fit for the psychological and professional scores together, nor for the
professional interview score alone. A logistic one-variable model containing the psychological
interview score, in contrast, gave a good fit (χ² = 4.34; p = .03715).
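A one-variable logistic fit of this kind can be sketched as follows (synthetic
stand-in data, since the real interview scores are not reproduced here; statsmodels'
Newton-type optimiser is used rather than the four methods listed above):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    interview = rng.normal(10.0, 2.0, size=129)      # hypothetical interview scores
    passed = (interview + rng.normal(0.0, 3.0, size=129) > 10.0).astype(int)

    fit = sm.Logit(passed, sm.add_constant(interview)).fit(disp=False)
    print(fit.llr, fit.llr_pvalue)  # likelihood-ratio chi-square and its p-value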

CONCLUSIONS AND DISCUSSION


The PCA yielded a five-component structure. The results furthermore gave evidence
of predictive validity for some of the tests in the PS battery and for the Academic tests.
The psychological interview score, too, proved promising as a predictor. All of
these findings will have to be cross-validated.
The GFD12 was chosen because of its statistical and practical qualities. First, criterion
data were available both in a continuous and a dichotomous form. Using the blue and green
cards there was a spread in the values of the variable, ranging from zero to twelve, and its
distribution was smooth and approached the bell curve. Therefore the criterion, operationalised
in this way, could be treated as a continuous variable. A continuous variable offers some
advantages compared to a dichotomised criterion (Hunter & Burke, 1995; Caretta & Ree,
2003). Nevertheless, there were also data available on the criterion in a dichotomised form
(passed or failed the GFD). These data were very useful in case of violations of the
assumptions of linear techniques with regard to the predictors.
Second, the criterion was an initial or intermediate criterion and not an ultimate one.
Dover (1991) remarks that the more distant the criterion is, the lower the validity. Hilton
and Dolgin are in favour of initial training as a criterion for validity and say that an ultimate
criterion is less cost-effective. Others, like Helmreich (see Hilton & Dolgin, 1991), say job
performance is a more realistic criterion than initial training. No evidence was found for
predictivity in relation to the PM battery. One of the reasons could be the censoring of the
population, or restriction in range (Caretta & Ree, 2003): applicants were selected mainly on
the PM composite score.
There was no direct selection on the basis of the PS score; the results of the PS battery
were used only as an indicator by the psychologist during the psychological interview. In the
present study no corrections for range restriction were applied. In order to obtain a better view
of the predictive value, results should be recalculated and corrected for restriction in range.
An important issue in the Belgian context could be the existence of differences
between the scores of Flemish and Walloon trainees or applicants and, of course, differences
between the scores of males and females. The study of gender and ethnic group differences is
important in evaluating measurement instruments; Caretta (1997) explains why.
In the Belgian context there is a shortage of data on female trainees. Research on
differences between Flemish and Walloon trainees is possible, but difficult, because of the
small groups of trainees for both linguistic systems.
Recent developments in pilot research indicate, first, that it is important to test for multi-
tasking capability. Second, personal and non-technical skills will become more important
because of the increasing importance of teamwork in small teams in modern military aviation
(Damitz, Manzey, Kleinmann & Severin, 2003; Hanson, Hedge, Logan, Bruskiewicz,
Borman, & Siem, 1996). In the near future a test for multi-tasking will be added to the
Belgian MPSB, and from next year onward candidate auxiliary pilots will be examined in the
assessment centre procedure currently used by the Belgian Department of Defence; Devriendt
(2000) gave an overview of some of the typical assessment techniques used.

REFERENCES
Caretta, T.R. (1997). Sex Differences on U.S. Air Force Pilot Selection Tests.
Proceedings of the Ninth International Symposium on Aviation Psychology, Columbus, OH,
1292-1297.
Caretta, T.R. & Ree, M.J. (2003). Pilot Selection Methods. In P. Tsang & M. Vidulich
(Eds.), Principles and Practice of Aviation Psychology (pp. 357-396). New Jersey: Lawrence
Erlbaum Associates.
Damitz, M., Manzey, D., Kleinmann, M., & Severin, K. (2003). Assessment Center for
Pilot Selection: Construct and Criterion Validity and the Impact of Assessor Type. Applied
Psychology: An International Review, 52, 193-212.
Devriendt, Y. (2000). The Officer Selection in the Belgian Armed Forces. Paper
presented at the RTO Human Factors and Medicine Panel (HFM) Workshop held in
Monterey, USA, 9-11 November 1999, and published in RTO Proceedings 55.
Dover, S.H. (1991). Selection Research and Application. In R. Gal & A. Mangelsdorff
(Eds.), Handbook of Military Psychology (pp. 131-148). Chichester: John Wiley & Sons.
Hanson, M.A., Hedge, J.W., Logan, K.K., Bruskiewicz, K.T., Borman, W.C., & Siem,
F.M. (1996). Development of a Computerized Pilot Selection Test.
http://www.ijoa.org/imta96/paper59.html
Hilton, T.F. & Dolgin, D.L. (1991). Pilot Selection in the Military of the Free World.
In R. Gal & A. Mangelsdorff (Eds.), Handbook of Military Psychology (pp. 81-101).
Chichester: John Wiley & Sons.
Hunter, D.R. & Burke, E.F. (1995). Handbook of Pilot Selection. Aldershot: Avebury
Aviation, Ashgate Publishing Ltd.

INDEX OF AUTHORS

Acromite, M. 70
Alderton, D.L. 62, 199, 481
Annen, H. 13
Arlington, A.T. 587
Arnold, R.D. 129
Baker, D.P. 661, 679, 688
Balog, J. 431
Barrow, D. 431
Bearden, R.M. 62, 455
Beaubien, J.M. 271, 679
Bilgic, R. 49
Boerstler, R.E. 499
Borman, W.C. 398, 455
Bowles, S. 205, 398
Boyce, E.M. 561, 567
Braddock, L. 581
Bradley, P. 760
Brown, K.J. 710
Brown, M.E. 561, 567
Brugger, C. 44
Bruskiewicz, K.T. 485
Burns, J.J. 116
Calderón, R.F. 661
Campbell, R. 507, 556
Campbell, S. 760
Carriot, J. 91
Caster, C.H. 448
Castro, C.A. 177
Charbonneau, D. 760
Chen, H. 62
Chernyshenko, O.S. 317, 323
Cian, C. 91
Cippico, I. 305
Collins, M.M. 522
Costar, D.M. 688
Cotton, A.J. 358, 599, 702
Cowan, J.D. 103
Crawford, K.S. 283, 297
Cronin, B. 156, 654
Debač, N. 305
Devriendt, Y.A. 779
Douglas, I. 671
Drasgow, F. 310, 317, 323
Dressel, J.D. 111, 140
Dukalskis, L. 438
Dursun, S. 608
Edgar, E. 438
Eller, E.D. 62
Elliott-Mabey, N.L. 6
Erol, T. 49
Fallesen, J. 171, 721
Farmer, W.L. 62, 455, 481
Fatolitis, P. 129
Ferstl, K.L. 455
Filjak, T. 305
Fischer, L.F. 289
Fitzgerald, L.F. 237
Ford, K.A. 252
Giebenrath, J. 116
Gorney, E. 358, 599
Gramlich, A. 171
Greenston, P. 556
Gutknecht, S.P. 22
Hanson, M.A. 485
Harris, R.N. 199
Hawthorne, J. 431
Hedge, J.W. 455
Heffner, T.S. 507, 556
Heil, M. 156
Helm, W.R. 123
Hendriks, B. 344
Hession, P. 116
Heuer, R.J., Jr. 297
Hindelang, R.L. 62
Holtzman, A.K. 661, 679, 688
Horey, J.D. 721
Horgen, K.E. 398
Houston, J.S. 455
Howell, L.M. 158, 208
Huffman, A.H. 177
Iddekinge, C.H. 491, 531
Irvine, J.H. 96
Janega, J.B. 150, 330
Johnson, R.S. 62
Jones, P.L. 505
Kamer, B. 13
Kammrath, J.L. 499
Katkowski, D.A. 522
Keenan, P.A. 522
Keeney, M.J. 271
Keesling, W. 734
Keller-Glaze, H. 171
Kilcullen, R.N. 531
Klein, R.M. 158, 208
Klion, R.E. 418
Knapp, D.J. 556
Kolen, J. 70
Kramer, L.A. 297
Krol, M. 734
Krouse, S.L. 96
Kubisiak 398
Lancaster, A.R. 208
Lane, M.E. 561, 567
Lappin, B.M. 158
Lawson, A.K. 237
Lee, W.C. 310
Lescreve, F.J. 336
Lett, J. 734
Levaux, C.A. 779
Lim, B.C. 631, 750
Lipari, R.N. 158, 208
Lofaro, R.J. 1
Luster, L. 30
Makgati, C.K.M. 694
Maliko-Abraham, H. 1
Marsh-Ayers, N. 448
McCloy, R.A. 531
Michael, P.G. 62
Mitchell, D. 171
Morath, R. 156, 654
Moriarty, K.O. 522
Morrow, R. 608
Mottern, J.A. 199, 561, 567
Mylle, J. 404, 592
Nayak, A. 62
Nederhof, F.V.F. 336
Newell, C.E. 581
Nourizadeh, S. 573, 575
O'Connell, B.J. 271, 448
O'Keefe, D. 461
Olmsted, M.G. 150, 330
Ormerod, A.J. 219
Oropeza, T. 431
Osburn, H. 505
Paullin, C.J. 485
Peck, J.F. 278
Pfenninger, D.T. 418
Phillips, H.L. 129
Ployhart, R.E. 631
Putka, D.J. 491, 531, 549
Radtke, P. 661, 688
Raphela, C. 91
Reid, J.D. 123
Richardson, J. 167
Rittman, A.L. 140
Rone, R.S. 772
Rosenfeld, P. 581
Russell, T.L. 514, 540
Sabol, M.A. 140
Sager, C.E. 491, 507, 514, 549
Sapp, R. 412
Schaab, B.B. 111, 140
Schantz, L.B. 522
Schneider, R.J. 455
Schreurs, B. 381
Schultz, K. 412
Seilhymer, J. 431
Smith, G.A. 620
Smith, J. 654
Smith-Jentsch, K.A. 661, 688
Snooks, S. 30
Soh, S. 750
Stam, D. 344
Stark, S.E. 317, 323
Steinberg, A.G. 573, 575
Stetz, T.A. 271
Still, D.L. 70
Strange, J. 505
Styer, J.S. 468
Sumer, H.C. 49
Sumer, N. 49
Tan, C. 750
Temme, L.A. 70
Thain, J. 734
Thompson, B.R. 615
Tišlarić, G. 305
Tremble, T. 507
Truhon, S.A. 766
Twomey, A. 461
van de Ven, C. 344
Van Iddekinge, C.H. 531
Waldköetter, R. 587
Watson, S.E. 62, 474
Waugh, G.W. 540
Wenzel, M.U. 418
Weston, K. 438
White, L.A. 398, 485
White, M.A. 199, 561, 567
Whittam, K. 62
Willers, L. 412
Willis, D. 742
Wiskoff, M.F. 300
Wood, S. 283
Wright, C.V. 219
Youngcourt, S.S. 177
Zarola, A. 438
Zebec, K. 305
