April 2007
A thesis submitted in fulfilment of the requirements for the degree of Doctor of
Declaration
At various stages during this PhD, I was involved in collaborative efforts with both
academic and industrial colleagues. In certain cases, the outputs of these collaborations
are included in this thesis to better explain and support the research presented. In
particular, during the period 2004 to 2005, colleagues from the Air Traffic Management
(ATM) Group at the Centre for Transport Studies, Imperial College London, assisted in the
questionnaire-based survey of air traffic controllers. This mainly involved the distribution of
questionnaires and collection of the responses.
Furthermore, a key element of the research presented in this thesis is the experiment
conducted at a facility owned and operated by a Civil Aviation Authority (CAA). The
experiment was facilitated by the assistance of various Air Traffic Control (ATC) Centre
staff including ATM specialists, ATC controllers, pseudo-pilots, engineers, and technicians.
Finally, EUROCONTROL staff provided a valuable contribution at various stages of this
research in terms of access to relevant publications, professional networks, and simulation
trials.
I hereby declare that besides the collaborations referred to above, I have personally
carried out the work described in this thesis:
Branka Subotic
Dr. Washington Yotto Ochieng
Abstract
An Air Traffic Control (ATC) system represents a set of components that act together to
achieve a safe and efficient flow of traffic in any given airspace. The elements of this
system are human operators, equipment, and procedures, along with all the interactions
between them. Failure of equipment, as one component of an ATC system, and its
interaction with human operators (i.e. air traffic controllers) is the main focus of the
research presented in this thesis. Thus, the thesis focuses on the human recovery process
triggered by failure of equipment that supports air traffic controllers in the provision of air
traffic services in a dedicated airspace. A detailed understanding of the controller recovery
process has the potential to significantly contribute to safety and operational efficiency in
the current and future ATC environment. Currently, there is a very limited understanding of
the factors that influence the recovery process, particularly with respect to equipment
failures in ATC. This thesis builds on existing relevant research in other industries and
uses targeted experiments and mathematical modelling to develop a functional
relationship between recovery and its influencing factors.
The research presented in this thesis addresses two areas, namely equipment failures
in ATC and controller recovery. The first investigates the characteristics of the ATC
equipment failures from past research and derives the associated target level of safety.
Linking the target level of safety with available operational failure reports establishes a
means to validate the realism and operational significance of the equipment failure
characteristics. A subset of these characteristics relevant to ATC operations is further
used to develop a novel qualitative equipment failure impact assessment tool. This tool
enables the identification of equipment failures that are most severe for ATC operations
and thus may be most challenging to controller performance.
Having identified the relevant equipment failure types and their characteristics, the thesis
carries out a critical review of the associated issues regarding the process of controller
recovery. A critical element of this is the review of past human reliability research and its
relationship to controller recovery from equipment failures in ATC. The findings from this
are augmented by questionnaire survey results based on responses of 134 air traffic
controllers from 34 countries. Both the past research and the questionnaire survey results
are used to highlight the importance of the context in which controller recovery
performance takes place and to define the recovery context through a set of 20 candidate
contextual factors or Recovery Influencing Factors (RIFs).
The thesis then uses the candidate RIFs to develop a novel approach for the quantitative
assessment of the recovery context through the concept of recovery context indicator. This
approach and its operational benefits are further validated by an experiment conducted in
a training facility of an ATC Centre with the participation of 30 operational air traffic
controllers. In addition to the verification of the generic methodology for the assessment of
the recovery context, the experimental data are used to analyse controller recovery
performance and investigate the outcome of the recovery process. The findings obtained
from the experimental investigation are in line with those obtained from past research and
the ATC operational environment.
Acknowledgements
technical assistance and unlimited support were crucial to embarking upon the field of
human reliability, which was completely unknown to me at the beginning of this research. Their
assistance and interest in my research opened many doors and assured the highest
quality of information and professional contacts.
At Imperial College, many colleagues and research students offered their
help at various stages and aspects of my work. Among them are Jackie Sime, William
Knottenbelt, Dimitri Panagiotakopoulos, Marie-Dominique Dupuy, Umar Bhatti, Victoria
Williams, and Wolfgang Shuster. However, my greatest gratitude goes to Arnab Majumdar
and to my supervisor, Washington Y. Ochieng. They had a critical role in the support,
supervision, and achievement of excellence in my research. Thanks to their
understanding, I attended various technical meetings, seminars, conferences, courses,
and simulation trials. These proved to be a significant direct and indirect contribution to the
quality of the research presented in this thesis.
One of the critical parts of the research presented in this thesis would not have been feasible
without the technical support of the Irish Aviation Authority staff, especially Nick Lowth,
Bernard Mackessy, and Garrett MacNamara. However, my special gratitude goes to Alan
Byrne for making the impossible truly possible and allowing me to successfully complete a
key part of this research.
There are many other people who have helped in various ways. I would like to thank Yvette
Dalle-Mule, Veronique Begault, and Sonja Straussberger from EUROCONTROL EEC.
Furthermore, I would like to thank Rajkumar Pant from the Indian Institute of Technology,
Isa Alkalaj and Marek Bekier from Skyguide, Martin Richards and Vic Burgess from UK
NATS, Christopher Adams from Maastricht UAC, Bob Phillips from CASA Australia, Peter
Nalder from New Zealand Civil Aviation Authority (CAA), Jos Kuijper and Randal de Garis
from EUROCONTROL, Sarah Doherty and Joji Waites from the UK CAA, and Keshava
Sharma from the Airports Authority of India.
I want to thank my friend Tamara Pejovic for all the support that she gave me during the
years I have been working on this thesis. Last but not least, I want to express my deepest
gratitude to my brother and my mother who were always the core support in all the
journeys that I have embarked upon. Hence, I am dedicating this thesis to them.
Table of Contents
DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Background to the problem
1.2 Research objectives
1.3 Outline of the thesis
6 QUESTIONNAIRE SURVEY
6.1 Objectives of the questionnaire survey
6.2 Sampling
6.3 Survey methodology
6.4 Design of the questionnaire
6.5 Pilot survey
6.6 Full survey
6.6.1 Face-to-face interviews
6.6.2 Self-completion survey
6.6.3 Potential sources of errors
6.7 Methodology for the questionnaire survey data analysis
6.7.1 Data pre-processing for analysis
6.7.2 Characteristics of the sample
6.7.2.1 Sampling per ATC Centre
6.7.2.2 Sampling of air traffic controllers
6.7.3 High-level analyses
12 LIST OF REFERENCES
APPENDICES
Appendix I
Appendix II
Appendix III
Appendix IV
Appendix V
Appendix VI
Appendix VII
Appendix VIII
Appendix IX
Appendix X
Appendix XI
The cost of delays induced by equipment failures
Interviews with ATM staff
Checklist for the Equipment Failure Scenarios in a specific European ATC Centre - An Aide-Memoire framework
The questionnaire design
Example of one questionnaire response
Results extracted from question 5 of the questionnaire survey
Overview of contextual factors
Probabilities for 20 Recovery Influencing Factors (RIFs)
Questions for the ATM Specialist
Overview of RIFs, their corresponding levels, and designated probabilities
Validation of the RIFs interaction matrix
Appendix XII
Appendix XIII
Appendix XIV
Appendix XV
List of Figures
Figure 1-1
Figure 2-1
Figure 2-2
Figure 2-3
Figure 2-4
Figure 2-5
Figure 2-6
Figure 2-7
Figure 2-8
Figure 2-9
Figure 3-1
Figure 3-2
Figure 3-3
Figure 3-4
Figure 3-5
Figure 4-1
Figure 4-2
Figure 4-3
Figure 4-4
Figure 4-5
Figure 4-6
Figure 4-7
Figure 4-8
Figure 4-9
Figure 4-10
Figure 4-11
Figure 4-12
Figure 4-13
Figure 5-1
Figure 5-2
Figure 5-3
Figure 5-4
Figure 6-1
Figure 6-2
Figure 6-3
Figure 6-4
Figure 6-5
Figure 6-6
Figure 6-7
Figure 6-8
Figure 6-9
Figure 6-10
Figure 6-11
Figure 7-1
Figure 8-1
Figure 8-2
Figure 8-3
Figure 8-4
Figure 8-5
Figure 8-6
Figure 8-7
Figure 8-8
Figure 9-1
Figure 9-2
Figure 9-3
Figure 9-4
Figure 10-1
Figure 10-2
Figure 10-3
Figure 10-4
Figure 10-5
Figure 10-6
Figure 10-7
Figure 10-8
Figure 10-9
Figure 10-10
Figure 10-11
Figure 10-12
List of Tables
Table 3-1
Table 3-2
Table 3-3
Table 4-1
Table 4-2
Table 4-3
Table 4-4
Table 4-5
Table 4-6
Table 4-7
Table 4-8
Table 4-9
Table 4-10
Table 4-11
Table 4-12
Table 4-13
Table 4-14
Table 4-15
Table 4-16
Table 4-17
Table 5-1
Table 5-2
Table 6-1
Table 6-2
Table 6-3
Table 6-4
Table 6-5
Table 7-1
Table 7-2
Table 7-3
Table 7-4
Table 7-5
Table 8-1
Table 8-2
Table 8-3
Table 8-4
Table 8-5
Table 8-6
Table 8-7
Table 8-8
Table 8-9
Table 8-10
Table 8-11
Table 8-12
Table 8-13
Table 9-1
Table 9-2
Table 9-3
Table 9-4
Table 9-5
Table 9-6
Table 9-7
Table 9-8
Table 9-9
Table 9-10
Table 10-1
Table 10-2
Table 10-3
Table 10-4
Table 10-5
Table 10-6
Table 10-7
Table 10-8
Table 10-9
Table 10-10
Table 10-11
Table 10-12
relevant findings
Recovery Influencing Factors
Relevant recovery influencing factors and their corresponding qualitative descriptors
Overview of CREAM and CAHR differences
Distribution of probabilistic RIF ratings per source
ATM specialists involved in the assessment of RIFs
Overview of the sources of information used to determine RIF probabilities
Example of a potential recovery context represented as a 20-digit array
Interaction matrix: (1) validation by CREAM, (2) validation by CAHR, (3) validation by ATM specialists; and (x) not validated interactions
Mapping between RIFs and CAHR contextual factors
Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
Local minima of polynomial functions
Cut-off points between the levels for all RIFs
Probabilities for RIF5 and each of its levels (see Appendix VII)
Sensitivity analysis
Training, pilot study, and experiment sessions
Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Equipment failures used in the pilot study
The mapping between exercise characteristics and the controllers' observations
Equipment failure in the experimental study
Availability of functions in the reduced flight data processing mode
Overview of independent and dependent variables
Overview of independent and extraneous variables
Overview and description of required recovery steps
Recovery process and its three main tasks
Characteristics of a sample of controllers participating in the experiment
Verification of RIFs probabilities from a generic approach (Chapter 8) and the experiment
Summary of RIFs defined through a single corresponding level
Verification of the distribution of the recovery context indicator obtained from a generic approach (Chapter 8) and the experiment
A review of RIFs with the potential for recovery enhancement
A review of the proposed recovery solutions
Percentage of performed recovery steps in three experimental sessions
Comparison of recovery durations between three experimental sessions
Statistical tests and results
The outcome of the recovery process matrix (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Statistical tests and results
Summary of additional findings
List of Abbreviations
ACAS
ACC
ADREP
ADS
ADS-B
ADS-C
AFTN
A/G
AGDP
AGL
AIAA
AIS
AMAN
ANSP
APP
APR
APW
ARO
ARTCC
ASAS
ASM
ASMT
ASMT
ASTERIX
ATC
ATCT
ATFM
ATHEANA
ATIS
ATM
ATS
AWOP
BBN
BEST
BEVOR
CAA
CAHR
CATIS
CC
CLAM
CEATS
CFMU
CMS
CNS
COCOM
CORE-DATA
CPC
CPDLC
CPM
CRDS
CREAM
CS
CWP
DARC
DMAN
DME
EASA
ECAC
ECSS
EGNOS
EOC
EOO
EPC
ESA
ESSAR
ET
EU
EUROCONTROL
FAA
FANS
FDPD
FDPS
FIR
FIS
FL
FMEA
FMECA
FMS
FPP
FPS
FT
G2G
G/G
GLONASS
GNSS
GPS
HEART
HEIDI
HEP
HFACS
HEP
HERA
HF
HF DL
HMI
HPDB
HRA
HRMS
IANS
IC
Ic
ICAO
IEC
IEEE
IFR
ILS
IMC
IMC
INS
IP
IRS
ISO
JAA
JAR
JHEDI
M
MAESTRO
MANTAS
MATS
MDT
MET
METAR
Mil
MLS
MMI
MMS
MONA
MORS
MRP
MSAW
MSL
MTBF
MTBM
MTCD
MTTR
MUAC
NATSPG
MTOW
NARA
NAIPS
NAS
NASA
NATS
NUCLARR
NDB
NLR
NOTAM
NTL
NTSB
OJT
OLDI
OS
PABX
PAR
PARM
PPS
PRA
PRNAV
PRS
Proc
PRS
PSA
PSF
PSR
PTT
QRA
RAFT
RAM
RCP
RDP
RDPS
RDR
RGCSP
RIF
RIMCAS
RNP
RSP
RT
RTCA
RVSM
RVR
RWY
SAR
SAR
SAS
SATCOM
SHAPE
SBAS
SBJ
SD - Standard Deviation
SE - Standard Error
SEP - Safety and Emergency Procedures
SES - Single European Sky
SID - Standard Instrument Departure
SME - Subject Matter Expert
SMC - Surface Movement Control
SMR - Surface Movement Radar
SNET - Safety Nets
SoL - Safety-of-Life
SOR - Stimulus-Organism-Response
SPS - Standard Positioning Service
SRG - Safety Regulatory Group
SRK - Skill Rule Knowledge
SRP - Single Radar Processing
SRU - Safety Regulatory Unit
SSR - Secondary Surveillance Radar
STAR - Standard Terminal Arrival Route
STCA - Short Term Conflict Alert
SUA - Special Use Airspace
SYSCO - System Supported COordination
TACAN - TACtical Air Navigation
THERP - Technique for Human Error Rate Prediction
TAR - Terminal Approach Radar
TCAS - Traffic Alert and Collision Avoidance System
TID - Touch Input Device
TRACON - Terminal Radar Approach CONtrol
TIP - Touch Input Panels
TLS - Target Level of Safety
TRACEr - Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC
TRUCE - TRaining for Unusual Circumstances and Emergencies
TRM - Team Resource Management
TTA - Time To Alert
TWR - Aerodrome Control Tower
TWY - Taxiway
UAV - Unmanned Aerial Vehicles
UHF - Ultra High Frequency
UPS - Uninterruptible Power Supply
US - United States
UTC - Coordinated Universal Time
VDL - Very high frequency Data Link
VFR - Visual Flight Rules
VHF - Very High Frequency
VMC - Visual Meteorological Conditions
VOR - VHF Omnidirectional Range navigation system
VORTAC - VHF Omnidirectional Range/TACtical Air Navigation
VSCS - Voice Switching Communication System
WAAS - World Aircraft Accident Summary
Chapter 1
Introduction
The aim of this Chapter is to present the background to the problem of controller
recovery from equipment failures in Air Traffic Control (ATC) and to set the scene for
the research presented in this thesis. This Chapter sets out the rationale for the
need to better understand the impact that equipment failures have on controller
performance in the current as well as in the future ATC environment. Based on this
background, the principal research objectives are defined to ensure an in-depth
analysis of ATC equipment failures and controller recovery. This is followed by the
specification of the structure of the thesis and a summary of each Chapter.
equipment failures that can occur (NATS, 2002). In most cases, this protection is
triggered automatically and seamlessly. Hence, an equipment failure should not result
in a problem that impacts on the controller's ability to carry out tasks safely, as it
should be resolved automatically with no interruption of the service (EUROCONTROL,
2004e). However, there are occasions when these technical defences are not sufficient
to maintain the normal ATC system state and protect against negative outcomes. On
such occasions, the intervention of the human, as a component of the ATC system, is
necessary. In other words, the intervention of the air traffic controller becomes crucial
for the provision of a safe, though not necessarily efficient, air traffic service. Note that
safety, rather than efficiency, is the key driver here.
In the past, major failures or total outages (i.e. failure of the entire system) were the
subject of detailed investigations. These investigations were aimed at resolving and
preventing similar failure occurrences by focusing mostly on the technology (National
Transportation Safety Board, 1996; General Accounting Office, 1982; General
Accounting Office, 1991; General Accounting Office, 1996; and General Accounting
Office, 1998). For a long time, the basic focus of reliability, system safety, and quality
management was purely on the prevention of equipment failures or the reduction of
their recurrence. Various techniques have been developed to assess equipment
failures, their causes, consequences, and appropriate defences. For example, the US
Federal Aviation Administration (FAA) requires that the availability of the Voice
Switching Communication System (VSCS) at the level of the ATC Centre (facility level¹) be no less than 0.9999999, including the backup VSCS (FAA, 1997). In
spite of significant efforts, equipment failures still occur, and every ATC system
eventually fails to perform its intended function or part thereof. On these unexpected
occasions, the recovery of the ATC system is left to the human operator, who must implement
an appropriate recovery strategy in both a timely and effective manner. While past
research focused on the technical aspects of the occurrence of equipment failures,
very little has been done on human factors, with particular reference to controller
recovery from such failures. Some examples, such as research by Wickens et al.
(1998), Low and Donohoe (2001), and EUROCONTROL (2004e), are discussed in the
following paragraphs.
The facility-level availability is based on a 50-position system. According to the FAA, system
failure occurs when one or more critical functions are unavailable in more than 10 percent of the
positions.
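The availability requirement and the failure criterion in the note above can be made concrete with a little arithmetic. The Python sketch below rests on several assumptions:

* availability is read in the steady-state sense, and a 365.25-day year is used;
* "more than 10 percent of the positions" is taken as a strict inequality, so in a 50-position system a critical function must be lost at 6 or more positions before the event counts as a system failure;
* the function names are hypothetical and do not come from any FAA specification.

```python
# Illustrative arithmetic only; function names are hypothetical, not FAA terminology.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: a 365.25-day year


def expected_downtime_seconds(availability: float,
                              period_s: float = SECONDS_PER_YEAR) -> float:
    """Expected unavailable time over period_s, assuming steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * period_s


def is_system_failure(positions_affected: int,
                      total_positions: int = 50,
                      threshold_percent: int = 10) -> bool:
    """True if critical functions are lost at MORE than threshold_percent of positions.

    Integer arithmetic avoids floating-point error at the boundary (5 of 50 = 10%).
    """
    if not 0 <= positions_affected <= total_positions:
        raise ValueError("positions_affected out of range")
    return positions_affected * 100 > threshold_percent * total_positions


if __name__ == "__main__":
    downtime = expected_downtime_seconds(0.9999999)
    print(f"Expected downtime: {downtime:.2f} s per year")  # about 3.16 s
    for n in (5, 6):
        print(n, "positions affected -> system failure:", is_system_failure(n))
```

Under these assumptions, an availability of 0.9999999 corresponds to roughly three seconds of expected unavailability per year. This shows how rarely the facility-level system as a whole is expected to be unavailable.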
detail in the remainder of the thesis. Based on the background to the problem
presented above, four research objectives have been formulated:
Provide a systematic literature review connecting the disparate but related topics of
ATC equipment failures and controller recovery, which has previously been lacking in
ATC;
Identify potential equipment failure types and their characteristics;
Identify contextual factors that affect controller recovery performance and derive
a methodology to quantitatively assess recovery context; and
Propose a framework for the analysis of controller recovery. This framework
should be further verified with a specific reference to a particular equipment
failure type.
Chapter 2
The main objective of the research presented in this thesis is to investigate the
recovery process adopted by air traffic controllers in the event of Air Traffic Control
(ATC) equipment failures. A further aim is a framework for analysing controller recovery
that is transferable in time (i.e. applicable to both current and future ATC Centres). This
Chapter contributes to this objective in several ways. Firstly, it
defines the environment for the investigation of equipment failures, i.e. Air Traffic
Management (ATM) and its component ATC. Secondly, it discusses the ATC system
architecture, including its specific functional elements. The Chapter proposes a novel
classification of equipment failures based on these functional elements that enables the
capture of all operational components of ATC. This classification is further built upon in
the remainder of the thesis (Chapter 4) to create a qualitative equipment failure impact
assessment tool. Thirdly, the Chapter reviews the characteristics of a generic ATC
Centre with regard to current and future technologies. The potential characteristics of
future ATC Centres are discussed with an emphasis on challenges that face human
operators (i.e. air traffic controllers) due to increasing levels of automation. The
Chapter concludes with discussions on the potential sources of technical and controller
performance deficiencies within future ATC Centres and their relevance to the recovery
process.
minimum constraints, without compromising agreed levels of safety (EUROCONTROL, 2006a).
An ATM system comprises two functionally integrated elements, namely airborne ATM
and ground-based ATM. The airborne ATM consists of several systems integrated into
the aircraft cockpit, such as the airborne Communication/Navigation/Surveillance
(CNS) system, the Flight Management System (FMS), and the Airborne Collision
Avoidance System (ACAS), also known as the Traffic Alert and Collision Avoidance
System (TCAS). The components of ground-based ATM (Figure 2-1) are Airspace
Management (ASM), Air Traffic Service (ATS), and Air Traffic Flow Management
(ATFM) (ICAO, 2001a).
Airspace Management (ASM) concerns the structure and organisation of national airspace at
strategic (i.e. national ASM policy, planning, and coordination), pre-tactical (i.e. daily
management and temporary allocation of airspace), and tactical levels (i.e. real-time
activation, deactivation, and reallocation of airspace, and civil/military coordination). Air
Traffic Service (ATS) is a generic term covering the Air Traffic Services Reporting Office
(ARO), the Air Traffic Control service (ATC), and the Flight Information and alerting Service
(FIS) (ICAO, 2001a). The ARO is a unit established to receive reports concerning air traffic
services and flight plans submitted before departure. The ATC component of ATS provides
control of all air traffic in a dedicated airspace; given its importance to the research
presented in this thesis, it is discussed in detail in section 2.2. The Flight Information and
alerting Service (FIS) gives advice and information useful for the safe and efficient conduct
of flights. The alerting service provides search and rescue assistance to aircraft in
distress and coordinates any action that may be required. Finally, Air Traffic Flow
Management (ATFM) is a service established to ensure that ATC capacity is utilised to the
maximum extent possible and that traffic volumes are compatible with the capacities declared
by the appropriate authority. Optimal flow of traffic is achieved by continuously balancing
traffic demand against the ability of ATC to accommodate that demand.
ICAO is the specialised agency of the United Nations concerned with the development of air
navigation and regulation of international air transport.
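The decomposition of ATM described above is essentially a tree. The sketch below encodes it as a nested Python dictionary and queries it, on these assumptions:

* a nested dictionary is an adequate stand-in for the hierarchy;
* each component is abbreviated to the acronym used in the text;
* the helper `path_to` is hypothetical and serves only to illustrate how a component can be located within the hierarchy.

```python
from typing import Optional, Tuple, Union

# Nested dict = branch, list = leaves. Contents follow the text above (ICAO, 2001a).
Tree = Union[dict, list]

ATM_COMPONENTS: dict = {
    "Airborne ATM": ["Airborne CNS", "FMS", "ACAS/TCAS"],
    "Ground-based ATM": {
        "ASM": ["Strategic", "Pre-tactical", "Tactical"],  # ASM levels, not sub-services
        "ATS": ["ARO", "ATC", "FIS"],
        "ATFM": [],
    },
}


def path_to(node: Tree, target: str,
            prefix: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Depth-first search returning the chain of components leading to target."""
    if isinstance(node, dict):
        for name, child in node.items():
            if name == target:
                return prefix + (name,)
            found = path_to(child, target, prefix + (name,))
            if found is not None:
                return found
    else:  # a list of leaf components
        if target in node:
            return prefix + (target,)
    return None


if __name__ == "__main__":
    print(" > ".join(path_to(ATM_COMPONENTS, "ATC")))
    # Ground-based ATM > ATS > ATC
```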
Airspace is organised into adjacent portions, the so-called sectors, controlled by two or three
controllers, namely executive or tactical controller, planning controller, and assistant or flight
data controller.
potential conflict, managing flight progress strips, and planning the flow of traffic within
the sector. In addition, the planning controller has to assure that traffic enters and
leaves the sector at flight levels and exit points as agreed with the adjacent sectors
(EUROCONTROL, 1999). The assistant or flight data controller ensures that the strip
printer functions properly. In addition, the assistant accepts and processes all received
messages in a timely manner, passes them to the appropriate position, and manually
inputs any tracks for which flight progress strips have not been produced.
The controllers operating in the sectors within an ACC work in close cooperation and
negotiate with each other on behalf of aircraft to optimise efficiency and ensure safety.
The area controller's responsibility terminates when an aircraft is handed over to an
adjacent ACC or to an approach control office.
may land or take off at a time (Nolan, 1998; EUROCONTROL, 1999). At airports that
use multi-runway operations, the aerodrome controller may be responsible for all
runway operations. Otherwise, the responsibility for multi-runway operations may be
divided between a number of controllers. For example, a parallel runway configuration,
where one runway is dedicated to departures and the other to arrivals, requires
separate departure and arrival controllers. In this case, close cooperation between the
two controllers is essential to ensure safe operation.
The aerodrome controller is responsible for all traffic operating in the designated area
of responsibility of the control tower. This includes aerodrome circuit traffic, aircraft
landing and taking off, and aircraft and vehicles operating on the manoeuvring areas
(ICAO, 2001a). When good visibility conditions prevail (i.e. visual meteorological
conditions or VMC), the controller may separate the traffic by visual means and a
reduction in standard separation is permissible. When poor visibility conditions prevail
(i.e. instrument meteorological conditions or IMC), the aerodrome controller works in
close cooperation with the approach controller. In such conditions, prescribed
separation standards must be applied between aircraft in the air.
Surface movement control (known as ground control in the US) is a supplementary service
to the aerodrome control service. At less busy airports, the aerodrome and surface
movement control functions can be combined and provided by the aerodrome
controller. Otherwise, the surface controller is responsible for issuing taxi clearances
that take aircraft to the departure end of the runway (Nolan, 1998;
EUROCONTROL, 1999). In addition, the surface controller is responsible for the
movements of all aircraft and vehicular traffic on the manoeuvring areas of the airport.
ICAO (2001a) defines the manoeuvring areas as any part of the airport used for the
takeoff, landing, and taxiing of aircraft, excluding aprons. Surface movement control is
usually undertaken by visual means. However, in conditions of poor visibility the
controller relies upon surface movement radar (SMR). Working in close cooperation
with the aerodrome controller, the surface controller ensures that all active runways are
free from vehicular activity during aircraft movements.
come together, a more detailed explanation of the ATC architecture and its basic
functionalities is given below. In line with the objectives of the research presented in
this thesis, this section provides a deeper understanding of ATC functionalities and of the
types of ATC equipment whose failure can affect controller recovery.
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a): ground-based ATM (ASM, ATS, FIS) and airborne ATM (e.g. airborne CNS, FMC, ACAS/TCAS); the ATC system comprises people (controllers, engineers, management), equipment (HMI, hardware, software), and procedures and training (operational and engineering procedures).
The functional architecture of any system presents a high-level decomposition of the
overall system into a logical set of functional blocks. Each block may be further
decomposed into a series of sub-functions. The ATC functionalities and their related
sub-functions, as presented in this thesis, include all those of the current ATM/ATC
system as well as those under development for future inclusion (with 2020 taken
as the target year in this thesis, in line with the European Commission's Vision 2020;
European Commission, 2001).
The starting point for the development of the ATC functional classification in this thesis
is the EUROCONTROL Harmonisation of European Incident Definition Initiative for
ATM (HEIDI) taxonomy. The HEIDI taxonomy identifies six ATC functionalities and the
related ATC equipment that supports each of them. The functionalities listed in HEIDI
are: communication, surveillance, navigation, data processing and distribution, support
information, and power supply (EUROCONTROL, 2001e). This taxonomy is expanded in this
thesis to accommodate both the classification needs and the characteristics of the
information derived from the operational failure reports processed. The analysis of these
reports highlighted the need for nine ATC functional blocks. The next set of layers dissects
each ATC functional block into relevant sub-functions, which are then dissected further to
the elemental level. This approach enables the capture of all operational components of
ATC. The resulting nine ATC functional blocks, as defined in this thesis, are:
Communication;
Navigation;
Surveillance;
Data processing and distribution;
Supporting;
Safety nets;
Power supply;
Pointing and data input; and
System monitoring and control.
Additionally, this classification is further built upon in Chapter 4. The following
paragraphs give a detailed description of each functionality and the corresponding
physical components (i.e. hardware components that support each function).
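The relationship between the six HEIDI functionalities and the nine functional blocks can be expressed as a small lookup structure. The Python sketch below is illustrative only, on these assumptions:

* the enum names are hypothetical and are not part of HEIDI or of the classification's formal definition;
* each HEIDI category maps one-to-one onto a block, and in particular HEIDI's "support information" is treated as the "Supporting" block.

The sketch derives which of the nine blocks have no HEIDI counterpart.

```python
from enum import Enum


class AtcFunction(Enum):
    """The nine ATC functional blocks listed above (enum names are illustrative)."""
    COMMUNICATION = "Communication"
    NAVIGATION = "Navigation"
    SURVEILLANCE = "Surveillance"
    DATA_PROCESSING = "Data processing and distribution"
    SUPPORTING = "Supporting"
    SAFETY_NETS = "Safety nets"
    POWER_SUPPLY = "Power supply"
    POINTING_AND_DATA_INPUT = "Pointing and data input"
    SYSTEM_MONITORING = "System monitoring and control"


# Assumed one-to-one correspondence for the six HEIDI functionalities (EUROCONTROL, 2001e).
HEIDI_TO_BLOCK = {
    "communication": AtcFunction.COMMUNICATION,
    "surveillance": AtcFunction.SURVEILLANCE,
    "navigation": AtcFunction.NAVIGATION,
    "data processing and distribution": AtcFunction.DATA_PROCESSING,
    "support information": AtcFunction.SUPPORTING,
    "power supply": AtcFunction.POWER_SUPPLY,
}


def blocks_added_beyond_heidi() -> list:
    """Blocks with no HEIDI counterpart, in the order they are listed above."""
    covered = set(HEIDI_TO_BLOCK.values())
    return [block for block in AtcFunction if block not in covered]


if __name__ == "__main__":
    print([block.value for block in blocks_added_beyond_heidi()])
```

Under this mapping, the three blocks introduced beyond HEIDI are safety nets, pointing and data input, and system monitoring and control.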
exchanged includes flight level information, airspace boundary estimates of flights, and
other conditions that may be agreed between ATC Centres. This category incorporates
both systems for data exchange and any supporting equipment (e.g. AFTN printer,
console).
Thirdly, the Aeronautical Information System (AIS) provides information of a permanent
or semi-permanent nature on subjects such as the geographical description of airspace, in-flight procedures, sector procedures, communications data, surveillance data, and
specific airport characteristics, either verbally or via datalink. In addition, local ATC
units provide a dynamic broadcast of relevant information to arriving and departing
pilots in the vicinity of the airport, known as the Automatic Terminal Information Service
(ATIS). This service uses local weather data (from the meteorological office) and AIS
data (e.g. runway and taxiway conditions, navigational aids status).
Fourthly, backup radio and telephone systems must be provided. These backup
systems may provide identical functionality if they duplicate the VSCS. However,
in some cases, redundancy is provided by similar but not identical systems, which
cannot offer identical functionality. In these cases, it is essential that controllers are
aware of these differences. Backup communication systems must be capable of
providing continuity of communication during outages (i.e. complete loss of
communications at ATC Centre level), as voice communication continues to
be the primary means of communicating ATC instructions to aircraft.
Finally, several other physical components are listed which have a role in providing the
overall communications function. These include but are not limited to pagers, headsets,
handsets, microphones, processors, press-to-talk buttons (PTT), buzzers, cables, and
footswitches.
The previous discussion has focused on current systems that support the
communication function. Current communication methods are mostly based on
analogue voice communication, which poses various limitations for users (e.g. limited
coverage, accessibility, capability, integrity, and security). Moreover, the combination of
these limitations with current Radio Telephony (RT) procedures is linked to excessive
levels of controller workload (see Figure 21 in EUROCONTROL, 2004g). As a result,
future development of air navigation for civil aviation aims toward enhanced
communication links between aircraft and controllers. This was an important element of
ICAO's Future Air Navigation Systems (FANS) concept (ICAO, 2007). With respect to
Measuring Equipment (VOR/DME), DME/DME, Non-Directional Beacon (NDB), self-contained Inertial Navigation Systems (INS), and Global Positioning System (GPS).
Currently, area navigation is primarily supported by ground-based systems. The most
widespread is the VOR, which provides a radial or bearing along which aircraft fly from one
VOR station to another (EUROCONTROL, 2003g). This aid is usually combined with
DME, which provides the distance of the aircraft from the VOR/DME beacon.
Therefore, any aircraft using this facility can determine its position in terms of
bearing and distance relative to the location of the VOR station. The VOR/DME
combination represents the primary ground-based aid for area navigation. Generally,
the maximum range of VOR stations is in the region of 250 Nm, owing to the line-of-sight
nature of VHF signals and the curvature of the Earth (EUROCONTROL, 2003g). Each
air navigation service provider publishes the effective range of its VOR stations.
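The Python sketch below illustrates the bearing-and-distance fix and the line-of-sight limit. It makes several assumptions:

- It uses a local flat-earth frame centred on the VOR/DME.
- The radial is measured clockwise from north, and magnetic variation is ignored.
- The DME reading is treated as horizontal distance, although DME actually measures slant range.

The 1.23 factor is the common rule-of-thumb approximation for the VHF radio horizon, with distance in Nm and antenna heights in feet. It is not a figure from the cited source. For an aircraft at 40,000 ft it gives about 246 Nm, which is consistent with the order of magnitude quoted above. The function names are hypothetical.

```python
import math

# Rule-of-thumb radio horizon factor (assumption): distance in Nm, heights in ft.
RADIO_HORIZON_FACTOR = 1.23


def vor_dme_fix(radial_deg: float, distance_nm: float) -> tuple[float, float]:
    """Return the aircraft's (east, north) offset in Nm from the VOR/DME.

    Assumes a flat local frame, a radial measured clockwise from north,
    and that the DME distance can be treated as horizontal distance.
    """
    theta = math.radians(radial_deg)
    return distance_nm * math.sin(theta), distance_nm * math.cos(theta)


def line_of_sight_range_nm(aircraft_ft: float, station_ft: float = 0.0) -> float:
    """Approximate maximum VHF reception range from both antenna heights."""
    return RADIO_HORIZON_FACTOR * (math.sqrt(aircraft_ft) + math.sqrt(station_ft))


east, north = vor_dme_fix(90.0, 20.0)
print(round(east, 6), round(north, 6))            # 20.0 0.0 (due east of the station)
print(round(line_of_sight_range_nm(40_000), 1))   # 246.0
```

The square-root dependence explains why low-flying aircraft lose VOR reception much closer to the station than aircraft at cruise level.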
Another system that uses a radio beacon is the NDB. It consists of two components: the
Automatic Direction Finder (ADF), which is the airborne component, and the
NDB's transmitting unit, which is the ground component. The NDB broadcasts
continuously on a specific frequency. An ADF on the aircraft detects the bearing to
or from the NDB, which allows the aircraft's position relative to the beacon to be determined. An
NDB bearing is a line passing through the station that points in a specific direction (e.g.
270 degrees, i.e. due west). This system may also be coupled with a DME. Although widely
used in the approach environment, it is less accurate and less reliable than VOR/DME,
since it is susceptible to interference from thunderstorms and other atmospheric
phenomena. The power output determines the maximum range of an NDB, but
NDBs are generally usable within 50-100 Nm (EUROCONTROL, 2003g).
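The sketch below shows the ADF bearing arithmetic, and how two NDB bearings cross at the aircraft's position. The ADF indicates a relative bearing, measured from the aircraft's nose, so the aircraft heading must be added to obtain the bearing to the station.

The sketch assumes a flat local frame in nautical miles and ignores magnetic variation and ADF errors. The station positions and bearings in the example are invented, and the function names are hypothetical.

```python
import math


def bearing_to_station(heading_deg, relative_bearing_deg):
    """ADF relative bearing plus aircraft heading gives the bearing TO the NDB."""
    return (heading_deg + relative_bearing_deg) % 360.0


def bearing_from_station(bearing_to_deg):
    """The reciprocal: the bearing FROM the NDB on which the aircraft lies."""
    return (bearing_to_deg + 180.0) % 360.0


def cross_fix(station_a, bearing_from_a, station_b, bearing_from_b):
    """Intersect two NDB lines of position in a flat (east, north) frame.

    Stations are (east, north) tuples in Nm; bearings are degrees clockwise
    from north. Returns None if the lines are (nearly) parallel.
    """
    ax, ay = station_a
    bx, by = station_b
    # Unit direction vectors along each bearing (east = sin, north = cos).
    dax, day = math.sin(math.radians(bearing_from_a)), math.cos(math.radians(bearing_from_a))
    dbx, dby = math.sin(math.radians(bearing_from_b)), math.cos(math.radians(bearing_from_b))
    # Solve A + t*da = B + s*db for t using 2-D cross products.
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-9:
        return None
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return ax + t * dax, ay + t * day


# Heading 350 degrees, ADF needle 30 degrees right of the nose.
to_ndb = bearing_to_station(350.0, 30.0)      # 20.0
from_ndb = bearing_from_station(to_ndb)       # 200.0
fix = cross_fix((0.0, 0.0), 45.0, (20.0, 0.0), 315.0)  # approximately (10.0, 10.0)
```

A single bearing only places the aircraft somewhere on a line. A second bearing, or a DME distance as mentioned above, is what turns it into a position.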
An INS is a completely self-contained navigation system located on board the aircraft
and independent of ground-based navigation aids. The basic INS consists of three
mutually orthogonal gyroscopes, three mutually orthogonal accelerometers, a
navigation computer, and a clock (EUROCONTROL, 2003g). Gyroscopes are
instruments that provide the orientation of an object (e.g. the aircraft's angles of roll, pitch,
and yaw). Accelerometers sense acceleration along a given axis.
The orthogonal accelerometer configuration provides three orthogonal
position vector of aircraft. These steps are continuously iterated throughout the
duration of the flight. Based on all of the data, the INS determines the aircraft's
position relative to a known point of departure (i.e. the latitude and longitude coordinates of
the departure gate).
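The integration at the heart of an INS can be shown in a few lines. The sketch makes these assumptions:

- The accelerations have already been resolved into a navigation frame with gravity removed. In a real INS, the gyroscope outputs are needed for that step.
- Acceleration is constant over each time step.
- Positions are in a flat local frame, in metres.

This is a simplified dead-reckoning illustration, not an INS mechanisation. The function name is hypothetical.

```python
def integrate_ins(accels, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Dead-reckon velocity and position from navigation-frame accelerations.

    accels: sequence of (ax, ay, az) in m/s^2, gravity already removed.
    dt: time step in seconds; v0/p0: initial velocity (m/s) and position (m).
    Returns (velocity, position) as lists.
    """
    v = list(v0)
    p = list(p0)
    for a in accels:
        for i in range(3):
            # Constant-acceleration kinematics over one step.
            p[i] += v[i] * dt + 0.5 * a[i] * dt * dt
            v[i] += a[i] * dt
    return v, p


# Hypothetical case: an uncorrected 0.001 m/s^2 accelerometer bias on the east axis for 100 s.
velocity, position = integrate_ins([(0.001, 0.0, 0.0)] * 100, dt=1.0)
```

Position comes from integrating acceleration twice. In this simplified model, a constant bias b therefore gives a position error of b t^2 / 2, which is 5 m after 100 s for the bias above. Real systems differ in detail (for example, through Schuler-tuned error oscillations). Even so, the example shows why INS position is computed relative to a known starting point and drifts over time.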
In recent years, Global Navigation Satellite Systems (GNSS) have slowly been
introduced where appropriate and cost-effective. Two GNSS systems are currently in
operation: the United States GPS and the Russian Federation's GLObal NAvigation
Satellite System (GLONASS)3. A third, the European Galileo system, is scheduled to
become operational in 2010. Each of the GNSS systems uses a constellation of
orbiting satellites working in conjunction with a network of ground stations. The GPS
system is available for civil use and is based on 24 operational satellites. Two distinct GPS
services are available, namely the Standard Positioning Service (SPS) and the more
accurate Precise Positioning Service (PPS). The SPS is available to civil users
worldwide without charge or restriction, while the PPS is available primarily to the
military. The SPS requirements are defined through a service availability standard of
more than 99% of the time at an average location, with an average accuracy of 34m
horizontal and 77m vertical (95% threshold) (Department of Defence, 2001; European
Commission, 2006a). Similar standards are defined for the Galileo system, where five
distinct navigation services will be available, namely the Open Service (OS), Safety-of-Life service (SoL), Commercial Service (CS), Public Regulated Service (PRS), and
Search And Rescue service (SAR) (European Commission, 2006b). The SoL service is
intended primarily for aircraft navigation. Service performance requirements for SoL
with dual frequency correction are set at 4m horizontally and 8m vertically (95%
threshold) (European Commission, 2006b).
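The "95% threshold" in these figures means that at least 95% of position errors should fall within the stated value. The sketch below checks logged errors against such a requirement using the nearest-rank 95th percentile.

It assumes the error samples (in metres) are already available from comparison with a surveyed reference. The Galileo SoL figures quoted above are used as default limits. Real performance assessment uses considerably more careful statistics, and the function names are hypothetical.

```python
import math


def percentile_95(values):
    """Nearest-rank 95th percentile of absolute values."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(abs(v) for v in values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def horizontal_errors(east_errors, north_errors):
    """Combine east/north error components into horizontal error magnitudes."""
    return [math.hypot(e, n) for e, n in zip(east_errors, north_errors)]


def meets_requirement(h_errors, v_errors, h_limit_m=4.0, v_limit_m=8.0):
    """True if both 95% error statistics are within the limits (SoL defaults)."""
    return percentile_95(h_errors) <= h_limit_m and percentile_95(v_errors) <= v_limit_m
```

Integer arithmetic is used for the rank so that the result does not depend on floating-point rounding of 0.95 * n.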
In recent years, in addition to the concept and supporting systems for area navigation,
a new concept referred to as Precision aRea NAVigation (PRNAV) has emerged.
PRNAV has been introduced to allow consistent terminal airspace operations in the
European region (i.e. European Civil Aviation Conference - ECAC member states).
It is based on the navigation requirement that procedures, design principles, and
aircraft capabilities together achieve an accuracy of 1 Nm for at least 95% of the flight
time (EUROCONTROL, 2006b).
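One way to picture the "1 Nm for 95% of the flight time" requirement is as a containment check on cross-track error along a route leg. The sketch below assumes a flat local frame in nautical miles and position samples taken at equal time intervals, so that the fraction of samples approximates the fraction of flight time.

This illustrates the idea only; it is not the approval method for PRNAV. The function names are hypothetical.

```python
import math


def cross_track_nm(start, end, pos):
    """Signed perpendicular distance (Nm) of pos from the leg start->end.

    Points are (east, north) in Nm; positive means right of track.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("leg has zero length")
    # 2-D cross product of the vector to the position with the leg direction.
    return ((pos[0] - start[0]) * dy - (pos[1] - start[1]) * dx) / length


def meets_prnav(start, end, samples, limit_nm=1.0, required_fraction=0.95):
    """True if at least 95% of equally spaced samples lie within 1 Nm of track."""
    if not samples:
        raise ValueError("need at least one sample")
    within = sum(1 for p in samples if abs(cross_track_nm(start, end, p)) <= limit_nm)
    return within / len(samples) >= required_fraction
```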
3 GLONASS: Global'naya Navigatsionnaya Sputnikovaya Sistema.
terminal and ground surveillance4. The section concludes with a discussion of the
concept of Required Surveillance Performance (RSP).
[Figure: Surveillance function components: Primary Radar; SSR Mode A/C/S; Display; Aux Display; Surface Movement Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Automatic Dependent Surveillance (ADS); Aerodrome Traffic Monitor]
4 The primary difference between en-route radars and those used for terminal and ground
surveillance is the rate of radar information update (e.g. en-route radars update every 8s, whilst
terminal radars update every 5s; EUROCONTROL, 1997).
equipment on board the aircraft known as a transponder. The radar pulses interrogate
the transponder and, if the transponder recognises the pulses, it responds by
transmitting back to the radar. Recognition is achieved by a discrete four-digit code
assigned by ATC. When the transponder transmits to the radar, it actually transmits
essential data about the flight, such as aircraft identification (known as Mode A) and
altitude (known as Mode C). As a result, the combination of the PSR and SSR Modes
A and C, or SSR alone, provides a three-dimensional representation of the traffic. In
addition to this information, Mode S possesses a data link functionality and access to the
aircraft state vector (ground speed, track angle, turn rate, roll angle, climb rate,
magnetic heading, indicated air speed, Mach number) as well as aircraft intent
information, or indication of the future path (UK CAA, 2004).
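Each digit of an SSR Mode A code runs from 0 to 7. The "four-digit code" is therefore an octal number, and there are 8^4 = 4096 possible codes.

The sketch below illustrates the SSR code management performed by the FDPS:

- assigning codes to flights without duplicates;
- correlating a radar track's Mode A code back to a callsign.

The class and method names are hypothetical, and the code pool is invented. Only the three internationally reserved emergency codes (7500 unlawful interference, 7600 radio failure, 7700 emergency) are excluded. Real centres allocate from assigned code ranges under more detailed rules.

```python
# Internationally reserved emergency codes, never handed out by the allocator.
EMERGENCY_CODES = {"7500", "7600", "7700"}


def is_valid_mode_a(code: str) -> bool:
    """A Mode A code is exactly four octal digits (0-7)."""
    return len(code) == 4 and all(c in "01234567" for c in code)


class SsrCodeAllocator:
    """Hypothetical duplicate-free SSR code assignment and correlation."""

    def __init__(self, pool):
        # Keep valid, non-emergency codes, dropping repeats but preserving order.
        self.free = list(dict.fromkeys(
            c for c in pool if is_valid_mode_a(c) and c not in EMERGENCY_CODES))
        self.assigned = {}  # callsign -> code

    def assign(self, callsign: str) -> str:
        """Return the flight's code, assigning a free one if it has none yet."""
        if callsign in self.assigned:
            return self.assigned[callsign]
        if not self.free:
            raise RuntimeError("SSR code pool exhausted")
        code = self.free.pop(0)
        self.assigned[callsign] = code
        return code

    def release(self, callsign: str) -> None:
        """Return a flight's code to the pool (e.g. when it leaves the area)."""
        code = self.assigned.pop(callsign, None)
        if code is not None:
            self.free.append(code)

    def correlate(self, track_code: str):
        """Code/callsign correlation: map a track's Mode A code to a flight."""
        for callsign, code in self.assigned.items():
            if code == track_code:
                return callsign
        return None
```

Because a code is removed from the free pool when it is assigned, two active flights can never hold the same code at the same time.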
A new surveillance initiative is directed toward the development of Automatic
Dependent Surveillance-Broadcast (ADS-B) technology. In this satellite-based
surveillance system, the aircraft determines its own position, altitude, velocity, and other
parameters using satellite navigation and broadcasts them (CASA, 2006). The data is broadcast
to all possible recipients, in contrast to Automatic Dependent Surveillance-Contract
(ADS-C), where only point-to-point data transfer is established. As a result, surveillance
in the 2020 time frame is expected to be characterised by a mix of airborne (ADS,
ADS-B, ADS-C) and ground-based functions with increased functionality
Finally, future development of air navigation for civil aviation is focused on increased
accuracy of the aircraft position by integrating data from all available sources, such as
primary and secondary surveillance signals and Automatic Dependent Surveillance-Broadcast
(ADS-B) (Mohleji, Lacher, and Ostwald, 2003). The Required Surveillance
Performance (RSP) defines the surveillance requirements according to the airspace
involved (e.g. oceanic/remote airspace vs. high density traffic airspace). In addition, the
ADS system will enable the merging of communications, navigation, and surveillance
technologies. This will accelerate the movement toward Airborne Surveillance and
Separation Assurance (ASAS). In other words, future surveillance technologies
(e.g. ADS) will enable pilots to participate actively in the process of safely separating
their flight from other flights. This will be achieved through the display of traffic information
within the cockpit, wake vortex hazard prediction and avoidance, three-dimensional
terrain presentation, terrain avoidance systems, and weather awareness (Ochieng,
2006). Moreover, the US FAA is developing a concept of Situational Awareness for
Safety (SAS). The SAS concept is based on the use of available data (e.g. satellite-based position data, terrain, weather) and their exchange between all parties involved
(e.g. pilots, dispatchers, controllers). The primary objective of the SAS concept is to
create an environment promoting more efficient, safe, and free use of airspace (FAA,
1995).
2.3.1.4 Data processing and distribution function
The data processing and distribution function incorporates all systems required to
process flight related data (e.g. initial flight plan data, dynamic communication,
navigation, and surveillance flight data). These include the Flight Data Processing
System (FDPS) as well as the Radar Data Processing System (RDPS) enabling
controllers to 'see' in real time the movement of aircraft in a dedicated airspace, as
represented on the radar display. In addition, this functional block incorporates all
supporting equipment, such as the strip printer (Figure 2-7).
[Figure 2-7: Data processing and distribution function components: Flight Data Processing System; Fallback Flight Data Processing System; Radar Data Processing System; Fallback Radar Data Processing System; Single Radar Processing; Multiple Radar Processing; Supporting equipment]
The FDPS handles flight plans and updates them through automatic events, manual
inputs, and triggered transitions from one state to another. This life cycle of a flight plan
describes the condition of the flight plan at any specific time. The phase of
the flight plan life cycle triggers certain system actions and directly affects what actions
the controller can take on the flight plan, and therefore on the actual flight. Through the
processing of flight progress strips (either manually or electronically), the controller
manages all traffic by interacting with flight-related data (on the radar and auxiliary
displays, and the strip management board). The FDPS carries out the following specific
processes (EUROCONTROL, 2003a):
initial flight plan processing which includes checking incoming flight plan
messages, creating a record of flight data, and storing it in the flight plan
database. In addition, the FDPS handles flight data throughout the life of the
flight plan by constantly updating and distributing the flight data;
airspace data processing and distribution which handles the complete airspace
information (e.g. airways and navigation beacons). In addition, it processes any
information on the special use of airspace to warn the controller about
infringements which require modification of flight trajectory;
meteorological data processing and distribution;
SSR code management, which involves the assignment of SSR codes to flights
and the identification of all flights by SSR Mode A. It also prevents the assignment of
duplicate codes;
trajectory prediction which is performed throughout the flight plan life cycle, taking
into account the initial flight plan as well as all modifications of the route;
provision of system supported coordination and transfer of control within the ATC
Centre and between adjacent ATC Centres;
processing of data link messages from/to the aircraft (A/G coordination);
flight plan conflict detection which is performed inside a defined region (i.e.
sector) using flight plan data. This function is known as Medium Term Conflict
Detection (MTCD);
workload monitoring and distribution essential for assisting the supervisor in the
adjustment of the existing sectorisation (i.e. collapse/de-collapse of sectors) and
computation of position/sector load;
arrival sequencing which provides the approach and en-route controllers with a
proposed sequence number for each arrival flight; and
establishment of code/callsign correlation as a mapping between radar tracks
and the flight plan database.
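The flight plan life cycle described above maps naturally onto a state machine. Events drive the transitions, and the current state governs which controller actions are allowed.

The state names, events, and permitted actions below are hypothetical placeholders chosen for illustration. Actual FDPS implementations define their own states.

```python
from enum import Enum, auto


class FlightPlanState(Enum):
    # Hypothetical state names; real systems define their own.
    FILED = auto()
    NOTIFIED = auto()     # approaching the sector, not yet under control
    CONTROLLED = auto()   # under control of this sector
    TRANSFERRED = auto()  # handed over to the next sector or centre
    TERMINATED = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (FlightPlanState.FILED, "activation"): FlightPlanState.NOTIFIED,
    (FlightPlanState.NOTIFIED, "assume_control"): FlightPlanState.CONTROLLED,
    (FlightPlanState.CONTROLLED, "transfer"): FlightPlanState.TRANSFERRED,
    (FlightPlanState.TRANSFERRED, "termination"): FlightPlanState.TERMINATED,
}

# Controller actions permitted in each state (illustrative only).
ALLOWED_ACTIONS = {
    FlightPlanState.NOTIFIED: {"amend_route"},
    FlightPlanState.CONTROLLED: {"amend_route", "clear_level", "assign_heading"},
}


class FlightPlan:
    def __init__(self, callsign):
        self.callsign = callsign
        self.state = FlightPlanState.FILED

    def handle(self, event):
        """Apply an automatic or manual event; reject transitions not in the table."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    def can(self, action):
        """Whether the controller may perform this action in the current phase."""
        return action in ALLOWED_ACTIONS.get(self.state, set())
```

Keeping the transitions in a table makes the permitted life-cycle paths explicit. Any event that does not match the current phase is rejected rather than silently applied.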
A flight progress strip is a tool that controllers use to record the progress of each flight
as it moves through the sector. It represents a record of all ATC instructions given to
each aircraft. It is also used as a backup to the surveillance function in the event of a
failure. The flight strip printer facility, as an additional component in this functional
block, supports the printing of flight strips at the executive, planner, and/or flight data
assistant positions, depending on the suite configuration. This facility automates the
previously manual completion of flight strips through access to a database of flight information
and a printout of the data when needed. The printed strip displays the non-dynamic
aspects of the flight, so that only tactical dynamic instructions need to be manually
entered on the strip by the controller.
The RDPS processes radar pictures from all available sources (primary and secondary,
short range and long range, en-route and approach radars) to establish an accurate
picture of all traffic over a well-defined geographical area. In the case of multiple radar
coverage, the RDPS provides a composite air picture of the traffic while taking into
account radar biases for range and azimuth measurements (EUROCONTROL, 2003a).
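The sketch below illustrates the two steps mentioned here:

1. correcting each radar's range and azimuth biases;
2. combining several radars into one composite picture.

The bias values are assumed to be known already, and positions use a flat local frame in nautical miles. The combination uses simple inverse-variance weighting of simultaneous plots. That weighting is a stand-in chosen for brevity; operational trackers are far more elaborate and are not reproduced here. The function names are hypothetical.

```python
import math


def radar_to_xy(radar_xy, range_nm, azimuth_deg, range_bias_nm=0.0, azimuth_bias_deg=0.0):
    """Convert a plot to system (east, north) Nm after removing known biases.

    Biases are assumed additive and already estimated (hypothetical values).
    """
    r = range_nm - range_bias_nm
    az = math.radians(azimuth_deg - azimuth_bias_deg)
    return radar_xy[0] + r * math.sin(az), radar_xy[1] + r * math.cos(az)


def fuse(estimates):
    """Inverse-variance weighted average of (x, y, variance) position estimates."""
    weights = [1.0 / var for _, _, var in estimates]
    total = sum(weights)
    x = sum(w * e[0] for w, e in zip(weights, estimates)) / total
    y = sum(w * e[1] for w, e in zip(weights, estimates)) / total
    return x, y, 1.0 / total


# Hypothetical plots of the same aircraft from two radars 20 Nm apart.
plot_a = radar_to_xy((0.0, 0.0), 10.5, 90.0, range_bias_nm=0.5)
plot_b = radar_to_xy((20.0, 0.0), 10.0, 272.0, azimuth_bias_deg=2.0)
composite = fuse([(*plot_a, 0.25), (*plot_b, 0.0625)])
```

The more precise radar (smaller variance) pulls the composite position towards its own plot. The fused variance is smaller than either input, which is the benefit of multiple radar coverage.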
The ATM surveillance tracker and server system (ARTAS) processes PSR, SSR, Mode
S, and ADS data. These highly accurate and reliable data are directly integrated into
the existing ATC environment by using a universal data exchange format. For example,
EUROCONTROL defined the All Purpose STructured Eurocontrol Radar Information
Exchange (ASTERIX) messaging format. This allows the transfer of information
between two parties (e.g. systems) using a mutually agreed format of data.
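The idea of a mutually agreed data format can be made concrete with the outer structure of an ASTERIX data block:

- a one-octet category;
- a two-octet length covering the whole block;
- one or more records, each beginning with a Field Specification (FSPEC) whose bits flag which data items are present.

The sketch below decodes only that outer layer. The byte values in the example are invented, and decoding the data items themselves needs the category-specific specification, which is not attempted here. The function names are hypothetical.

```python
import struct


def parse_block_header(data: bytes):
    """Return (category, length) from an ASTERIX data block header."""
    if len(data) < 3:
        raise ValueError("too short for an ASTERIX header")
    # One unsigned octet (CAT) followed by a big-endian two-octet length (LEN).
    category, length = struct.unpack(">BH", data[:3])
    if length > len(data):
        raise ValueError("declared length exceeds available bytes")
    return category, length


def parse_fspec(data: bytes, offset: int = 0):
    """Decode an FSPEC starting at offset.

    Returns (list of present Field Reference Numbers, offset after FSPEC).
    In each FSPEC octet, bits 8..2 flag data items; bit 1 (FX) signals
    that another FSPEC octet follows.
    """
    frns = []
    octet_index = 0
    while True:
        byte = data[offset]
        offset += 1
        for bit in range(7):          # masks 0x80 (bit 8) down to 0x02 (bit 2)
            if byte & (0x80 >> bit):
                frns.append(octet_index * 7 + bit + 1)
        octet_index += 1
        if not byte & 0x01:           # FX clear: FSPEC ends here
            return frns, offset


# Invented block: CAT 48, LEN 6, two FSPEC octets, then one placeholder data octet.
block = bytes([48, 0x00, 0x06, 0b10100001, 0b01000000, 0xAA])
category, length = parse_block_header(block)   # (48, 6)
items, data_start = parse_fspec(block, 3)      # ([1, 3, 9], 5)
```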
The data processing and distribution functional block also incorporates both a fallback
flight data processing system and fallback radar data processing system, as necessary
redundant systems in every ATC Centre. These fallback systems may provide identical
functionality if they are duplicates of the FDPS and RDPS systems. However, in some
cases these fallback systems do not necessarily provide the same range of functions
as the main systems. The necessity of redundant systems in ATC is discussed further
in Chapter 4.
2.3.1.5 Supporting function
The supporting function comprises various ATC tools that enable integrated air traffic
management operations, enhancing safety and increasing airspace capacity. The main
objective of these tools is to lessen the controller's cognitive workload by
focusing attention on the relevant (task-specific) information (IFATCA, 2004). They also assist in
the detection and resolution of potential problems. It is important to note that these
tools do not replace the controller's decision-making processes; they simply aid
them. The supporting function includes the following tools (Figure 2-8):
Monitoring tools assist with the detection and recording of any safety-related events
(e.g. the Automatic Safety Monitoring Tool - ASMT), reduce the workload
associated with traffic monitoring tasks by identifying potential and actual
deviations or non-conformance with the planned flight trajectory (e.g. MONitoring
Aid - MONA), and automatically check whether aircraft are adhering to their planned
route (e.g. Route Adherence Monitoring - RAM) or cleared flight level (e.g.
Cleared Level Adherence Monitoring - CLAM) by comparing planned or
cleared information with the aircraft's actual position (EUROCONTROL, 2001f);
The Medium Term Conflict Detection (MTCD) system is a tool which enables
controllers to predict and identify future conflicts between aircraft in a predefined
region by applying separation rules (EUROCONTROL, 2001f); and
Sequencing managers (e.g. Arrival Manager - AMAN, Departure Manager - DMAN, Means to Aid Expedition and Sequencing of Traffic with Research and
Optimisation - MAESTRO) are decision-making tools that provide approach
and en-route controllers with the control and sequencing actions needed to properly
expedite traffic to the destination airports and runways (EUROCONTROL, 2001f).
These tools aim to enhance the controller's appreciation of the current and predicted
traffic situation and to facilitate the decision-making process. They are an integral part of
the HMI (i.e. radar display) and are informed by the output of the data processing and
distribution function.
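A greatly simplified picture of what a sequencing tool proposes is a first-come-first-served order by estimated landing time. Each aircraft is delayed just enough to respect a minimum time spacing behind the previous one.

The sketch below assumes a single runway and one fixed spacing value. Tools such as AMAN or MAESTRO optimise sequences under many more constraints, and nothing here reflects their actual algorithms. The function name is hypothetical.

```python
def sequence_arrivals(etas, min_spacing_s):
    """First-come-first-served arrival sequence with minimum time spacing.

    etas: dict callsign -> estimated landing time (s).
    Returns a list of (sequence number, callsign, scheduled time, delay).
    """
    result = []
    last_time = None
    ordered = sorted(etas.items(), key=lambda kv: kv[1])
    for number, (callsign, eta) in enumerate(ordered, start=1):
        # Land at the ETA unless that is too close behind the previous arrival.
        scheduled = eta if last_time is None else max(eta, last_time + min_spacing_s)
        result.append((number, callsign, scheduled, scheduled - eta))
        last_time = scheduled
    return result


# Hypothetical example with a 90 s spacing: BBB2 is delayed 60 s behind AAA1.
plan = sequence_arrivals({"AAA1": 0, "BBB2": 30, "CCC3": 200}, 90)
```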
2.3.1.6 Safety Nets
A safety net (SNET) is an airborne and/or ground-based function alerting the pilot or
controller to the imminent possibility of collision between aircraft, between aircraft and
terrain/obstacles, or of penetration of dangerous airspace (IFATCA, 2004). The
most common safety nets are Short Term Conflict Alert (STCA), Minimum Safe
Altitude Warnings (MSAW), Area Proximity Warnings (APW), and Runway Incursion
Monitoring and Conflict Alert System (RIMCAS).
The previous section described medium term conflict detection (MTCD) as an ATC tool
which assists controllers in the early detection and prediction of conflicts (e.g. 20
minutes in advance). Similarly, the STCA function detects two system tracks predicted
to be in conflict (i.e. two tracks where both horizontal and vertical separations are about
to be compromised). The system then alerts the controller to the imminence of a
separation minima infringement through visual alarms presented on the
affected traffic on the HMI. However, whilst MTCD is used for early detection and prediction
of conflicts, STCA is used as a safety net or defence against imminent conflict
(EUROCONTROL, 2007a). The exact moment of the STCA alarm depends upon
predetermined settings (usually it is set to trigger the alert between 90 seconds and two
minutes prior to the conflict).
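The core STCA logic can be sketched with straight-line extrapolation sampled over the look-ahead time. The check is whether two tracks are predicted to lose horizontal and vertical separation at the same moment.

The 120 s look-ahead reflects the range quoted in the text. The 5 Nm and 1000 ft minima and the 1 s sampling step are assumed parameters. Real STCA implementations add filtering and tuning that is not modelled here, and the names `Track` and `stca_first_conflict` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    x: float    # east, Nm
    y: float    # north, Nm
    vx: float   # Nm per second
    vy: float   # Nm per second
    alt: float  # ft
    vs: float   # vertical speed, ft per second

    def at(self, t):
        """Linear extrapolation of the track t seconds ahead."""
        return self.x + self.vx * t, self.y + self.vy * t, self.alt + self.vs * t


def stca_first_conflict(a, b, lookahead_s=120, step_s=1, h_min_nm=5.0, v_min_ft=1000.0):
    """First predicted time (s) at which BOTH minima are infringed, else None."""
    for t in range(0, lookahead_s + 1, step_s):
        ax, ay, az = a.at(t)
        bx, by, bz = b.at(t)
        if math.hypot(ax - bx, ay - by) < h_min_nm and abs(az - bz) < v_min_ft:
            return t
    return None
```

In the test case, two aircraft at the same level, 30 Nm apart, close head-on at 480 kt each. The alert time falls 94 s ahead, inside the two-minute window. With 2000 ft of vertical separation, no alert is raised.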
The MSAW function enables detection of a radar track predicted to infringe the
minimum safe altitude above an obstacle. MSAW processing takes into account the
track altitude (i.e. the altitude extracted from Mode C, or the present altitude
corrected for pressure at mean sea level, known as QNH, thus providing the
altitude above mean sea level), the vertical tendency (i.e. climb or descent), position, and
speed vector. In addition, the system will detect if a radar track is predicted to deviate
from the approach path of an airport (EUROCONTROL, 2007a).
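A minimal MSAW-style check combines the two ingredients in the paragraph:

- correcting the Mode C pressure altitude (referenced to 1013.25 hPa) with QNH;
- extrapolating the vertical tendency to see whether the track will go below the minimum safe altitude.

The factor of about 27 ft per hPa is the standard-atmosphere pressure-height gradient near sea level and is an approximation. The single minimum safe altitude value and the 60 s look-ahead are assumptions for illustration. The function names are hypothetical.

```python
FT_PER_HPA = 27.0               # approximate near-sea-level gradient (assumption)
STANDARD_PRESSURE_HPA = 1013.25


def qnh_altitude_ft(pressure_altitude_ft, qnh_hpa):
    """Convert a Mode C pressure altitude to an approximate altitude above MSL."""
    return pressure_altitude_ft + (qnh_hpa - STANDARD_PRESSURE_HPA) * FT_PER_HPA


def msaw_alert(altitude_ft, vertical_speed_fpm, msa_ft, lookahead_s=60.0):
    """Warn if the track is, or is predicted to be, below the minimum safe altitude."""
    predicted = altitude_ft + vertical_speed_fpm * lookahead_s / 60.0
    # With a linear prediction, the lowest point is at one end of the window.
    return min(altitude_ft, predicted) < msa_ft


# Mode C reads 3000 ft, QNH is 1003.25 hPa, descending at 1500 ft/min; MSA 2500 ft.
altitude = qnh_altitude_ft(3000.0, 1003.25)   # about 2730 ft above MSL
warn = msaw_alert(altitude, -1500.0, 2500.0)
```

A QNH below standard means the aircraft is lower than its pressure altitude suggests. This is why the correction matters when clearance above terrain is being judged.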
The APW is used to designate areas which are dangerous for an aircraft to enter (e.g.
missile firing, military training, and air display areas). These areas can be identified as:
prohibited, restricted, danger, military training, segregated, special use, temporary
restricted, and permanently restricted. The APW ensures that any aircraft infringing or
predicted to infringe one of these areas is detected and that an advance
warning is presented to the controllers (EUROCONTROL, 2007a).
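Detecting that a track is in, or predicted to enter, a designated area reduces to a point-in-polygon test. The test is applied to the present position and to each extrapolated position.

The sketch below uses the standard ray-casting test in a flat local frame, straight-line extrapolation, and an invented square area. The 120 s look-ahead and 5 s step are assumed values, the vertical limits of the areas are ignored, and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in a flat frame."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apw_alert(pos, vel, area, lookahead_s=120, step_s=5):
    """Earliest time (s) at which the extrapolated position is inside the area, or None."""
    for t in range(0, lookahead_s + 1, step_s):
        if point_in_polygon(pos[0] + vel[0] * t, pos[1] + vel[1] * t, area):
            return t
    return None


# Hypothetical restricted area (a 10 x 10 Nm square) and a track heading east at 0.1 Nm/s.
restricted = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
warning_time = apw_alert((-2.2, 5.0), (0.1, 0.0), restricted)
```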
RIMCAS is an airport monitoring and conflict alert system which alerts
controllers to an impending runway incursion. The system gives the controller
an opportunity to react within a realistic and effective timeframe. This system is also
known as the ground short term conflict alert system. The main requirement of this
system is a supply of reliable surveillance data, as any false alert unnecessarily
increases controller workload. In this respect, the Automatic Dependent Surveillance-Broadcast
(ADS-B) system should enhance surveillance capability for airport
monitoring and conflict prevention through Advanced Surface Movement Guidance
and Control Systems (ASMGCS) (ICAO, 2005).
2.3.1.7 Power supply
The availability of electrical power is a prerequisite in a computer-driven environment
such as an ATC Centre. Electrical power is obtained from public utilities, but in case of
interruption or non-availability, the ATC Centre's own installations are required to
provide electrical power. This is most commonly achieved by diesel-powered
generators or powerful batteries, supporting an Uninterruptible Power Supply (UPS)
capability. These components are required to provide an uninterrupted electrical power
supply in order to prevent computers from shutting down.
The data recording and playback facility enables automatic recording of all transactions
made by the radar data, flight data, radar display, and communication functions. This
includes all controllers' modifications to flight plans, received messages, and display
setting modifications (EUROCONTROL, 2003a). The recorded data are used for further
data analysis and for playback of a specific air traffic situation (e.g. in the case of an
incident). The recordings are stored on disks for the time deemed necessary by the
relevant aviation authority (the legal requirement is 30 days, but this could be longer if
necessary for incident investigation).
One of the most requested system control and monitoring functions is the ability to
detect faults in the supervised ATC system by continuous control and monitoring of the
system operation. This facility provides detailed information on the equipment states
within the managed systems and the relevant alarm conditions which may affect the
operating mode. It also logs events and enables the remote control of supervised
equipment and setting of the system thresholds (EUROCONTROL, 2003a). Its main
sub-functions are: fault management (i.e. alarm management, threshold setting),
configuration management (i.e. equipment descriptions), performance management
(i.e. identification of trends and problems), and security management (i.e.
authentication, identification, password protection, tailored user interface). The control
and monitoring is performed on all positions, external lines, and connections.
Each ATC system is designed to have several operational system modes
(EUROCONTROL, 2003a). These modes automatically switch in if any of the major
processing systems fail. The objective is that the controller always has some
functionality available despite the degradation of equipment. Reduced radar, alert, flight
plan, and communication modes are the most frequent types of reduced operational
modes available in current ATC systems.
The time management facility uses the external time received from the GPS signal for
synchronising time on all computers (i.e. all Controller Working Positions - CWPs). The
time is expressed in Coordinated Universal Time (UTC), also known as zulu time.
Originally, it was a time scale based on the local standard time on the 0° longitude
meridian, which runs through Greenwich, United Kingdom. Today, UTC uses precise
atomic clocks and satellites to ensure a reliable and accurate time standard for air and
ground operations (ICAO, 1979).
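Because the time reference is taken from GPS, one detail matters. GPS time does not include leap seconds, so it runs ahead of UTC by a whole number of seconds. That offset changes whenever a leap second is inserted; it was 14 s from the start of 2006 to the end of 2008.

The sketch below converts a GPS week number and seconds-of-week to UTC and formats the result as Zulu time. It makes two assumptions:

- The leap-second value is a parameter that must be kept current.
- The week number is the full count since the GPS epoch of 6 January 1980, with no rollover handling.

The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_MINUS_UTC_S = 14  # valid from 1 January 2006 until the next leap second


def gps_to_utc(week, seconds_of_week, leap_seconds=GPS_MINUS_UTC_S):
    """Convert GPS week/seconds-of-week to a timezone-aware UTC datetime."""
    gps_time = GPS_EPOCH + timedelta(weeks=week, seconds=seconds_of_week)
    return gps_time - timedelta(seconds=leap_seconds)


def zulu(dt):
    """Format a datetime as UTC hours and minutes followed by 'Z' (e.g. 1430Z)."""
    return dt.astimezone(timezone.utc).strftime("%H%MZ")
```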
The following section focuses on technologies that will determine the characteristics of
the generic ATC Centre in the future.
There are significant variations in equipment between ATC Centres, both in Europe
and worldwide. At the European level, EUROCONTROL, the European Organisation
for the Safety of Air Navigation, has taken on the role of promoting harmonisation, integration,
and standardisation while improving the safety and overall performance of the ATM/ATC
systems in its member states. For example, EUROCONTROL (2006d) has considered
the costs of fragmentation of the European ATM system. At a global level, ICAO
standardisation activities are undertaken when new systems or technologies are
mature, have demonstrated their ability to provide safety enhancements compared to
existing systems, and are cost beneficial to international civil aviation (ICAO, 2003).
ICAO has established standards and recommended practices for all of its contracting
states (ICAO, 2006b).
In spite of the significant effort to date to standardise ATM/ATC within the aviation
community, there are still significant differences. For this reason, the methodology
adopted in this thesis for the assessment of controller recovery from equipment failures
in ATC is designed on the basis of a generic ATC Centre. This is defined below.
The ATC Centre should be based on a fully automated and integrated system with a
fail-safe design based on duplicated processors and an open architecture in accordance
with existing industrial standards. It should also support graceful degradation modes. The
data processing functional block should be able to support acquisition and processing
of data from several radars (i.e. multi-radar tracking), automatic collection and
processing of flight plans, automatic allocation of SSR codes, coordination achieved
through direct connection to adjacent centres (e.g. on-line data exchange - OLDI),
coordination of civil and military flights via a separate military suite, and automatic flight
progress monitoring (continuous calculation of flight profile and update based on radar
data). The air situational picture should be presented on the HMI (radar and auxiliary
display) with necessary alert facilities (e.g. STCA, MSAW, CLAM, RAM). The playback
function of radar pictures should be available for incident investigation, testing,
development, and training.
The ATC Centre should support paper strip presentation on the strip
console. A flight progress strip is a single strip of paper that contains all information on
a flight and its evolution through a particular sector of airspace. It is used as a quick
way to record the progress of the flight and to keep a legal record of the instructions
issued. It is also used to allow the planning controller to predict future conflicts and to
ensure that sector entry/exit conditions are achieved. In addition, in the case of radar
failure, flight progress strips represent the primary control tool. The strip, mounted in a
strip holder, is placed with other strips in a 'strip board' which displays all flights in a
particular sector of airspace or on an airport.
In recent years, there have been initiatives aimed at electronic strip presentation, used
in many European ATC Centres and airports. However, as Lanzi and Marti (2001) point
out, controllers do not generally find electronic strips to have the same level of flexibility
and support as paper strips. On the other hand, more radical attempts have been made
toward a stripless environment, where aircraft information is tagged to the label on the
radar screen that can be expanded as necessary. In this environment generally three
modes of the same aircraft label exist: the standard label that is always displayed on
the screen, the highlighted label that is bigger and contains more information, and the
extended label that contains all information not immediately required by the controller
(for details see Lanzi and Marti, 2001).
The previous sections have discussed the current technologies relevant to an ATC
Centre. This forms a part of the definition of a generic ATC Centre. In addition, the
generic ATC Centre should be adaptable to changes in technologies. Hence, the
following section addresses the future of ATC and how this is likely to impact on an
ATC Centre.
Factors in Air Traffic Control Automation (Wickens et al., 1998) defined automation as:
a device or system that accomplishes (partially or fully) a function that was previously
carried out (partially or fully) by a human operator.
According to Wickens (1992), automation is mainly applied to perform or assist
functions in which humans are naturally limited (e.g. accessibility to toxic, dangerous,
unreachable environments; or inherent working memory limitation). In addition,
automation is used to replace humans in operations which are time consuming, costly,
or induce high workload (e.g. complex monitoring or analytical processes). While often
seen as replacing humans, in reality, automation changes the role of the human
operator from direct manual control to largely supervisory control. In other words, in this
new role, the human operator plans and inputs tasks and the computer systems
implement these tasks automatically. Automation does not totally replace human
activity; it just changes the nature of the work that humans do. This change is often
completely unintended or unexpected by automation designers (Parasuraman and
Riley, 1997).
Past research has identified three sources of human performance deficiencies when
using high-level automation (Bainbridge, 1983; Wickens et al., 1998; Wiener and Curry,
1980; Boehm-Davis et al., 1983). Firstly, humans become less likely to detect failures in
the automation itself or in the automated process. Secondly, they lose some
awareness of the state of the automated process. Finally, human operators eventually
lose skills in performing the actions manually if these actions have been previously
automated. These three phenomena are commonly known in the literature as out-of-the-loop
performance problems. This deterioration of manual skills is
particularly relevant to controllers and flight crews. As Bainbridge (1983) points out, it is
ironic that the more reliable the automation, the more prone the operator will be to
out-of-the-loop performance problems. This is the direct result of increased
complacency, over-trust in automation, and deterioration of the manual skills of both
controllers and pilots.
Experiments have shown that operators' abilities to recover from emergency
automation failure significantly improve with levels of automation that require human
involvement in the implementation of a task. Thus, automation strategies that allow
operators to focus on current operations may contribute to improved situational
awareness and reduced workload (Endsley, 1997). As a result, a new approach to
emergency situation will decrease with decreased separation, while the operator
response time may increase due to out-of-the-loop performance. One alleviating factor
may be the transfer of responsibility for separation management from controllers to
pilots, giving the former more time to effect recovery. However, the environment of collaborative
decision-making and real-time information exchange also threatens to distribute
false or inaccurate information from the ground to the air. In this case, ATC equipment
failure may affect the airborne segment of ATM and cockpit instruments (e.g. Flight
Management System - FMS).
The European Organisation for the Safety of Air Navigation (EUROCONTROL) recognised
that the role and nature of controller tasks will change as a result of increased
automation within the ATM system. It therefore initiated the Solutions
for Human-Automation Partnerships in European ATM (SHAPE) project to better
understand interactions between automated support and controllers (EUROCONTROL,
2004f). SHAPE has identified seven factors that need to be addressed to ensure
harmonisation between automated support and the controller. Alongside trust,
situational awareness, team issues, skills, ageing, and workload, SHAPE
recognised the importance of managing system disturbances (details are presented in
Chapters 5 and 7). Accordingly, the assessment of controller recovery presented in the
remainder of this thesis considers the interactions between human and automation. A
flexible approach has been developed to assess controller recovery in any given
context.
In short, the role of the human operator will remain significant in the future ATC
environment. With the transfer of responsibility for separation management from
controllers to pilots, recovery will evolve from purely controller
actions to collaboration between controllers and pilots. To support human performance in
the more automated future environment (both on the ground and in the cockpit), special
attention will have to be given to human-computer interaction, training, and
procedures for both normal and abnormal situations.
2.6 Summary
The aim of this Chapter is to create a basis for the research on recovery from
equipment failures in ATC. Several findings will be taken forward from
this Chapter. Firstly, this Chapter defined ATM and its component ATC, and thus
indicated the scope of the research presented in this thesis. Secondly, this Chapter
placed additional emphasis on the ATC functional classification. This classification
starts with the main ATC functional blocks, which are further broken down to element level. It has
been defined on the basis of both current and future ATC systems and tools, in accordance
with the principles and initiatives of ICAO and EUROCONTROL. As such, this ATC
functional breakdown can accommodate changes in ATM/ATC and should capture both
current and future equipment failure types. Finally, this Chapter defined the characteristics
of a generic ATC Centre in both the current and future ATC environments. This finding
provides the basis for the entire research presented in this thesis.
The next Chapter focuses on the equipment component of the ATC system.
Since the aim of the overall thesis is to assess the impact of equipment failures, the
following Chapters provide relevant definitions, identify types of equipment failure, and assess their
contribution to the safety of the overall air transport system. A sample of operational
failure reports used in this research is validated through a framework based on the
contribution of equipment failures to the overall safety of the air transport system.
3 Preliminary Assessment of Equipment Failures in Air Traffic Control
The previous Chapter presented the context of the research in this thesis by describing
the Air Traffic Management (ATM) system and its component, the Air Traffic Control
(ATC) system. Furthermore, it detailed the range of functions provided in an ATC
Centre. The main characteristics of current ATC Centres, as well as the concepts
shaping their future characteristics, were also covered. A comprehensive analysis of an
equipment failure should follow its life, assessing all the phases that the occurrence
passes through in the ATC system (Figure 3-1). An equipment failure first
encounters the existing technical built-in defences. If these inherent defences are
insufficient to prevent the failure from impacting the ATC system, the failure
becomes a hazard. Hazards represent a sub-group of equipment failures that penetrate
existing technical built-in defences and hence require human intervention (or human
recovery). An equipment failure occurrence concludes with an outcome, which is the
result of the combined technical and human recovery.
Following the life of an equipment failure, the Chapter starts with the relevant definitions of
equipment failures and hazards. While the human recovery and outcome phases of the
equipment failure life are discussed in the remainder of the thesis, this Chapter
continues by presenting the available sample of operational failure reports. It also
discusses the reporting schemes used to obtain equipment failure reports and
data pre-processing issues. The appropriateness of this sample is assessed using a
methodology that determines how much ATC equipment contributes to the safety of the
overall air transport system. Agreement between the findings of past
research and the analysis of the available operational failure reports indicates the validity
of this sample. Once this is established, the thesis continues with a more in-depth
assessment of the available sample in the following Chapter.
[Figure: types of failure by ATC system component. People: human failure (= human error); Equipment: equipment failure (failure mode; outage or fallback); Procedures & training: failure of procedure and/or training]
The UK national air navigation service provider (NATS) differentiates between fallback and
failure modes. According to NATS, fallback mode is a condition which occurs only if
there is a major failure or when the level of redundancy is significantly eroded (NATS,
2002). Thus, the NATS definition of fallback mode corresponds closely to outages
as defined previously.
It is very important to distinguish between equipment failures and human operator
failures, known as human errors (Figure 3-2). It could be argued that all failures
are ultimately human in nature, since humans are involved at some stage of the
process; e.g. system designers might fail to anticipate a certain equipment state.
Humans are also involved in manufacturing, testing, validation, certification, and
maintenance. Any of these people can be directly or indirectly responsible for
a failure occurring in ATC. It is also important to note that non-technical failures should
not automatically be considered human failures. Frequently, a failure that has no obvious
technical cause is attributed directly to the human, owing to a lack of deep and
objective analysis of its causes and of the dynamic relations between the technical and human
components of the system (Straeter, 2001).
The following sections start with the definition of a hazard, a sub-group of equipment
failures that penetrate existing technical built-in defences and hence require human
intervention, which is the focus of the research presented in this thesis. This is followed
by a presentation of the sample of operational failure reports available for this research.
The following example may help to clarify the difference between failure, hazard,
and technical and human recovery, as defined in this research:
A power loss (failure) affects one set of Controller Working Positions (CWP).
Owing to the independent Uninterruptible Power Supplies (UPS), electrical energy
is continuously provided and the controller does not notice this failure (no
hazard). The automatic changeover to UPS is one example of a built-in
technical defence, or technical recovery (see Chapter 4 for a detailed
explanation). If a continuous supply of electrical energy is not provided,
several CWPs may experience a problem, creating a hazardous situation and
requiring controller intervention (human recovery).
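The distinction drawn in this example can be expressed as a minimal Python sketch. The names `EquipmentFailure`, `contained_by_technical_defence`, and `classify` are hypothetical. The sketch also reduces the built-in defences to a single yes/no flag, so it illustrates the definitions only and does not model any real ATC system.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_HAZARD = "no hazard: technical recovery sufficient"
    HAZARD = "hazard: human (controller) recovery required"


@dataclass
class EquipmentFailure:
    description: str
    # Hypothetical flag: True if a built-in technical defence (e.g. an automatic
    # changeover to UPS) keeps the affected function available to the controller.
    contained_by_technical_defence: bool


def classify(failure: EquipmentFailure) -> Outcome:
    """Only failures that penetrate the built-in defences become hazards."""
    if failure.contained_by_technical_defence:
        return Outcome.NO_HAZARD
    return Outcome.HAZARD


ups_ok = EquipmentFailure("power loss to a set of CWPs, UPS changeover works", True)
ups_fail = EquipmentFailure("power loss to a set of CWPs, no continuous supply", False)

for f in (ups_ok, ups_fail):
    print(f"{f.description}: {classify(f).value}")
```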
It should be pointed out that although this research considers only failures which lead
to hazardous situations, other failures also occur. These other failures represent
the majority and never affect the controllers' performance, owing to the effectiveness of
technical built-in defences (NATS, 2002). However, these failures still require
intervention, repair, and maintenance by engineers from the ATC system control and
monitoring unit.
Having defined failure and hazard as used in this research, the next section analyses
the nature of equipment failures in the operational environment, starting with the details
of the sample of equipment failure reports.
database of the control and monitoring unit within the particular ATC Centre. This
database must contain information on all equipment failures that occurred in the ATC
Centre, regardless of their impact or severity, because engineering staff need
complete insight into all equipment failures, as they are
responsible for repair and maintenance.
However, not all equipment failures are required to be reported at a national level. The
selection of those that need to reach the respective CAA is made through a review of
reported incidents or safety events on a monthly, quarterly, and annual basis. As a
result, a national database will contain only occurrences of sufficient severity and
impact on operations. As an example, the UK CAA uses a Mandatory Occurrence Reporting (MOR)
database which contains, amongst others, reports on equipment failures that impact on
the controllers' ability to provide air traffic services. These reports are fed in from the
Engineering Reporting Occurrence Database, which contains details of all technical
problems, failures, and maintenance issues, the majority of which pass unnoticed by
controllers (owing to the high level of redundancy in ATC systems).
The collected data are regularly analysed to assess safety performance at national level
as well as at the level of the relevant units (e.g. ATC Centre). Furthermore, this
information is sometimes used more widely for benchmarking studies and to record
the safety performance of a given region (e.g. the European Civil Aviation Conference
(ECAC), consisting of 41 European countries).
A lack of reporting culture that results in uncertainty related to data reliability and
completeness.
These problems are addressed below, highlighting the approaches adopted to mitigate
them.
All reports have a short, one-sentence summary followed by a description of the
equipment failure incident plus some additional information (e.g. date, occurrence
number, location, area code: flight information region or sector name). Unfortunately,
the additional information was not always available. Additionally, Countries C and D
provided their internal severity categorisation, while Country D provided information on
failure duration. Since Country D's dataset originates from an engineering unit, the
duration variable was measured from the first log of the failure until its final resolution.
As a result, it was possible to extract four types of information. The type of
equipment/ATC functionality affected and the complexity of the failure are usually extracted
from the short summary available for each report. The severity of an equipment
failure is extracted from the available severity rating (where one existed) or, failing that, by
assessing the available information on the operational and safety impact of the equipment
failure and applying the severity rating derived in this research (see Chapter 4, Table 4-5).
Finally, the duration variable is available only in the Country D database.
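The four extracted variables, together with the fallback rule for severity, can be sketched as a Python record. The field names and the `derive_rating` callback are hypothetical. The callback only stands in for the severity rating of Chapter 4 (Table 4-5), which is not reproduced here, and the example values are illustrative rather than taken from any report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailureRecord:
    """The four variables extracted per report (hypothetical field names)."""
    atc_functionality: str                   # usually from the one-sentence summary
    failure_complexity: str                  # usually from the one-sentence summary
    severity: str                            # provider's rating, or the derived one
    duration_hours: Optional[float] = None   # available only for Country D


def extract_severity(provider_rating: Optional[str],
                     impact_description: str,
                     derive_rating: Callable[[str], str]) -> str:
    """Prefer the provider's own severity category; otherwise derive one."""
    if provider_rating:
        return provider_rating
    return derive_rating(impact_description)


def placeholder_rating(impact_description: str) -> str:
    # Stand-in for the Table 4-5 severity scheme, which is not reproduced here.
    return "UNRATED"


# Illustrative record from a report that carries no provider severity rating.
record = FailureRecord(
    atc_functionality="surveillance",
    failure_complexity="single component",
    severity=extract_severity(None, "loss of one radar display", placeholder_rating),
)
print(record)
```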
Data pre-processing is based on the classification of ATC system functionalities (see
Chapter 2). In certain reports it was very difficult to determine the type of equipment,
a problem compounded when only an acronym indicated what
the report referred to. Consequently, several interviews were conducted with
engineering staff from two European ATC Centres to identify and classify
these ambiguous reports correctly. A glossary of terms and
acronyms proved to be a very useful tool during the pre-processing stage. Such
documents should accompany (or be an integral part of) every database as part of
normal reporting practice.
Within one country, the number of reports may not reflect the actual number of
equipment failure incidents in the ATC Centres for a variety of reasons. The first
reason may be under-reporting as a result of an inadequate reporting culture in
the ATC Centre and the aviation community overall. Secondly, not all equipment failures
are included in the CAA databases. As previously explained, only failures of a certain
severity (i.e. impact on ATC operations and controller performance) tend to be reported
to the CAA. As a result, the available operational failure reports are not necessarily
complete or reliable (i.e. they lack detail on the context surrounding a reported
occurrence). To date, no measure of the completeness and reliability of occurrence
databases has been produced. This is a task for future research.
Table 3-1 Summary of available data, number of reports, and equipment failure incidents per
country

Country | Source of data | Time period available | Average flight hours flown for available time period | Total number of reports pre-processed | Total number of equipment failures reported
A | CAA | 1999-2003 | 1,375,800.00 | 1,378 | 791
B | CAA | 2001-2005 | 1,027,870.00 | 1,393 | 1,324
C | CAA | 1992-2004 | 389,245.68 | 3,340 | 448
D | System control unit/ATC Centre | 08/2000-2004 | 428,502.22 | 16,697 | 7,788
Total | | | | 22,808 | 10,351
After pre-processing all available reports (22,808), more than ten
thousand (10,351) were identified as equipment failures in air traffic control
(Table 3-1). The remaining reports mainly comprised equipment-related reports from
outside the national airspace, multiple reports filed for the same occurrence to reflect
multiple findings or causes, reports on non-ATC equipment, and
other non-technical types of incident (e.g. human error, runway closures due to non-equipment
issues, scheduled maintenance, software updates, and scheduled hardware
changes).
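A few lines of Python can check the totals in Table 3-1 and compute each country's share of the identified failures. The counts are transcribed from the table. The country letters are an assumption, assigned from the time periods discussed in the text (A: 1999-2003, B: 2001-2005, C: 1992-2004, D: 08/2000-2004).

```python
# Counts transcribed from Table 3-1: (reports pre-processed, equipment failures).
# Country letters are assigned from the time periods discussed in the text.
table_3_1 = {
    "A": (1_378, 791),      # CAA, 1999-2003
    "B": (1_393, 1_324),    # CAA, 2001-2005
    "C": (3_340, 448),      # CAA, 1992-2004
    "D": (16_697, 7_788),   # system control unit, 08/2000-2004
}

total_reports = sum(reports for reports, _ in table_3_1.values())
total_failures = sum(failures for _, failures in table_3_1.values())
print(total_reports, total_failures)  # 22808 10351

for country, (reports, failures) in table_3_1.items():
    share = failures / total_failures   # share of all identified failures
    retained = failures / reports       # fraction of reports kept as equipment failures
    print(f"Country {country}: {share:.1%} of failures, "
          f"{retained:.1%} of its reports retained")
```

On these counts, Country D accounts for about 75 percent of the identified failures. The fraction of reports retained after pre-processing ranges from about 13 percent (Country C) to about 95 percent (Country B).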
For Countries A and B, the time period studied could be considered steady (uniform)
with respect to the ATC service provided and other aviation-related factors (e.g. traffic
levels, jet fuel prices, airline fares, regulations). However, a modern ATC Centre was
opened in Country A in the second half of 2001. This resulted in a relatively large number
of early failures of individual components in early 2002. This is a recognised
characteristic of the initial life, or burn-in period, of any newly implemented system
(Figure 3-4).
Figure 3-4 Bathtub model of reliability for electronic components (Leveson, 1995)
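A common way to reproduce the bathtub shape of Figure 3-4 is to add three Weibull hazard rates. This is an assumption of the sketch, not the model given by Leveson (1995). A shape parameter below 1 gives a decreasing rate (burn-in), a shape of exactly 1 gives a constant rate (useful life), and a shape above 1 gives an increasing rate (wear-out). All parameter values below are hypothetical and were chosen only to produce the shape.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1); t must be positive."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t: float) -> float:
    """Sum of three Weibull hazards; all parameters are hypothetical (t in days)."""
    burn_in = weibull_hazard(t, shape=0.5, scale=60.0)        # decreasing: early failures
    useful_life = weibull_hazard(t, shape=1.0, scale=2000.0)  # constant: random failures
    wear_out = weibull_hazard(t, shape=5.0, scale=5000.0)     # increasing: ageing
    return burn_in + useful_life + wear_out


for day in (1, 30, 90, 365, 1000, 3000, 6000):
    print(f"day {day:>4}: hazard rate {bathtub_hazard(day):.2e} per day")
```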
Country B underwent a complete modernisation of its ATM system in 2000. Given that
a typical burn-in period ranges between 30 and 90 days (IEEE, 1998), it is reasonable to
assume that the system was well integrated and settled for the period of the data (i.e.
2001 to 2005). Therefore, the average number of incidents reported in this period could
be considered representative and appropriate for further analysis.
The time period available for Country C spans 13 consecutive years (i.e.
1992 to 2004). This country went through extensive regulatory changes throughout the
1980s. The change in air service licensing ensured that any operator that could prove
financial viability and meet safety standards would obtain a licence. As a result, by the
end of the 1980s, the number of operators had more than doubled. At about the same
time, the Government decided to commercialise most of its service provision activities.
Thus, air traffic and other services became new state-owned commercial enterprises.
However, all of these changes were firmly embedded in the system by the 1990s,
and therefore the sample provided could be considered stable and appropriate for
further analysis.
Country D is unique in that it provided data from a single engineering unit database and
therefore represents the most detailed data source in this research. It covers the
shortest period available (3.5 years) but contains the highest proportion of failures:
7,788 of the 10,351 identified equipment failures, or some 75 percent.
Although the available sample contains a significant number of operational failure reports,
this alone does not indicate how representative these reports are of the operational ATC
environment. For this reason, a top-down methodology for total aviation system
safety is developed. This methodology determines the contribution of
ATC equipment to the safety of the overall air transport system based on past
research. Once this is established, the same methodology is applied to the
operational failure reports and the results are compared. This methodology and
the subsequent validation of the available operational data are presented in the
following section.
this context, past safety analyses (not only in aviation) have used the number of
incidents together with the assumed accident/incident ratio. The United States Federal
Aviation Administration (FAA, 2000) cites several different analytical approaches. The
two most common of these are discussed below.
In the 1940s, Heinrich introduced the idea of accidents in which no injury
occurred and only property was damaged (Heinrich, 1941). This led to the
creation of the so-called Heinrich pyramid, with established proportions of accidents,
serious incidents, and incidents of 1:29:300 (Saldana et al., 2002). After these initial
studies, the theoretical underpinnings of safety investigations stagnated
until the practical work of Byrd in the 1970s. Byrd carried out his work in a steel factory
and revised Heinrich's proportions to 1:29:600 (Saldana et al., 2002).
However, whilst both of these studies are valuable in their statistical analyses, they do
not seem appropriate for dealing with equipment failures in ATC, at least not in the
ratios they offer. Both studies were designed to determine the risk and related ratio of
on-the-job accidents and incidents. The weaknesses of both studies may stem
from their design, in particular the bias of analysing only accident reports filed
by supervisors (which tend to blame injuries on workers), and from much lower levels of
equipment reliability and integrity than those of the systems used in ATC today.
For the purpose of the research presented in this thesis, particular attention has been
given to the ratio between accidents and incidents induced by ATC equipment failures.
A EUROCONTROL safety assessment study assumed that one in 10,000
equipment failures will contribute to an aviation accident (EUROCONTROL, 2004c), an
assumption which is in line with the high reliability requirements for overall ATC
systems, as well as for ATC equipment. A number of arguments suggest
that this ratio will decrease in future:
The number of incidents should decrease due to continuous safety initiatives and
hazard prevention programmes;
The probability of an incident leading to an accident should decrease due to
increases in both equipment reliability and advanced solutions for redundancy
and diversity (dissimilar redundancy);
Changes should be seen in the types of incident occurring: as a result of
enhanced risk management approaches, the frequency of serious incidents
should reduce;
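The gap between the industrial pyramids and the ATC assumption can be made concrete with a short calculation. The sketch treats each occurrence in a pyramid as equivalent to one reported failure, which is a simplifying assumption. It then compares the share of occurrences that are accidents with EUROCONTROL's one in 10,000.

```python
from fractions import Fraction

# Proportions accident : serious incident : incident, as quoted in the text.
PYRAMIDS = {
    "Heinrich": (1, 29, 300),
    "Byrd": (1, 29, 600),
}
# EUROCONTROL (2004c): one in 10,000 equipment failures contributes to an accident.
EUROCONTROL_SHARE = Fraction(1, 10_000)

shares = {}
for name, (accidents, serious, minor) in PYRAMIDS.items():
    # Share of all occurrences in the pyramid that are accidents.
    shares[name] = Fraction(accidents, accidents + serious + minor)
    ratio = shares[name] / EUROCONTROL_SHARE
    print(f"{name}: {float(shares[name]):.2e} of occurrences are accidents, "
          f"about {float(ratio):.0f} times the EUROCONTROL assumption")
```

Under these assumptions, the pyramids imply accident shares roughly 16 to 30 times higher than the EUROCONTROL figure. This is consistent with the argument that their ratios do not transfer to modern ATC equipment.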
per time period of interest. Any other measure would mask the true performance
values.
In addition to the units of measure, accident rates depend on the definition of
the critical event. These critical events range from accidents, fatal accidents, and
hull losses to the number of fatalities or injuries. An accident, as defined by ICAO
Annex 13 (ICAO, 2001d), is an occurrence associated with the operation of an
aircraft, which takes place between the time that any person boards the aircraft with the
intention of flight and the time that all such persons have disembarked, in which any person
suffers death or serious injury, or in which the aircraft receives substantial damage.
This definition therefore comprises fatal accidents as well as hull losses. Thus, in
dealing with various accident rates it is crucial to be aware of the precise definition of
both the critical event and the unit of measurement used.
The current rate of aircraft accidents per million flying hours has remained constant
over recent years. If the same accident rate is assumed for the future, together with
predicted increases in traffic levels, the absolute number of
accidents will increase. Using the current accident rate, ICAO has predicted that by the year 2010
there will be one aircraft accident per week, i.e. 52 accidents per year (Hai, 2004). This
is why the US FAA and other aviation authorities have identified the need to
significantly decrease the risk of aircraft accidents.
The following sections propose a methodology for deriving an aviation target level
of safety (TLS) based on the rate of aircraft accidents (defined as the number of
accidents per flight hour). An accident is defined according to ICAO, while the flight
hour has been chosen as the most appropriate unit for measuring the risk induced by equipment
failures. It is usually more convenient to work in terms of flight hours rather than
the operational hours of an ATC unit or sector. This approach avoids difficulties and
differences associated with the geographical coverage of the system(s) being
considered, the phase of flight, the density and complexity of airspace, and the available
systems and equipment (e.g. number of radars, navigation systems, communication
systems). It is also in line with the Required Communication, Navigation, and
Surveillance Performance concepts (RCP, RNP, RSP) as defined in the previous Chapter. In short,
the proposed methodology starts by identifying the high-level aviation target level of
safety and then focuses on the precise contribution of equipment failures, the type of
occurrence under investigation in this thesis.
Note the difference between acceptable and tolerable risk. Tolerability refers to a willingness
to live with a risk so as to secure certain benefits, in the confidence that it is being properly
controlled. Tolerable risk is not ignored, but is controlled and reduced further if possible. On the
other hand, acceptable risk means that we are prepared to accept the risk as it is (Reid, 1996). It
should also be noted that acceptable risk is a relative term and is based on different risk
perceptions: individual, public (group of individuals), industry (industry usually needs additional
pressure to declare a product unsafe), and safety experts. They all differ in
the level of risk they are willing to accept.
acceptable risk for each sub-category, so that each one must produce a risk equal to or lower
than that prescribed (see Figures 2-1 and 2-3).
As pointed out by Brooker (2004), there are several methods for deriving the TLS. In most
cases, the analysis starts from the current situation and uses an improvement factor to
derive the desired TLS. In some cases, this improvement factor may be established by
projecting a past trend into the future. It should incorporate traffic
growth as well as factors representing changes in the systems involved, operational
procedures, and work practices. In other cases, it may be based on agreement
between technical experts, the underlying idea being to set
challenging but realistic safety improvement targets.
The following sections provide an overview of the most relevant aviation TLS analyses.
The level of diversity between these approaches highlights the complexity of the
problem and the need for a consistent top-down total air transport system approach.
3.4.3.1.1 Joint Aviation Authorities
The Joint Aviation Authorities (JAA) document JAR 25.1309 is one of the main regulatory
documents in aviation. It defines the fundamental principles that govern aircraft
design and certification. JAR 25.1309 defines the risk of a serious accident due to
operational and airframe-related causes to be of the order of one per million hours of
flight. About ten percent of accidents related to operational and airframe
causes are attributed to aircraft equipment failures (e.g. hydraulics and electrical
systems) and the rest (90 percent) to other operational aspects (JAA, 1994). A
EUROCONTROL review of existing TLS standards and practices (EUROCONTROL,
2000a) argues that this requirement is based on data from the 1960s and is therefore
outdated.
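The top-down apportionment implied by JAR 25.1309 can be sketched as follows, using the one-per-million-hours serious accident target and the 10/90 split quoted above. The dictionary keys and the `within_allocation` helper are hypothetical. They illustrate how a sub-category might be checked against its allocated share and are not part of the JAA material.

```python
# JAR 25.1309 target: serious accidents from operational and airframe causes.
SERIOUS_ACCIDENT_TLS = 1e-6  # per flight hour

# Shares quoted from JAA (1994).
SHARES = {
    "aircraft equipment failures": 0.10,  # e.g. hydraulics, electrical systems
    "other operational aspects": 0.90,
}

# Allocate the overall target top-down in proportion to each share.
budget = {cause: SERIOUS_ACCIDENT_TLS * share for cause, share in SHARES.items()}
assert abs(sum(budget.values()) - SERIOUS_ACCIDENT_TLS) < 1e-18


def within_allocation(cause: str, assessed_rate: float) -> bool:
    """Hypothetical check: does an assessed risk stay within its allocation?"""
    return assessed_rate <= budget[cause]


for cause, rate in budget.items():
    print(f"{cause}: {rate:.0e} per flight hour")
```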
Furthermore, the JAR requirement relates to aircraft design,
encompassing only aircraft equipment, without consideration of the other components
of the air transport system (including ATM). Accordingly, this JAR requirement needs to
be updated to reflect the major changes in the aviation industry since the 1960s. The
following paragraphs outline several key factors that illustrate the changes and
growth in aviation since the 1960s.
There has been a rapid expansion in the air transport industry over the last four
decades due to a number of factors, including growth in the world economy,
advances in flight technology, and the deregulation of airline services. The result
of these forces has been a steady decline in airline costs and passenger fares, which
has further stimulated traffic growth. As an indication of economic growth, ICAO reports
that total gross domestic product (GDP) increased by a factor of
3.8 over the same period (ICAO, 1997). GDP is considered to be the most
appropriate available measure of world output and indicates the health of the global
economy.
Changes in flight technology have also had a major effect on the growth in travel
demand. The modern era of air transportation began in the 1960s. The main driver was
the replacement of piston engines with jet engines, which brought
increased speed, reliability, and comfort. This change led to a reduction in operational
costs, which in turn led to increased travel demand.
In addition, changes in the regulatory environment in both the US and Europe
have had a major effect. The deregulation of airline services in the US in 1978 allowed
airlines to improve services, reduce average costs, increase routes, and improve the
efficiency of scheduling. In Europe, the introduction of a single market for aviation
services by the European Union in 1992 has brought changes similar to those seen in the
USA.
The ICAO Manual on Air Traffic Forecasting (ICAO, 1985) suggests three methods for
forecasting future civil aviation traffic. These methods are trend projection, econometric
analysis, and market and industry survey. Econometric forecasting is the only method
that takes into account various economic, social, and operational factors affecting air
traffic. The objective here is to translate the relevant factors into projections of future
traffic growth. The traffic growth factors are then reviewed further to incorporate
prospective changes in other factors not accommodated in the econometric
analysis.
The predicted traffic growth will influence target safety levels through the increase in
the number of flight hours forecast. However, there are other factors, not necessarily
included in this forecast of traffic growth, that have the potential to influence the level of
safety. Some of these factors are: growth in the total number of aircraft flying as
well as in the passenger capacity of aircraft (e.g. Airbus A380, Airbus A350, Boeing 7E7
Dreamliner), increased airport and airspace congestion, technological development
(e.g. advanced safety nets, satellite-based CNS/ATM), and pressure to find
tools to control and mitigate human error. Another important factor not considered is
A fixed annual traffic growth rate until the year 2020 (i.e. 4 percent for Western-built
jets); and
A constant number of fatal accidents per year (i.e. eight fatal accidents each
year).
Based on these assumptions, the UK CAA predicted a rate of 1.8E-07 fatal accidents
per flight for the year 2020. For the purpose of the methodology presented in this
Chapter, this target has been translated into a rate per flight hour using the
information available on the Boeing web site (Boeing, 2004), as follows. The average
flight in 1982 lasted approximately 1.4 hours, while in 2002 it lasted 1.94 hours. Extrapolating this
trend, this research estimates that the average flight in 2020 will last 2.43
hours. Using this assumption, the UK CAA's TLS for the year 2020 corresponds to
7.4E-08 fatal accidents per flight hour.
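The conversion above can be reproduced in Python. The sketch assumes that average flight duration grows linearly, which reproduces the 2.43-hour figure. The helper names are hypothetical. Dividing by average flight duration in the same way also recovers the AWOP hull loss figures discussed later (1.87E-06 per flight over an average flight of 1.47 h).

```python
def extrapolate_linear(x0: float, y0: float, x1: float, y1: float, x: float) -> float:
    """Straight-line extrapolation through (x0, y0) and (x1, y1)."""
    return y1 + (y1 - y0) / (x1 - x0) * (x - x1)


def per_flight_to_per_flight_hour(rate_per_flight: float, avg_flight_hours: float) -> float:
    """Convert a rate per flight into a rate per flight hour."""
    return rate_per_flight / avg_flight_hours


# Average flight duration (Boeing, 2004): 1.4 h in 1982, 1.94 h in 2002.
duration_2020 = extrapolate_linear(1982, 1.4, 2002, 1.94, 2020)
tls_2020 = per_flight_to_per_flight_hour(1.8e-7, duration_2020)
print(f"average flight in 2020: {duration_2020:.2f} h")        # 2.43 h
print(f"UK CAA TLS for 2020: {tls_2020:.1e} per flight hour")  # 7.4e-08

# The same conversion applied to the AWOP hull loss rate (1.47 h average flight).
awop = per_flight_to_per_flight_hour(1.87e-6, 1.47)
print(f"AWOP hull loss rate: {awop:.2e} per flight hour")      # 1.27e-06
```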
3.4.3.1.3 International Civil Aviation Organisation
There have been several attempts by ICAO to derive aviation target levels of safety.
These originate from a number of different studies and reports, which are presented
below, from the earliest to the most recent.
ICAO North Atlantic Systems Planning Group (NATSPG) - the ICAO NATSPG
initially developed a method using data on fatal accidents of jet aircraft in
the period from 1959 to 1966 (EUROCONTROL, 2000a). Based on the available
data³, this analysis estimated a fatal accident rate of 2.34E-06 per flight hour. The analysis
then applied a factor of 0.1 to obtain the rate of accidents due to collision. The basis for
this assumption is not evident or recorded. An improvement factor of between two
and five was then applied to justify the use of historical data for future targets
(EUROCONTROL, 2000a). This resulted in a TLS ranging from 1.2E-07 to
4.6E-08 fatal accidents per flight hour due to collision. Finally, the analysis
apportioned the TLS to the three flight dimensions and thus calculated a
TLS for collision due to loss of lateral separation of between 4E-08 and
1.5E-08 fatal accidents per flight hour.
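The NATSPG chain of factors can be retraced step by step. This sketch simply multiplies and divides the quoted values. The unrounded results (1.17E-07 and 4.68E-08 for collision, 3.9E-08 and 1.56E-08 for lateral separation) appear to be the source of the rounded figures in the text. Naming the three flight dimensions as lateral, longitudinal, and vertical follows the usual convention and is an assumption here.

```python
# Inputs quoted for the NATSPG derivation (EUROCONTROL, 2000a).
FATAL_ACCIDENT_RATE = 2.34e-6   # jet aircraft, 1959-1966, per flight hour
COLLISION_FACTOR = 0.1          # share of fatal accidents attributed to collision
IMPROVEMENT_FACTORS = (2, 5)    # range used to move from historical data to targets
DIMENSIONS = 3                  # lateral, longitudinal, vertical

collision_rate = FATAL_ACCIDENT_RATE * COLLISION_FACTOR

results = {}
for improvement in IMPROVEMENT_FACTORS:
    tls_collision = collision_rate / improvement
    tls_lateral = tls_collision / DIMENSIONS   # equal share per dimension
    results[improvement] = (tls_collision, tls_lateral)
    print(f"improvement factor {improvement}: collision TLS {tls_collision:.2e}, "
          f"lateral TLS {tls_lateral:.2e} fatal accidents per flight hour")
```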
be used for establishing any vertical minimum performance
specification. This value is equal to or better than 5E-09 fatal accidents per
flight hour arising from collisions due to any cause for the period 2000 to 2010.
This TLS value is also given in ICAO Annex 11 (ICAO, 2001c);
ICAO Annex 11 - where fatal accidents per flight hour are
considered an appropriate metric, ICAO Annex 11 (ICAO, 2001c)
proposes a TLS of 5E-09 fatal accidents per flight hour per dimension after the
year 2000. Although ICAO Annex 11 does not provide any justification for this
TLS, it is assumed that this value is taken from the ICAO RGCSP. For the
period prior to the year 2000, ICAO Annex 11 recommends a TLS of
2E-08 fatal accidents per flight hour per dimension; and
ICAO All-Weather Operations Panel (AWOP) - the objective of the ICAO AWOP
was to assess the required navigational performance (RNP) for the approach,
landing, and departure phases of flight (ICAO, 1994). Based upon historical
data5, ICAO's calculation determined the average hull loss rate to be 1.87E-06 per
flight or 1.27E-06 per flight hour. Based on this historical data, ICAO proposed a
TLS for hull loss of 1E-07 per flight hour. The rationale for this improvement
over the historical accident rate is the removal of pilot errors through the use of
glass cockpit aircraft and the tunnel incident alarm. The glass cockpit is a system
of electronic displays presenting all information on an aircraft's situation,
position, and progress. The tunnel incident alarm is an alert that is triggered if
the aircraft unintentionally leaves the assigned flight path (the tunnel) during
the approach and landing phases of flight. Additionally, the objective in aviation
safety is to reduce the number of accidents despite increasing flight hours. This
is essential if public confidence in aviation is to be maintained as the global air
transport system expands.
Based on 36 fatal accidents and an estimate of 15.5 million flight hours during the period
1959-1966.
4 The USSR developed a series of targets for progressive implementation, such as 1E-08 from
1990 to 2000, 5E-09 for 2000-2010, and 2E-09 for 2010 onwards (ICAO, 1995).
5 The data set covers hull loss accidents for the period from 1959 to 1990 for commercial jet
aircraft whose weight exceeds 60,000 lb. Per-flight-hour exposure is based on an average flight
duration of 1.47 h. A hull loss accident is defined as an accident in which the aircraft is
destroyed or damaged beyond economical repair.
Chapter 3
Preliminary Assessment
Targeted year for the TLS calculation: current vs. future levels.
Reference | Title | Database | Type of operation/weight/type of accident | Target year | TLS
Joint Aviation Authorities | JAR 25.1309 Large Aeroplanes, Advisory Material - AMJ | Worldwide, 1960s | Serious accident | Not specified | 1E-06 per flight hour
UK Civil Aviation Authority | Aviation Safety Review, CAP 701 | Worldwide, 1990-1999 | Not specified | 2020 | 1.8E-07 per flight / 7.4E-08 per flight hour
North Atlantic Systems Planning Group (NATSPG) | Not specified | Worldwide, jets, 1959-1966 | Not specified | Not specified | 2.34E-06 per flight
ICAO | Review of the General Concept of Separation Panel (RGCSP), 15th meeting | Not specified | Jets/fatal accidents | 2010 | 1E-07 per flight hour
ICAO | Annex 11 | Not specified | En route fatal accidents | After the year 2000 | 5E-09 per flight hour per dimension (1.5E-08 per flight hour)
ICAO | All-Weather Operations Panel (AWOP) | Worldwide, 1959-1990 | Jets/MTOW > 60,000 lb/hull loss accidents | Not specified | 1E-07 per flight hour
Key: MTOW = maximum take-off weight of the aircraft
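The unit conversions behind several of the figures above are simple arithmetic. The sketch below reproduces them, assuming (as the text does) that an overall collision TLS is split equally across the three flight dimensions, and that a per-flight rate becomes a per-flight-hour rate when divided by the average flight duration (1.47 h for the AWOP data set). The function names are illustrative only.

```python
"""TLS unit conversions used in Section 3.4 (illustrative helper functions)."""

N_DIMENSIONS = 3  # lateral, longitudinal and vertical; an equal split is assumed


def per_dimension(total_tls: float, dimensions: int = N_DIMENSIONS) -> float:
    """Apportion an overall collision TLS equally across the flight dimensions."""
    return total_tls / dimensions


def all_dimensions(tls_per_dimension: float, dimensions: int = N_DIMENSIONS) -> float:
    """Total TLS implied by a per-dimension value (inverse of per_dimension)."""
    return tls_per_dimension * dimensions


def per_flight_hour(rate_per_flight: float, mean_flight_duration_h: float) -> float:
    """Convert an accident rate per flight into a rate per flight hour."""
    if mean_flight_duration_h <= 0:
        raise ValueError("mean flight duration must be positive")
    return rate_per_flight / mean_flight_duration_h


if __name__ == "__main__":
    # NATSPG range apportioned to a single dimension (e.g. lateral separation)
    for total in (1.2e-07, 4.6e-08):
        print(f"{total:.1e} overall -> {per_dimension(total):.1e} per dimension")
    # ICAO Annex 11: 5E-09 per dimension corresponds to 1.5E-08 overall
    print(f"Annex 11 overall: {all_dimensions(5e-09):.1e} per flight hour")
    # ICAO AWOP: 1.87E-06 hull losses per flight, average flight of 1.47 h
    print(f"AWOP: {per_flight_hour(1.87e-06, 1.47):.2e} per flight hour")
```

Running it prints 4.0e-08 and 1.5e-08 per dimension for the NATSPG range, 1.5e-08 overall for Annex 11, and 1.27e-06 per flight hour for AWOP, matching the values quoted in the text.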
After reviewing the most relevant analyses and methods of TLS calculation, a TLS of
1E-08 accidents per flight hour is used as the baseline for the year 2020 (the target year
of the research presented in this thesis). The reasons for using this baseline are:
The rate of 1E-07 is currently used as a target by ICAO for both fatal accidents
and hull loss accidents (see Table 3-2);
Given the overall aim of reducing the accident rate below the current safety
targets, it is reasonable to aim at 1E-08 accidents per flight hour in the year
2020; and
The analysis conducted by the UK CAA predicts a fatal accident rate for 2020 of
7.4E-08 fatal accidents per flight hour.
Once the TLS for the year 2020 is determined, the next step is to apportion the
contribution of ATC to the overall air transport TLS. To establish this, several studies
have been reviewed. The key findings are presented in the following section.
3.4.4 Target level of safety and Air Traffic Control risk budgeting
The next step is to determine the risk budget allocation for the ATC system as a
component of the overall air transport system, i.e. to determine the contribution of ATC.
According to the results of the UK CAA's analysis, the contribution of ATC and ground
aids to aircraft accidents is 1.7 percent (Table 13 in EUROCONTROL, 2005).
EUROCONTROL currently uses 2 percent as the maximum direct contribution of ATM to
aircraft accidents within the European Civil Aviation Conference (ECAC) region. This
figure was derived from historical data (the ICAO ADREP database, focused on the
ECAC region), from which the contribution of ATC was determined to be 1.1 percent
(EUROCONTROL, 2001a). Recognising that only ATC causes were accounted for
(without the contribution of other ATM components, such as ATS, ASM, and ATFM),
EUROCONTROL allowed an additional 0.9 percent, resulting in a 2 percent ATM
contribution to aircraft accidents. This figure has been further validated via discussions
with the EUROCONTROL Safety Regulatory Commission's task force Hazard
ICAO and did not use this distinction. The NLR study considered an occurrence to be a
causal factor only if that occurrence was part of the chain of events leading to the
accident. The NLR approach seems to better reflect the aim of determining the overall
ATC contribution to aircraft accidents.
The results presented above need to be augmented to allow for possible statistical error
and uncertainties linked to the reporting processes, and to provide additional
protection for the future. As previously discussed, EUROCONTROL allowed an additional
0.9 percent for statistical error and uncertainties in the calculation of the ATM safety
targets for the ECAC region, because the underlying historical data covered only one
component of ATM, namely ATC (EUROCONTROL, 2001a). With this in mind, and taking
into account the results of the UK CAA and NLR studies, this thesis uses a maximum
ATC contribution of 3 percent. Applying this share to the TLS established for the air
transport system for the year 2020 (previous section), the apportioned contribution of
ATC is 3E-10 per flight hour. Having derived the TLS for ATC, this functional block
should then be divided between human operators, equipment, and procedures. This
makes it possible to define the appropriate risk induced by failure of ATC equipment,
as presented in the next section.
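The apportionment step can be written as a one-line multiplication of a parent budget by a component share. The sketch below is a minimal illustration, assuming the 1E-08 baseline for 2020 and the 3 percent ATC share adopted above; the helper name `apportion` is hypothetical.

```python
"""Apportion the 2020 air transport TLS to ATC (Section 3.4.4) as a risk-budget step."""

TLS_2020 = 1e-08  # accidents per flight hour, baseline adopted for 2020

# EUROCONTROL (2001a): 1.1 % historical ATC share plus a 0.9 % allowance
# (other ATM components, statistical error and reporting uncertainty)
EUROCONTROL_ATM_SHARE = 0.011 + 0.009

# Maximum ATC contribution adopted in this thesis after considering UK CAA and NLR results
ATC_SHARE = 0.03


def apportion(parent_budget: float, share: float) -> float:
    """Risk budget of a component allocated `share` (a fraction) of its parent's budget."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be a fraction between 0 and 1")
    return parent_budget * share


atc_budget = apportion(TLS_2020, ATC_SHARE)
print(f"EUROCONTROL ATM share: {EUROCONTROL_ATM_SHARE:.1%}")
print(f"ATC risk budget: {atc_budget:.1e} accidents per flight hour")
```

Keeping the share as an explicit parameter makes it easy to test how sensitive the ATC budget is to the choice between the 2 percent EUROCONTROL figure and the 3 percent used here.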
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
It is important to determine the contribution of equipment (i.e. of its failure or malfunction)
to the ATC risk budget. The historical data on the proportion of incidents in which
equipment failure is implicated vary to a certain degree. Interviews with system
control and monitoring staff at two European ATC Centres6, as well as the
approximation used in the CORA 2 documentation (EUROCONTROL, 2004c), reveal
that equipment failures are the causal factor in 0.01, i.e. one percent, of all incidents.
Although this figure relates to the ATM system as a whole and not only to its ATC
component, it is used together with other sources of information to inform the ATC
equipment risk budgeting within the overall air transport system.
A more focused approach is provided by the NLR study (van Es, 2003). This study
determined that the causal factor 'ATC ground aid malfunction or unavailable' was
attributed to 5 percent of all ATM-related accidents or 18 percent of all ATC-related
accidents. It should be noted that this causal factor includes unavailable ATC
equipment, meaning equipment that was taken out of service by ATC staff, presumably
for maintenance reasons. In addition, the research was based on data samples that
incorporated older systems with lower levels of automation. Future systems are shifting
towards higher levels of automation and higher reliability, as discussed in the
previous Chapter.
6 Based upon private communications with staff at two European Area Control Centres (ACCs).
Therefore, it can be approximated that equipment failures represent the causal factor in
10 percent of all ATC-related accidents (or 3 percent of all ATM-related accidents). This
is based on the assumption that unscheduled failures constitute about 50 percent of
the failures in the NLR analysis discussed above, i.e. roughly half of the 18 percent and
5 percent figures, rounded up. Applying the 10 percent share to the ATC risk budget of
3E-10 per flight hour gives a risk of 3E-11 per flight hour that an ATC equipment failure
leads to an aircraft accident. This reasoning seems to be consistent with the widespread
argument that human error represents the causal factor in 70-80 percent of all accidents
(Reason, 1997), although there is some evidence that the majority of these human errors
represent organisational errors (Johnson and Holloway, 2004). A graphical representation
of the determined risk budgets is given in Figure 3-5.
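The derivation of the equipment budget can be traced step by step. The sketch below is a minimal illustration, assuming the NLR shares, the 50 percent unscheduled-failure fraction stated in the text, and the ATC budget of 3E-10 per flight hour from the previous section; the rounding to 10 and 3 percent is the thesis's own judgement and is hard-coded rather than computed.

```python
"""ATC equipment share of the ATC risk budget (Section 3.4.5), as a traceable calculation."""

ATC_BUDGET = 3e-10  # accidents per flight hour, from Section 3.4.4

# NLR (van Es, 2003): causal factor 'ATC ground aid malfunction or unavailable'
NLR_SHARE_OF_ATC = 0.18
NLR_SHARE_OF_ATM = 0.05
UNSCHEDULED_FRACTION = 0.5  # assumption stated in the text

raw_atc_share = NLR_SHARE_OF_ATC * UNSCHEDULED_FRACTION  # about 9 % of ATC accidents
raw_atm_share = NLR_SHARE_OF_ATM * UNSCHEDULED_FRACTION  # about 2.5 % of ATM accidents

# Rounded up in the thesis to 10 % (ATC) and 3 % (ATM)
EQUIPMENT_SHARE_OF_ATC = 0.10
equipment_budget = ATC_BUDGET * EQUIPMENT_SHARE_OF_ATC

print(f"unscheduled share of ATC accidents: {raw_atc_share:.0%} (rounded to 10%)")
print(f"unscheduled share of ATM accidents: {raw_atm_share:.1%} (rounded to 3%)")
print(f"ATC equipment risk budget: {equipment_budget:.0e} per flight hour")
```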
After assessing the contribution of ATC equipment failures to the overall risk of aircraft
accident, it is important to validate these findings with some operational experience.
This is achieved in the following section by analysis of operational failure reports from
three countries.
year and per source. The incident reports used in this section came from three sources,
namely three Civil Aviation Authorities (CAAs), presented as Country A (for the period
1999 to 2003), Country B (for the period 2001 to 2005), and Country C (for the period
1992 to 2004). The final results of this preliminary analysis of the available operational
reports are presented in Table 3-3. The average number of failures per year is calculated
for each of the three data sets (column 4). This is followed by the calculation of incident
rates based on the average flight hours flown over the given time periods (column 5).
The final step adjusts the calculated incident rate to give the probability of an accident
caused by equipment failure (using an accident to incident ratio of 1 in 10,000), as
shown in the last column of Table 3-3. In other words, this calculation produces the
operational level of safety for the three countries and their respective time periods.
Country (1) | Year (2) | Total number of equipment failures reported (3) | Average number of equipment failures per year (4) | Incident rate per flight hour (5) | Probability of accident caused by equipment failure, per flight hour (6)
Country A | 1999-2003 | 100, 107, 122, 287, 175 | 158.2 | 1.15E-04 | 1.15E-08
Country B | 2001-2005 | 184, 237, 171, 247, 485 | 264.8 | 2.58E-04 | 2.58E-08
Country C | 1992-2004 | 28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44 | 34.46 | 8.85E-05 | 8.85E-09
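The arithmetic behind columns (4) and (6) of Table 3-3 can be checked directly. The sketch below recomputes the yearly averages from the counts in column (3) and applies the 1 in 10,000 accident to incident ratio to the incident rates of column (5). The flight-hour totals are not given in this section, so column (5) is taken as given rather than recomputed.

```python
"""Recompute columns (4) and (6) of Table 3-3 from the yearly counts and incident rates."""

from statistics import mean

ACCIDENT_TO_INCIDENT = 1e-4  # 1 accident per 10,000 incidents, as used in the text

# Column (3): equipment failures reported per year, in chronological order
failures = {
    "A": [100, 107, 122, 287, 175],                              # 1999-2003
    "B": [184, 237, 171, 247, 485],                              # 2001-2005
    "C": [28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44],  # 1992-2004
}

# Column (5): incident rate per flight hour, copied from the table
incident_rate = {"A": 1.15e-04, "B": 2.58e-04, "C": 8.85e-05}


def accident_probability(rate_per_hour: float, ratio: float = ACCIDENT_TO_INCIDENT) -> float:
    """Scale an incident rate to an accident rate using a fixed accident-to-incident ratio."""
    return rate_per_hour * ratio


for country, counts in failures.items():
    avg = mean(counts)                                    # column (4)
    p_acc = accident_probability(incident_rate[country])  # column (6)
    print(f"Country {country}: {avg:.2f} failures/year, {p_acc:.2e} per flight hour")
```

Treating the accident to incident ratio as a parameter makes it straightforward to apply a severity-dependent weighting later, rather than the single ratio used for all incidents here.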
Based on the contribution of equipment failures to the overall safety of the air transport
system extracted from past research and the overall TLS methodology (3E-09 per flight
hour, which corresponds to applying the 3 percent ATC share and the 10 percent
equipment share to the TLS of 1E-06 accepted for the year 2000), we can conclude that
the levels of safety obtained from the operational reports (last column in Table 3-3)
show a degree of conformity.
Even closer agreement would be achieved by setting a higher TLS for the year 2000 (the
data indicate 1E-05, as opposed to the 1E-06 accepted within the aviation
community). Furthermore, better tuning of the current and future trade-offs within the
air transport system (see Chapter 2, Figures 2-1 and 2-3) would further enhance
the proposed methodology for determining the risk budget of ATC equipment.
Future advancements in technology, changes in traffic levels, and overall
changes in the ATC/ATM philosophy (e.g. shifting of separation responsibility from the
ground to the air) have the potential to improve safety. At the same time, it is reasonable
to assume that the distribution of risk within the air transport system will
change. The ATC-specific results given here could be used as an input to a
complete safety analysis that considers trade-offs between the various
components of the aviation system to realise risk budgets for a safe and cost-effective
system. Finally, the severity of the reported incidents could be used to inform a
weighting scheme that better reflects the accident to incident ratio, as the above
analysis treated all incidents equally.
In short, the above analysis indicates that the available operational failure reports are a
representative sample of equipment failures occurring in ATC Centres worldwide.
Having established the appropriateness of this sample, the following Chapter moves
toward the identification of operational characteristics of equipment failures extracted
from past research and operational failure reports.
3.6 Summary
This Chapter starts with a precise definition of equipment failures and of hazards, the
latter representing a sub-group of equipment failures that require human intervention (or
human recovery). It continues by presenting the sample of operational failure reports
available in this research. After a discussion of the reporting schemes designed to
capture incident occurrences, including equipment failures, the Chapter highlights
data pre-processing problems and the solutions applied to overcome them. To ensure
the relevance of the equipment failures captured in the available sample, the remainder
of the Chapter builds a framework for its validation. This risk assessment framework,
based entirely on past literature, begins with the risk assessment of the overall air
transport system and then focuses on one component, namely ATC equipment. In other
words, this section determines the maximum allowed accident risk imposed by ATC
equipment failures for the target year 2020.
The contribution of equipment failures to the overall safety of the air transport system
extracted from past literature was then compared with the result obtained from the
analysis of the available sample. This analysis showed a degree of agreement between
the theoretically assumed and operationally extracted levels of ATC equipment risk
budgeting. In other words, the available operational failure reports are a representative
sample of equipment failures occurring in the operational ATC environment. Hence, the
next Chapter proceeds with a detailed assessment of the equipment failure
characteristics extracted from operational failure reports and the available literature.
Chapter 4
The previous Chapter showed that operational failure reports available in this thesis
constitute a representative sample of equipment failures occurring in the operational Air
Traffic Control (ATC) environment. This Chapter moves toward the identification of the
operational characteristics of equipment failures. These are extracted from past
research and more than 20,000 operational failure reports. Special attention is paid to
the impact that equipment failures may have on ATC operations, and as a result a
severity rating scheme has been designed to support the research presented in this
thesis. Having discussed the consequences of equipment failures and their impact on
ATC operations, it is important to discuss how such consequences can be prevented or
mitigated. This involves the process of recovery from equipment failure and a
distinction can be made between technical and human recovery. This Chapter
discusses technical recovery by reviewing the existing technical built-in defences,
whilst the next Chapter discusses human (i.e. controller) recovery. A subset of
equipment failure characteristics relevant to ATC operations is then used in this
Chapter to develop a novel tool for the assessment of the severity of equipment
failures, known as the qualitative equipment failure impact assessment tool. This tool
enables an assessment of the overall impact of an equipment failure on ATC
operations.
Table 4-1 Examples of equipment failures related to different ATC system functionalities (as
defined in Chapter 2)
Type of failure | Example
Communication function | Total radio telephony failure on three frequencies (three sectors). Workstation had to be reset to default fallback setting.
Navigation function | Runway 15 Instrument Landing System (ILS) failed whilst aircraft on 16 NM final approach in Instrument Meteorological Conditions (IMC). Approach Control Centre was advised and aircraft confirmed the failure. Aircraft was preparing for a missed approach when the ILS returned to service after recovery.
Surveillance function | Erroneous altitude readings displayed on radar for B777 and B767 at FL340 and FL350, respectively. Short term conflict alert (STCA) was activated.
Data processing function | Triple failure on suite flight data exchange. System fully recovered after 40 min by manual intervention. Departures from two airports were stopped for approximately 10 min. The cause was the existence of duplicate flight identity numbers within the flight data held in the affected workstations.
Supporting function | B737 was on final approach at 50 ft over the runway when the controller received a false Approach Monitoring Aid (AMA) warning. The controller was concerned that in low visibility conditions a go-around would have been unnecessarily given.
Supporting function | STCA failed to activate against two aircraft at FL120. One aircraft was dropping parachutes, with the other filming them. Consequently, the aircraft were quite close to each other. They were both squawking Secondary Surveillance Radar (SSR) codes, but Short term Conflict Alert
Power supply | At 0535 a power failure in the tower caused failure of the Radar Data Processing System (RDPS), Flight Data Processing System (FDPS), radar, public telephone network, weather radar, and computers. At 0650 the position was rebooted and upgraded. ATC service returned to normal at 0730.
System monitoring and control function | Cursor frozen in global ops field of electronic flight strip. The controller was moved to an adjacent console and resumed operations from that position. There was only a brief interruption to the service.
and consequently appropriate preventive strategies should be
Another important factor is the overall ATC Centre architecture, since exposure to
failure varies greatly with the interconnectivity of different equipment, the degree of
channel separation (redundancy/variability), and failure complexity (single failure vs.
multiple failures). Based on operational experience (NATS, 2002) and the configuration
of the ATC operations room, four categories can be differentiated, ranging from an
impact on the entire operations room, through several sectors, to only one sector. The
categories are defined as follows:
All workstations/all sectors affected;
A number of workstations/different sectors affected;
Several workstations (within same suite)/one sector affected; and
One workstation/one sector affected.
The categorisation proposed by NATS orders failures by the severity of their impact on
the operations room, from the most severe failure (known as an outage) to the least
severe type of failure (affecting only one workstation). In addition, each suite is
responsible for a specific portion of airspace (i.e. a sector), and each sector has a
declared capacity (expressed as the number of aircraft in the sector in the peak
hour). As a result, the failure characteristic 'impact on operations room' is linked to
the number of aircraft exposed to the impact of an equipment failure.
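The link between the four impact categories and aircraft exposure can be sketched as a small classifier plus a capacity sum. This is a minimal illustration only: the sector names and declared capacities below are hypothetical, and summing peak-hour declared capacities is an assumed upper-bound proxy for the number of aircraft exposed, not a method given in the source.

```python
"""Classify a failure by operations-room impact and estimate aircraft exposure (sketch)."""

from enum import Enum


class RoomImpact(Enum):
    ALL_SECTORS = "All workstations/all sectors affected"
    SEVERAL_SECTORS = "A number of workstations/different sectors affected"
    ONE_SECTOR_SUITE = "Several workstations (within same suite)/one sector affected"
    ONE_WORKSTATION = "One workstation/one sector affected"


# Hypothetical operations room: sector -> declared capacity (aircraft in the peak hour)
DECLARED_CAPACITY = {"S1": 40, "S2": 35, "S3": 30, "S4": 28}


def classify(affected_sectors: set[str], workstations_affected: int) -> RoomImpact:
    """Map the footprint of a failure onto the four NATS-style categories."""
    if not affected_sectors or workstations_affected < 1:
        raise ValueError("a failure must affect at least one workstation in one sector")
    if set(affected_sectors) == set(DECLARED_CAPACITY):
        return RoomImpact.ALL_SECTORS
    if len(affected_sectors) > 1:
        return RoomImpact.SEVERAL_SECTORS
    if workstations_affected > 1:
        return RoomImpact.ONE_SECTOR_SUITE
    return RoomImpact.ONE_WORKSTATION


def aircraft_exposed(affected_sectors: set[str]) -> int:
    """Upper-bound exposure: summed peak-hour declared capacity of the affected sectors."""
    unknown = set(affected_sectors) - set(DECLARED_CAPACITY)
    if unknown:
        raise KeyError(f"unknown sectors: {sorted(unknown)}")
    return sum(DECLARED_CAPACITY[s] for s in affected_sectors)


if __name__ == "__main__":
    footprint = {"S1", "S2"}
    print(classify(footprint, workstations_affected=4), aircraft_exposed(footprint))
```

For the example footprint of two sectors, the sketch reports the 'several sectors' category and an exposure of 75 aircraft.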
absence of recovery procedures, and a lack of experience may create the potential for
controller error.
From a financial perspective, equipment failures create planned and unplanned costs
of repair, training (of both controllers and technicians), and incident investigation.
However, the most significant costs are likely to be the additional costs placed on
airlines in the case of significant delays (e.g. loss of connecting flights and passenger
accommodation). These are discussed further in the next section.
Ideally, the combination of all three consequences of an equipment failure should
constitute its overall impact on ATC operations, i.e. the severity of that particular failure.
However, in the operational environment the usual practice is to combine the safety
and operational impacts of an equipment failure to determine its severity rating. The
following paragraphs review severity ratings defined specifically for equipment failure
occurrences. They originate from the safety regulations of two Air Navigation
Service Providers (ANSPs) and one Civil Aviation Authority (CAA).
The UK National Air Traffic Services (NATS) recognises four categories of failure types
based on their impact on ATC operations, namely major impact, impact on workstation
or suite, ATC impact, and minimal impact (Table 4-2). Furthermore, the analysis of
operational failure reports in this thesis identified the severity categorisations of one
CAA (referred to as Country C) and another ANSP (referred to as Country D). The CAA
of Country C defines the severity rating of equipment failures according to their potential
to cause a significant problem (see Table 4-3).
Definition
Severe flow restrictions could be required
May be necessary to combine/move positions immediately or sector flow
restrictions may be required
Not immediately critical, will have greater operational impact over time
Centre management required
Factor
Definition
CR
Critical
MA
Major
MI
Minor
Finally, the data for Country D originate from one particular ATC Centre. This Centre
determines the severity of an incident from the combined impact it has on both the
controllers (internally in this ATC Centre as well as externally in other ATC units) and
the system control and monitoring engineers. In general, in this particular ATC Centre
the determination of the severity of an incident is the task of the system control and
monitoring unit, which distinguishes five severity classes. These are presented in
Table 4-4.
Table 4-4 Country D severity rating as defined by the particular ATC Centre
Severity
Factor
Definition
System down
Critical
Urgent
Important
Enhancement
These severity rating schemes indicate that each country follows its own severity index.
Furthermore, there is a difference in severity ratings between ANSPs and CAAs, as
ANSPs are concerned with the impact on their service provision business (e.g.
delays), whilst safety regulators are concerned with whether such an event could cause
an accident. Therefore, simply comparing the severity of occurrences between countries
is unlikely to produce useful findings. All classifications are rather qualitative and depend
Table 4-5 Severity rating defined in this research and mapped with available sources
Severity rating in this research | Mapping with severity ratings from available research
Major | Major (UK NATS); Major (Country C); 1 (Country D)
Moderate | Impact on workstation/suite (UK NATS); Major (Country C); 2 and 3 (Country D)
Minimal | 4 and 5 (Country D)
Having defined the three-level severity rating to be used in this research, a mapping is
established with the existing severity ratings (as defined by UK NATS, the CAA of
Country C, and the ANSP of Country D). Comparing the specific categories from each of
the available sources shows how they correspond to the major, moderate, and minimal
ratings defined in this research (Table 4-5). Note, however, that the major category, as
defined by Country C, had to be split between the major and moderate categories
defined in this research. The rationale behind this split is based on two criteria of equal
importance. The first criterion is the definition of the major and moderate categories as
presented in Table 4-5. In other words, the severity rating has to distinguish between
failures that affect the entire ATC Centre and those that affect only the workstations
reliant on the failed item. The second criterion is based on the impact of a failure on ATC
operations. For example, loss of a VOR or NDB is rated as moderate because navigation
may still be provided using radar surveillance or other means, such as the Global
Positioning System (GPS) or Automatic Dependent Surveillance (ADS). However, loss of an
ILS during the approach phase or in reduced visibility conditions is rated as major.
During this phase of flight the aircraft is in the landing configuration (i.e. at reduced
speed and in close proximity to the ground). If visual contact with the ground has not
been achieved at the moment of the failure, an immediate go-around is necessary. For
this reason, the failure of an approach navigation aid (such as an ILS) is considered
more severe.
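The mapping in Table 4-5 and the two splitting criteria can be expressed as lookups plus small rules. The sketch below is an illustration under stated assumptions: only mappings recoverable from Table 4-5 are encoded, the Country C split uses the first criterion alone (centre-wide vs. local impact), and `navaid_loss_severity` is a hypothetical helper built from the VOR/NDB and ILS examples. Treating an ILS loss outside the approach phase in good visibility as moderate is an assumption, not a rule stated in the text.

```python
"""Map source severity ratings onto the three-level scale of Table 4-5 (sketch)."""

MAJOR, MODERATE, MINIMAL = "major", "moderate", "minimal"

# Country D classes 1-5, as mapped in Table 4-5
COUNTRY_D = {1: MAJOR, 2: MODERATE, 3: MODERATE, 4: MINIMAL, 5: MINIMAL}

# UK NATS categories whose mapping is recoverable from Table 4-5
UK_NATS = {"major impact": MAJOR, "impact on workstation/suite": MODERATE}


def split_country_c_major(affects_entire_centre: bool) -> str:
    """Country C 'Major': centre-wide failures stay major, local ones become moderate."""
    return MAJOR if affects_entire_centre else MODERATE


def navaid_loss_severity(aid: str, on_approach: bool, reduced_visibility: bool) -> str:
    """Hypothetical rule derived from the VOR/NDB and ILS examples in the text."""
    aid = aid.upper()
    if aid == "ILS" and (on_approach or reduced_visibility):
        return MAJOR  # aircraft in landing configuration; a go-around may be needed
    if aid in {"VOR", "NDB", "ILS"}:
        # Radar or other means can still support navigation
        # (assumption for ILS outside approach in good visibility)
        return MODERATE
    raise ValueError(f"no rule defined for navigation aid {aid!r}")


if __name__ == "__main__":
    print(COUNTRY_D[3], split_country_c_major(False))
    print(navaid_loss_severity("ILS", on_approach=True, reduced_visibility=False))
    print(navaid_loss_severity("VOR", on_approach=False, reduced_visibility=True))
```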
Another example of the severe impact that a single failure can have is the outage
that occurred at the Chicago ATC Centre in 1995, when the en-route automation
component failed for two hours. This single occurrence cost the airlines an estimated
$12 million in delays (National Transportation Library, 1997). The National
Transportation Library (NTL) report mentions this example to make a case for the
replacement of the outdated main and backup Flight Data Processing Systems
(FDPS) involved in the reported incident. In short, these examples show how severe
the impact of an equipment failure on global ATM operations can be. This issue will
become especially important in a future gate-to-gate ATM system, where the roles for
planning and control will have to be re-organised and distributed between controllers
and pilots.
Similar to ATC operations, the impact of failure on ATM can be analysed from several
different perspectives. From operational and safety perspectives, a higher workload
will be experienced both on the ground, by controllers, technicians, and engineers, and
in the air, by flight crew. From a financial perspective, in addition to the costs
identified for ATC, it is necessary to add the cost of delays in a wider region. A small
exercise on the cost of delays induced by ATC equipment failures has been conducted
to indicate the financial impact of delays in European Civil Aviation Conference
(ECAC) and US airspace. This is presented in Appendix I.
Having discussed the consequences of equipment failures, it is important to discuss
how such consequences could be prevented or mitigated. This involves the process of
recovery from equipment failure and a distinction can be made between technical and
human recovery. The following section focuses on technical recovery and the principles
used to prevent and in some cases to mitigate the impact of equipment failures. The
human recovery aspects are addressed in Chapter 5 and throughout the rest of the
thesis.
grouped under the term technical built-in defences. They represent defences against
any unplanned or unwanted interruption of service. They are complex socio-technical
systems which combine technical, human, and organisational measures that prevent or
protect against an adverse effect (Smith et al., 2004). Verification of the existence and
appropriateness of existing defences provides confidence in the safety of a system and
is a requirement for system certification.
Safety is recognised as the ultimate imperative in ATC and should therefore be
addressed as early as possible in the design process. Building sound safety principles
into each phase of the design (i.e. the conceptual, preliminary, and detailed design
phases) is a useful way to avoid, prevent, and mitigate failures and their impact. Safety
through design is achieved through five principles (Figure 4-1) for hazard1
avoidance, elimination, or control, which are as follows (Christensen and Manuele,
1999; National Aeronautics and Space Administration, 2002; The European New
Machinery Directives cited in Piantek, 1999):
Eliminate hazards;
Design for minimum risk;
Incorporate safety devices (i.e. devices designed to prevent any unwanted event);
Provide warning devices (i.e. alert that signals the occurrence of some unwanted
event); and
Develop operating procedures and training schemes.
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
Within system safety, a hazard is usually defined as a condition which can lead to an accident.
In this research, a hazard is defined as the ATC system state resulting from an equipment
failure that penetrates all existing technical defences and affects the ability of the controller to
perform his/her tasks.
The suggested principles follow the logical order of precedence. The first two
approaches focus on the elimination of the hazard from the system. However, if the
identified hazards cannot be eliminated (due to difficulties or cost), risk should be
reduced by using fixed, automatic, or other protective safety devices (i.e. defences for
seamless recovery from failure). When neither design nor safety devices can effectively
eliminate identified risks or adequately reduce them, devices should be used that
detect the unwanted condition and produce adequate warning signals to alert the
controller (i.e. defences for transmitting information regarding a failure). These warning
signals should be designed to minimise the probability of inappropriate human reaction
and response. Note that regardless of how a warning device performs (Figure 4-2), the
triggering failure represents a hazard (according to the definition in this thesis) as it
affects controller performance.
As explained before, the human operator remains the last line of defence (i.e. human
recovery). For this reason, when warning devices are not sufficient, special procedures
and training schemes should be designed. These must be periodically tested, verified,
and regularly updated to assure their effectiveness.
Similarly, when dealing with equipment failures in ATC, it is important to distinguish
between technical and human (i.e. controller) recovery (Figure 4-2). Both processes
start with the detection of failure (either by a technical system or controller) and
conclude with an outcome. The outcome can be nominal (pre-failure), non-nominal but
stable (i.e. degraded), or inadequate system state (leading to incident or accident). The
outcome of the equipment failure and recovery process is discussed in detail in the
following Chapter. The following paragraphs focus on technical recovery, while human
recovery is addressed in subsequent Chapters.
failure (warning devices). Both categories are examined further in the following
sections.
alert or a warning should enhance the probability of appropriate human reaction and
response (i.e. controller recovery performance). According to the FAA's Human Factors
Design Standard (Federal Aviation Administration, 2003), warning devices should:
Alert the operator to the fact that a problem exists;
Inform the operator of the nature of the problem;
Guide the operator's initial responses (based on priority); and
Confirm in a timely manner whether the operator's response corrected the problem.
Alerts are usually generated immediately after the system detects any deviation
from predefined system performance. There are several ways in which ATC controllers
are informed of equipment failures or the non-availability of certain functions. The most
usual are colour-coding (e.g. a change in the colour of the workstation border)
and textual messages, both presented on the Human Machine Interface (HMI). In
addition to the content and location of the alert message, it is equally important to
display an alert in a timely manner. Alert onset is defined as the time between a system's
detection of a failure and the moment an alert is presented on the HMI, either by a colour
change or a text message (i.e. time-to-alert or TTA). This timing is usually system-driven
(based on a system threshold), but there are novel initiatives toward human-driven or
cognitively-driven alert onset. In general, there are three different types of alert onset:
Immediate onset (an alert is presented on the HMI with the least possible delay after
the system detects the failure). This is the normal case for severe events.
Delayed onset (an alert is presented on the HMI with a time-based or threshold-based
onset). For example, system requirements could be set up to inject an alert with a
specific time delay following the occurrence of a failure, or to inject an alert once a
system-defined threshold has been reached (i.e. TTA). In the nuclear industry this is
known as alert sequencing or alert hierarchies, indicating the urgency of the actions
needed. In this way, a hierarchy makes use of safety criticality, injecting safety-relevant
alerts first, followed by operational alerts. In satellite navigation, the TTA value is one
of the measures of the integrity of a satellite navigation system (Feng et al., 2005).
Cognitively convenient onset (an alert is presented on the HMI based on
cognitive convenience, which can be defined through the levels of controller
workload). This futuristic concept has mostly been explored in the nuclear and
automobile industries, where cognitive convenience is determined by measuring
workload using physiological measures (e.g. heart rate, breathing rate, galvanic skin
response, eye tracking). This concept has been tested on a US naval ship
as described in Daniels, Regli, and Franke (2002). This study proposes a method
Chapter 4
Data on hours flown are collected for commercial airlines, including domestic, regional, and
international air traffic, for each country.
geographical coverage of the datasets available and the availability of ATC systems
and equipment (e.g. number of radars, navaids, communication systems).
The information on flight hours for each country has been extracted from the CAA
websites, annual incident summaries, and personal correspondence with staff from
the engineering unit. After establishing common ground with an appropriate unit of
measurement, further analyses are performed on the available data, structured around
four equipment failure characteristics, as these could be extracted consistently
from the available datasets. These four equipment failure characteristics are: the type of ATC
functionality and equipment affected, complexity, severity, and duration³ of equipment
failures. The type of equipment/ATC functionality affected and the complexity of failure type
are extracted from the short summary available for each report. The severity of
equipment failure is extracted using the available severity rating (where one existed) or by
assessing the available information on the operational and safety impact of the equipment
failure and applying the severity rating derived in this research (see Table 4-5).
The duration variable was available only in the Country D database. Finally, additional
statistical tests have been performed to identify any relationships between the four
equipment failure characteristics. The structure of the data analyses is presented in
Figure 4-3.
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, most variables are categorical (type of equipment/ATC functionality affected,
complexity of failure type, and severity). Additionally, the complexity of failure type and
severity variables are ordinal (i.e. their categories can be ranked). Only duration is a
continuous (ratio scale) variable⁴. This variable is first investigated for its overall
distribution and then split into categories to extract information regarding failures of
short duration (discussed in sections 4.1.4 and 4.4.6).
[Figure 4-3: Structure of the data analyses. Available data: operational failure reports from 4 countries (22,808 available reports), followed by data preprocessing. Reference material: traffic figures from the respective CAAs; ATC functional classification (Chapter 2; Chapter 4, section 4.1.2); severity rating (Chapter 4, Table 4-5). Analyses: rate of equipment failures (Countries A, B, C, and D); ATC functionality (Countries A, B, C, and D); complexity of failure type (Countries A, B, and C); severity (Countries A, B, C, and D); duration (Country D database); additional statistical tests.]
Using the SPSS statistical package, frequencies of the related categories are identified and
the most frequent categories are reported for each variable. To establish relationships
between these variables, additional statistical tests are also performed. In this regard,
chi-square tests are used to test the relationships between two categorical variables.
The most important assumptions of the chi-square test are random sample data, a large
sample size, adequate cell sizes (expected frequencies of at least 5 per cell),
independent observations, and a normal distribution of deviations between observed and
expected values. The size and characteristics of the available datasets are considered to
satisfy all of these assumptions. Furthermore, Cramér's V is used to measure the
association for nominal data (i.e. the ATC functionality variable), whilst Kendall's tau is
used for ordinal data (i.e. the severity and duration variables). These measures are
briefly discussed in the following paragraphs.
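The cell-size assumption can be checked before running the test by computing the expected frequency of every cell under independence, E = (row total × column total) / n. The Python sketch below is a minimal illustration, assuming the data are already arranged as a contingency table of counts; the function names and the example counts are hypothetical and are not taken from the failure reports.

```python
# Minimal sketch (assumption): checks the "expected frequency >= 5 per cell"
# rule of thumb for a chi-square test on a contingency table of counts.
# The counts used below are hypothetical, not taken from the failure reports.


def expected_frequencies(table):
    """Return E_ij = (row_i total * column_j total) / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_sizes_adequate(table, minimum=5.0):
    """True if every expected cell frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_frequencies(table) for e in row)


if __name__ == "__main__":
    # Hypothetical counts: rows = ATC functionality, columns = severity category.
    counts = [[120, 45, 10],
              [80, 30, 12],
              [40, 25, 8]]
    print(cell_sizes_adequate(counts))
```

When the check fails, a common remedy is to merge sparse categories (for example, rarely affected functionalities) before testing.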
Cramér's V is a chi-square-based measure of the strength of the relationship
between nominal variables and is applicable to contingency tables larger
than 2 × 2 (Berenson et al., 2006). The Cramér's V coefficient is interpreted as a measure of
the relative strength of an association between two variables and ranges from 0 to 1
(0 indicating no association and 1 a perfect association). Suppose that the null hypothesis
is that the two variables are independent. Based on the frequency table and the null
hypothesis, the chi-square statistic X² is computed by summing, over all cells, the squared
difference between the observed (O) and expected (E) frequency in each cell divided by the
expected frequency. Cramér's V coefficient is then defined in equation 4-1 below:
$$V = \sqrt{\frac{X^2}{nm}} = \sqrt{\frac{\sum \frac{(O - E)^2}{E}}{nm}} \tag{4-1}$$
where n is the sample size and m is the smaller of the number of rows minus one and the
number of columns minus one.
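Equation 4-1 can be illustrated with a short Python sketch. It assumes a contingency table of raw counts with no empty rows or columns (otherwise an expected frequency would be zero); the function names and counts are hypothetical, and the analyses in this Chapter were carried out in SPSS rather than with code of this kind.

```python
import math


def chi_square(table):
    """Pearson statistic X^2 = sum over cells of (O - E)^2 / E, plus sample size n.

    Assumes every row and column total is non-zero, so that E > 0 in every cell.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2, n


def cramers_v(table):
    """Cramér's V = sqrt(X^2 / (n * m)) with m = min(rows - 1, columns - 1)."""
    x2, n = chi_square(table)
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(x2 / (n * m))


if __name__ == "__main__":
    # Hypothetical counts: rows = ATC functionality, columns = severity category.
    counts = [[120, 45, 10],
              [80, 30, 12],
              [40, 25, 8]]
    print(round(cramers_v(counts), 3))
```

A perfectly diagonal table gives V = 1 and a table whose rows are proportional to each other gives V = 0, which matches the interpretation of the coefficient given above.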
Kendall's tau is a rank-based measure of the strength of the relationship between ordinal
variables, applicable to contingency tables of all sizes (Berenson et al., 2006). Kendall's
tau coefficient has the following properties:
If the agreement between the two rankings is perfect (i.e. the two rankings are the
same) the coefficient takes the value of 1.
If the disagreement between the two rankings is perfect (i.e., one ranking is the
reverse of the other) the coefficient takes the value of -1.
For all other associations the value lies between -1 and 1, and increasing values
imply increasing agreement between the rankings. If the rankings are completely
independent, the coefficient takes the value of 0.
Kendall's tau coefficient is defined in equation 4-2 below:
$$\tau = \frac{2P}{\frac{1}{2}n(n-1)} - 1 = \frac{4P}{n(n-1)} - 1 \tag{4-2}$$
where n represents the number of observations (so that ½n(n − 1) is the total number of
pairs) and P represents the number of concordant pairs. In statistics, a concordant pair is
a pair of observations {X1, Y1} and {X2, Y2} from a two-variable dataset for which
(equation 4-3):
$$\operatorname{sgn}(X_2 - X_1) = \operatorname{sgn}(Y_2 - Y_1) \tag{4-3}$$
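A discordant pair is one for which (equation 4-4):
$$\operatorname{sgn}(X_2 - X_1) = -\operatorname{sgn}(Y_2 - Y_1) \tag{4-4}$$
where the sign function is defined as (equation 4-5):
$$\operatorname{sgn} x = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases} \tag{4-5}$$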
Therefore, a high value of P indicates that most pairs are concordant, i.e. the rankings
are consistent. A tied pair (sgn x = 0) is regarded as neither concordant nor discordant. If
there is a large number of ties, the total number of pairs (in the denominator of
equation 4-2) should be adjusted accordingly (Berenson et al., 2006).
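The pair-counting definition above can be made concrete with a minimal Python sketch (function names hypothetical). `kendall_tau_a` computes (P − Q) / (½n(n − 1)), where Q is the number of discordant pairs; when there are no ties this equals equation 4-2, because P + Q = ½n(n − 1). `kendall_tau_b` applies one common tie adjustment (tau-b), which removes pairs tied in either variable from the denominator. It is offered as an illustration of the adjustment mentioned above, not as the exact procedure used in this research.

```python
import math
from itertools import combinations


def sgn(x):
    """Sign function of equation 4-5: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def pair_counts(x, y):
    """Count concordant (P), discordant (Q), x-tied, and y-tied pairs."""
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        sx, sy = sgn(x[j] - x[i]), sgn(y[j] - y[i])
        if sx == 0:
            tied_x += 1
        if sy == 0:
            tied_y += 1
        if sx != 0 and sy != 0:  # tied pairs are neither concordant nor discordant
            if sx == sy:
                concordant += 1
            else:
                discordant += 1
    return concordant, discordant, tied_x, tied_y


def kendall_tau_a(x, y):
    """(P - Q) / (n(n-1)/2); equals 4P / (n(n-1)) - 1 when there are no ties."""
    n = len(x)
    p, q, _, _ = pair_counts(x, y)
    return (p - q) / (n * (n - 1) / 2)


def kendall_tau_b(x, y):
    """Tie-adjusted tau: pairs tied in x or in y are removed from the denominator."""
    n = len(x)
    p, q, tx, ty = pair_counts(x, y)
    n0 = n * (n - 1) / 2
    return (p - q) / math.sqrt((n0 - tx) * (n0 - ty))
```

For example, with severity and duration categories coded as ordered integers for each report, `kendall_tau_b(severity_rank, duration_rank)` would return a negative value when longer failures tend to be less severe.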
After presenting the overall methodology used for data analyses, the following sections
present some of the key findings and results.
[Chart: Countries A, B, and C; x-axis: year (1992 to 2005); y-axis: 0 to 50]
Figure 4-4 Total number of equipment failures per flight hours flown in each year for Countries
A, B, and C
The data available on the rate of equipment failures for Country D reveal a sharp rise
in the number of equipment failures, from 30 failures per 10,000 flight hours captured in the
second half of the year 2000 to 45 failures per 10,000 flight hours in 2001 (Figure 4-5)⁵.
The reason for this is that only five months of data were available for the year 2000.
Therefore, we can conclude that a rate of reported equipment failures in this ATC
[Chart: Country D; x-axis: year (1992 to 2005); y-axis: 0 to 50]
Figure 4-5 Total number of equipment failures per flight hours flown in each year for Country D
(year 2000 incomplete)
Although the rates of equipment failure for Country D are tenfold higher than those for Countries
A, B, and C, Country D data are retained for subsequent analyses as they represent the most
detailed and reliable source of operational failure reports.
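Because the datasets differ greatly in traffic volume, failure counts are normalised by flight hours before countries or years are compared. A minimal sketch, assuming yearly failure counts and flight hours are held in dictionaries keyed by year (a hypothetical layout; the names and figures are illustrative only), and that both series cover the same reporting period:

```python
def failures_per_10k_hours(failures, flight_hours):
    """Equipment failure rate normalised per 10,000 flight hours."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return failures * 10_000 / flight_hours


def yearly_rates(failures_by_year, hours_by_year):
    """Rate per 10,000 flight hours for every year present in both series.

    Assumes each year's failure count and flight hours cover the same period.
    """
    common_years = sorted(failures_by_year.keys() & hours_by_year.keys())
    return {
        year: failures_per_10k_hours(failures_by_year[year], hours_by_year[year])
        for year in common_years
    }


if __name__ == "__main__":
    # Hypothetical figures, not taken from any of the four datasets.
    failures = {2001: 900, 2002: 850}
    hours = {2001: 200_000, 2002: 210_000}
    print(yearly_rates(failures, hours))
```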
The next section builds on this trend analysis and assesses affected ATC
functionalities. The classification of all ATC functionalities, as defined in Chapter 2, has
been used for this purpose and the findings are presented for each Country separately.
Further analysis of sub-functions and equipment most affected by failures identified the
following five types: air ground communication, secondary surveillance radar (SSR),
flight data processing system (FDPS), primary surveillance radar (PSR), and other
communication systems, ranging from pagers, headsets, microphones, cables, to
footswitches (Table 4-6).
Percentage (Table 4-6): 33.1; 17.7; 10.1; 5.2; 4
Similar to the previous case, two ATC functionalities for Country B most affected by
equipment failures are the communication and surveillance functions (Figure 4-7).
Table 4-7 presents the six types of equipment most affected by failures. These are: PSR,
air situational display or radar display, air ground communication, voice switching
communication system (VSCS), data exchange network, and runway/taxiway lighting.
Percentage (Table 4-7): 17.2; 15.1; 11.6; 8.8; 7.6; 7.6
Country C shows a slightly different trend in the distribution of equipment failures per
ATC functionality. The two most affected categories are the navigation and
communication functions (Figure 4-8).
Furthermore, the five most affected equipment types are: air ground communication,
instrument landing system (ILS), very high frequency omnidirectional radio range
(VOR), non-directional beacon (NDB), and air situational display (Table 4-8).
Percentage (Table 4-8): 23.7; 19.6; 7.6; 6.5; 5.8
Country D shows a similar trend to Countries A and B, as the two most affected ATC
functionalities are communication and surveillance (Figure 4-9). Although the
navigation function seems not to be represented at all in Figure 4-9, there were only
two failures affecting this functionality and both were due to testing of Global Positioning
System (GPS) clock alarms. The reason for the under-representation of this ATC
functionality is that the data originated from one particular ATC Centre that
provides area control service and as such is not responsible for the ground-based
navigational aids and airport-based equipment (e.g. meteorological equipment,
[Figure 4-9: frequency of equipment failures (0 to 3500) per ATC functionality for Country D; categories: communication, navigation, surveillance, data processing, supporting, safety nets, pointing/input, power supply, system monitoring]
Further analysis of data for Country D shows that the following five equipment types
are most affected by equipment failures: air situational display (radar display), data
exchange network, air ground communication, other surveillance systems (mostly
referring to radar links), and other communication systems, such as pagers, headsets,
microphones, cables, and footswitches (Table 4-9).
Percentage (Table 4-9): 21.9; 15.7; 11.6; 8.7; 4
Table 4-10 collates the five ATC equipment types most affected by failures from each
available dataset. Findings are structured according to the ATC functionality they
support (in rows) and the source (in columns). Overall, it can be concluded that Countries
A, B, and D are quite similar in relation to the most affected ATC functionalities. Results
of data analyses from these three countries indicate that failures mostly affect the
communication and surveillance functionalities. On the other hand, results of data
analysis from Country C differ, as failures mostly affect the navigation functionality.
These are mostly failures of ILS, followed by failures of VOR, NDB, and DME, as well as
airport lighting facilities (runway and taxiway lighting). Furthermore, the only equipment
type frequently affected by failures in all four countries is air-ground communication.
Other equipment types common in the available datasets are air situational display, radar,
data exchange network, and supporting communication systems (e.g. pagers, headsets,
microphones, cables, and footswitches).
Table 4-10 Summary of the five ATC equipment types most affected by failures
ATC functionalities | Country A | Country B | Country C | Country D
Communication | A/G communication; other communication systems | A/G communication; VSCS; data exchange network | A/G communication | A/G communication; other communication systems; data exchange network
Surveillance | PSR; SSR | PSR; air situational display | air situational display | other surveillance systems; air situational display
Data processing and distribution | FDPS | - | - | -
Navigation | - | runway/taxiway lighting | ILS; VOR; NDB | -
Table 4-11 Percentage of the multiple failure occurrences reported in the available datasets
Country | Number of reports with multiple failure occurrences | Total number of reports
A | 42 | 1378
B | 206 | 1393
C | 24 | 448
D | N/A | N/A
Aggregated data | 272 (8.4%) | 3219
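The aggregated figure in Table 4-11 pools the three countries for which multiple failure occurrences could be counted (Countries A, B, and C). As a worked check of the 8.4 percent value:

$$\frac{42 + 206 + 24}{1378 + 1393 + 448} = \frac{272}{3219} = 0.08449\ldots \approx 8.4\%$$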
Using the severity categorisation defined in section 4.2.3, it is possible to categorise all
available equipment failure reports from operational and safety perspectives. The
following section assesses the ATC functionalities affected by equipment failure with
respect to their severity or impact on ATC operations.
The major category accounts for 7 percent, 14.4 percent, 12.7 percent and 6.5
percent of the equipment failures within Countries A, B, C, and D respectively. These
results show the importance of assessing the degree of severity for each of the
equipment failure occurrences. For example, the majority of failures reported in the
Country D dataset tend to have minimal impact on ATC operations and controller
performance (Figure 4-13). However, if we observe only major equipment failures, or
failures that affect an entire ATC Centre or a major part of it, it is notable that the most
affected ATC functionalities are: communication, accounting for 45.3 percent of all major
equipment failures in the aggregated reports; surveillance, accounting for 29 percent; and
data processing and distribution, accounting for 15 percent (Figure 4-11).
[Figure 4-11: frequency (0 to 250) per ATC functionality (Comm, Nav, Surv, Data proc, Power) for Countries A, B, C, and D]
Further, the major failures of the communication functionality are mostly due to the loss
of air ground communication or available frequencies and to problems with the data
exchange network (when used as a coordination channel). This is determined by
observing the frequency of equipment types that support the communication
functionality affected by a major failure. Using a similar approach, the frequency of
equipment types that support the surveillance functionality affected by a major failure is
determined. These are: air situational display and radar. Within the data processing
and distribution function, more than half of the major failures are due to one particular
piece of equipment, namely the Flight Data Processing System (FDPS). This particular
system handles flight plans, making them live through automatic events, manual
inputs, and transitions from one state to the other. This information is provided via the
air situational display or radar display (Table 4-12).
Table 4-12 Summary of the five most affected equipment types from four datasets
ATC functionalities | Major failures
Communication | air ground communication; data exchange network
Surveillance | air situational display; primary and secondary surveillance radar (= loss of radar coverage)
Data processing and distribution | flight data processing system (FDPS)
category. The final duration category, substantial period of time, is further divided into
two sub-categories: failures that last up to one day and those that last longer
than a day. This split was made to extract more information, as about 40 percent of the
equipment failures belong to the substantial period of time category. The results of the
analysis suggest that eight percent of reported equipment failures in Country D lasted
more than one day. Further investigation of the equipment types affected by failures lasting
more than one day revealed that the majority of these involve the data exchange network,
air situational display, flight data processing system, links with radar sites,
and air ground communication.
[Chart: frequency (0 to 3,000) of failures per duration category in hours: [0.00-0.25], [0.26-1], [1.01-24], [>24.01]; bar labels: 34.51%, 31.6%, 25.85%, 8.04%]
Figure 4-12 Distribution of the failure duration according to four distinct categories
Since this research addresses controller recovery from ATC equipment failures, the
focus is on major failures within the short period of time category. Table 4-13
presents the distribution of the major failures lasting up to 15 minutes, according to the
ATC equipment affected. It can be seen that the equipment most affected is the data
exchange network, followed by the other surveillance systems (mostly radar links),
flight data processing system, air situational display, and air ground
communication.
Table 4-13 Distribution of major failures lasting up to 15 minutes per ATC equipment affected
ATC equipment affected | Percentage
data exchange network | 28
other surveillance systems | 16
flight data processing system | 13.7
air situational display | 12
air ground communication | 7.4
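The selection behind Table 4-13 (major failures in the shortest duration category, up to 0.25 hours or 15 minutes) can be sketched as follows. This assumes each report has been reduced to a hypothetical (equipment, severity, duration in hours) tuple, and treats the upper value of each category label in Figure 4-12 as an inclusive bound; both are assumptions made for illustration, not a description of how the Country D database is structured.

```python
from collections import Counter

# Duration categories of Figure 4-12, in hours. Treating each label's upper
# value as an inclusive bound is an assumption about how the bins were cut.
DURATION_BINS = [(0.25, "[0.00-0.25]"), (1.0, "[0.26-1]"), (24.0, "[1.01-24]")]
LONGEST_BIN = "[>24.01]"


def duration_category(hours):
    """Map a failure duration in hours to one of the four categories."""
    for upper, label in DURATION_BINS:
        if hours <= upper:
            return label
    return LONGEST_BIN


def short_major_distribution(reports, max_hours=0.25):
    """Percentage of major failures lasting up to `max_hours`, per equipment type.

    `reports` is an iterable of (equipment, severity, duration_hours) tuples;
    this record layout is hypothetical.
    """
    counts = Counter(
        equipment
        for equipment, severity, hours in reports
        if severity == "major" and hours <= max_hours
    )
    total = sum(counts.values())
    # most_common() orders equipment types from most to least affected.
    return {eq: round(100 * c / total, 1) for eq, c in counts.most_common()}
```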
Variable 1
Variable 2
Test
Country A
Country B
p<0.001
ATC
functionality
Severity
Non-parametric test
(Cramer's V)
Country C
p<0.001
p<0.001
ATC
functionality
Country D
Statistical significance at 95
percent confidence level
Severity
ATC
functionality
as above
p<0.001
as above
p<0.001
Non-parametric test
(Kendall's tau)
p=0.021
Duration
Severity
All statistical tests revealed significant relationships. For all available datasets there is a
significant relationship between the type of ATC functionality affected and the
equipment failure severity rating. The main findings from these tests indicate the
dominance of equipment failures affecting the communication and surveillance
functionalities, with both minimal and major impact (see Table 4-15). The last test,
namely the relationship between failure severity and duration for Country D's dataset,
indicates a significant negative relationship. In other words, the data indicate that the
longer the failure, the less severe it tends to be. This finding is expected, as more
severe failures tend to be attended to immediately and thus the time between the first
log and closure of these failures may be shorter.
Table 4-15 Main findings regarding interactions between ATC functionality and severity
Severity rating
Country
Country A
Country B
Country C
Country D
Major
surveillance
communication
communication and surveillance
Minimal
communication
communication and navigation
navigation
communication and surveillance
After the qualitative and quantitative assessment of equipment failures in ATC, the next
section derives a framework for an equipment failure impact assessment tool. This tool
is designed to assess equipment failures and provide an indication of their severity or
overall impact on ATC operations.
Table 4-16 Review of equipment failure characteristics with regard to their impact on ATC
operations
Equipment failure characteristics
Impact on ATC
operations
Comment
To be considered
To be considered
To be considered
Duration of failure
To be considered
Output
Output
The inclusion in this tool of all failure characteristics except the ATC functionality affected
is relatively straightforward. When including the characteristic time course of failure
development, the latent category was omitted from the three possible categories (i.e.
sudden, gradual, and latent). The reason is that latent failures tend to be overlooked in
the overall ATC system for long periods of time until triggered by some other failure. As
such, they have a profound effect on the controller, but only once they are triggered by
another failure.
The ATC functionality affected represents the key failure characteristic in terms of
effect on controller performance. Operating without a key functionality (e.g. radar picture,
communication, power supply) is significantly different for the controller from operating
without an auxiliary tool or piece of equipment (e.g. monitoring tool, headset, mouse).
Therefore, it is necessary to separate ATC functionalities according to their
importance for the radar control of air traffic in a dedicated airspace. The separation is
intended simply to differentiate between primary and secondary ATC functionalities.
Their precise definitions, informed by various examples, are given in the following
paragraphs and Table 4-17.
Primary ATC functionalities are considered primary tools for achieving safe and
efficient flow of air traffic in any dedicated airspace. This group consists of the key
components, equipment, or tools of the communication, navigation, surveillance, data
processing, and power supply functionalities. These ATC functionalities are
categorised as primary ATC functions because they provide the critical information to
the controller. This critical information consists of: voice (and data) communication with
the aircraft in a dedicated airspace, aircraft horizontal and vertical position relative to
other traffic, and navigational directions or vectors to comply with the requirements of
the flight plan. These data are presented to the controller via an operational display
used for tracking the progress of multiple aircraft at any given moment. In modern ATC
Centres, the communication function is provided via the Voice Switching
Communication System (VSCS) touch panel (see Chapter 2 for more details). In
addition, it is necessary to highlight that the power functionality also represents a
primary function. This is a direct consequence of the computer driven ATC environment
where electrical power supplies all of the above mentioned systems. Therefore, in case
of any disruption (either from public utilities or an ATC Centre's own installation), the
controller may lose some or all primary functionalities. Table 4-17 captures the primary
ATC functionalities.
Secondary ATC functionalities (Table 4-17) represent supporting tools to achieve the
primary objective of the ATC service. Their function is important but not irreplaceable,
as it can be covered by the primary ATC functionalities. This group consists of: input/pointing devices,
system monitoring, safety nets, supporting ATC tools, as well as various components
Table 4-17 Detailed overview of the primary and the secondary group of ATC functionalities
ATC functionality group | ATC functionality | Sub-functionalities (equipment, sub-systems, tools)
Primary | Communication | Air-ground; Ground-ground; Voice Switching Communication System
Primary | Navigation | Instrument Landing System (ILS) (during approach phase and in the case of reduced visibility)
Primary | Surveillance | Primary Surveillance Radar; Secondary Surveillance Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Air Situational Display
Primary | Data processing | Flight Data Processing System; Radar Data Processing System
Primary | Power supply | Main power system; Uninterruptible power supply (generator, battery)
Secondary | Communication | Data exchange network; Back-up system; Aeronautical Information Service; Other
Secondary | Navigation | Navigational aids (e.g. Very high frequency Omnidirectional Range - VOR, Distance Measuring Equipment - DME); Airport facilities control and monitor (navigation aids monitoring, aeronautical ground lighting)
Secondary | Surveillance; Data processing; Supporting function (ATC tools) | Safety nets (Short Term Conflict Alert; Minimum Safe Altitude Warning; Area Proximity Warning; Runway Incursion Monitoring and Conflict Alert System); Monitoring aids; Sequencing manager; Other; Pointing devices; Input devices; System monitoring
The output of this tool is an assessment of the overall impact of an equipment failure
on ATC operations and consequently controller performance. The rationale behind the
severity ratings presented in Figure 4-13 is as follows:
Loss of primary functionality tends to have moderate to major severity, depending
on other equipment failure characteristics (e.g. complexity of failure type) and
relevant contextual conditions (e.g. traffic). Moderate to major severity rating is due
to the fact that the primary ATC functionalities represent the critical tools for
achieving a safe and efficient flow of air traffic in any airspace.
Loss of secondary functions tends to have minor to moderate severity, depending
on the additional variables such as complexity of failure type, time course of failure
development, and duration. Minor to moderate severity rating is due to the fact that
the secondary ATC functionalities only provide assistance for more efficient air
traffic control, but do not represent the systems without which the control of the air
traffic flow becomes unfeasible.
Multiple failure occurrences may have a more severe impact on ATC operations
than a single failure occurrence simply because controllers have to cope with more
than one failure simultaneously.
Gradual failures (e.g. gradual loss of data integrity) may have a more severe impact
on ATC operations than sudden failures (e.g. sudden loss of data).
Duration of failure and severity rating tend to be inversely related. Data
analysis indicates that the longer the failure duration, the less severely it tends to
affect ATC operations and controller performance. The rationale behind this is that more
most severe failure types, independent of each other. Future research should look into
the enhancement of this tool to enable the assessment of the impact of several
independent failures on controller performance. The output of this more advanced
approach would be to indicate the most severe independent multiple failure
combinations. However, to achieve this, the tool would have to be designed for a
specific ATC Centre to integrate the complexity of its ATC architecture and flow of data
between the various components of the ATC system.
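The rationale for the severity output can be illustrated as a simple rule-based function. This is a hypothetical sketch rather than the tool itself: it encodes only the ranges stated for the tool (moderate to major for the loss of a primary functionality, minor to moderate for a secondary one) and assumes that any aggravating factor (complex failure type, multiple failures, gradual development, or high traffic) moves the rating to the top of its range. Latent failures are rejected because they were omitted from the tool, and duration is not modelled.

```python
def assess_severity(primary, complex_failure=False, multiple_failures=False,
                    time_course="sudden", high_traffic=False):
    """Illustrative severity rule for an equipment failure (hypothetical).

    Loss of a primary functionality ranges from 'moderate' to 'major'; loss of
    a secondary one ranges from 'minor' to 'moderate'. Any aggravating factor
    moves the rating to the top of its range. These rules are an assumption
    for illustration, not the rules of Figure 4-13.
    """
    if time_course not in ("sudden", "gradual"):
        # Latent failures were omitted from the tool.
        raise ValueError("time_course must be 'sudden' or 'gradual'")
    low, high = ("moderate", "major") if primary else ("minor", "moderate")
    aggravating = (complex_failure or multiple_failures
                   or time_course == "gradual" or high_traffic)
    return high if aggravating else low


if __name__ == "__main__":
    # Hypothetical cases: loss of radar picture (primary) during high traffic,
    # and a sudden loss of a monitoring tool (secondary).
    print(assess_severity(primary=True, high_traffic=True))
    print(assess_severity(primary=False))
```

A Centre-specific version, as suggested above, would replace these fixed rules with knowledge of the Centre's ATC architecture and data flows.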
4.6 Summary
In line with the objective of the research presented in this thesis, this Chapter has
identified potential equipment failure types and their key characteristics. Special
attention has been paid to the consequences of equipment failures and their impact on
ATC operations. A severity rating has been defined and applied to available operational
failure reports. The Chapter has further discussed technical recovery designed to
prevent or mitigate the impact of equipment failures on ATC operations and controller
performance.
Stepping away from the theoretical findings of past literature, this Chapter has provided
operational input through the analyses of operational failure reports from four countries.
These analyses focused on four variables: the type of ATC functionality and equipment
affected by the failure, complexity of failure type, severity of its impact, and the overall
duration of the failure. Using the available reports it has been possible to identify
distributions of equipment failures in relation to these four variables. Although these
countries are different in terms of the volume and characteristics of the airspace they
control, traffic levels, and equipment types, the analyses have shown that the
communication and surveillance functionalities are most affected by equipment failures.
When observing only major failures, the most affected are the communication,
surveillance, data processing, and power supply functionalities. Further investigation of
major failures lasting a short period of time has revealed the most affected ATC
equipment. These are the data exchange network (as part of the communication
functionality), the flight data processing system (as part of the data processing
functionality), and air situational display (as part of the surveillance functionality).
The Chapter has concluded with development of a framework for the assessment of
the impact that every single equipment failure has on ATC operations. In general, the
knowledge acquired from equipment failure literature, informed by the analyses of
operational failure reports has been incorporated into the qualitative equipment failure
impact assessment tool and its severity output. These will inform the choice of
equipment failure and its characteristics for the experiment designed to assess
controller recovery.
The safety-critical industry is aware of the fact that hazardous equipment failures
cannot be avoided and that absolute safety is not achievable. Thus, the same attention
given to their analysis should be given to the overall human recovery process. Kanse
(2004) points out that what we really want to prevent is not so much the failures
themselves, but the negative consequences of these failures. As a result, the following
Chapter gives appropriate attention to the controller recovery process.
Chapter 5
The previous Chapter explained the characteristics of equipment failures and the
notion of technical recovery. This Chapter reviews the associated issues of the process
of controller recovery. In Air Traffic Control (ATC), the human recovery process
involves two groups of individuals. One group consists of controllers and the other
consists of system control and monitoring engineers¹. The Chapter starts with a brief
discussion of the roles controllers and engineers have in the recovery process. As the
focus of this thesis is on controller recovery from equipment failures, the Chapter
continues with a review of past research of relevance to this subject. In this respect, the
Chapter reviews in detail the phases of controller recovery and the corresponding
models developed for the Air Traffic Management (ATM) and non-ATM industries. This
is followed by a discussion of the major factors that influence the quality of controller
recovery. The Chapter concludes by proposing a set of variables used for a detailed
assessment of controller recovery performance later in this thesis. This set of recovery
variables is also used as a guide to the design of the experiment to capture real data
on controller recovery in Chapter 9.
that supports controllers. They reconfigure and maintain degraded or failed equipment
with minimum disruption to controller tasks and regularly upgrade the software as
operational requirements deem necessary. System control personnel have rapid and
reliable communication links with the ATC operations room via the supervisor. They
utilise this communication channel to inform ATC staff of the status and performance of
equipment and systems or to receive reports of technical problems and equipment
failures from the operations room. Therefore, EUROCONTROL (2004e) concludes that
recovering the ATC system from failure is a result of close coordination and
cooperation between controllers, technicians, and management.
Following this brief discussion of the roles and responsibilities of controllers and
engineers in the recovery process, the next section reviews the past research on the
human recovery process and its phases, developed for the Air Traffic Management
(ATM) and non-ATM industries. The main findings are then applied to a particular
process of controller recovery.
Context of research
Frese (1991)
Software design
Kontogiannis (1999)
Human Machine
Interface
Human Machine
Interface
Human Machine
Interface
Sellen (1994)
Assessment of
everyday slips and
mistakes
Nuclear industry
Kanse (2004)
Chemical industry
Nuclear industry
Bove (2002)
ATM industry
Error detection
Error explanation
Error handling
Error detection
Error explanation or localisation
Error correction
Error occurrence
Error diagnosis (detection +
explanation)
Error recovery (planning + execution)
Mismatch emergence
Detection
Recovery
Error detection
Error identification
Error recovery
Detection
Localisation
Correction
Detection
Explanation
Countermeasures
Detection
Explanation
Correction
Detection
Correction
Therefore, in the research on recovery from equipment failures presented in this thesis,
past research is used to inform the phases of the controller recovery process.
Bove (2002) does not identify the diagnosis phase in the human error management process.
This may be due to the fact that this phase represents a covert human activity, difficult to
observe, measure, and capture in incident reports.
Detection of equipment failure is taken as the first phase, triggered by the mismatch
between ATC system feedback and active knowledge of the controller (expectation or
assumption). This phase is followed by the diagnosis and correction phases, leading to
the outcome of the recovery process (as a result of both technical and controller
recovery).
Controller recovery is defined in this thesis as the ability of the controller to detect³,
diagnose, and correct any non-nominal system state resulting from ATC equipment
failure (adapted from van der Schaaf, 1995). The objective of the recovery process (i.e.
its outcome) is to restore the system to its nominal (pre-failure) state or at least to limit
the consequences of failure in the most efficient and effective way (by achieving stable
non-nominal system state). The following sections discuss the phases of controller
recovery.
5.2.1 Detection
Human recovery is a sequential process whose first step is the detection of failure.
Without this detection there is no recovery process. Therefore, the first task of the
controller is to detect the failure. As previously explained, a failure can be detected
first either by a technical system or by a controller. Hallbert and Meyer (1995) note
that to accomplish detection by the human operator, the stimulus must be
recognisable. In other words, the stimulus must be something that a controller has
already experienced, is trained to observe, or is of sufficient intensity to interrupt the
monitoring process (e.g. visual or auditory alert positioned within the field of view but
different from the background noise already present on the radar screen or other
operational support system).
Thus, detection is triggered by any mismatch between the expected effects and
observed outcomes. The mismatch arises when incoming information is compared against
a frame of reference, i.e. the range of expected system responses. For example, after
issuing an instruction for a flight level change to an aircraft, the controller expects to
see the old flight level gradually changing toward the new one. However, if the
controller observes a flight level change outside the expected values, this mismatch
will trigger the identification of some sort of fault. This fault can be caused by an
erroneous flight level change by the pilot or by an erroneous system readout of the
aircraft altitude (e.g. due to radar garbling).
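The mismatch check in the flight level example above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model taken from the source: the function name `detect_mismatch`, the use of integer flight levels, and the per-update tolerance `max_step` are all hypothetical.

```python
def detect_mismatch(last_fl: int, observed_fl: int, cleared_fl: int, max_step: int = 10) -> bool:
    """Return True if an observed flight level falls outside the expected range.

    Hypothetical expectation: the readout moves gradually from the last observed
    level toward the cleared level, never beyond it, and changes by at most
    max_step flight levels between two radar updates.
    """
    low, high = min(last_fl, cleared_fl), max(last_fl, cleared_fl)
    # Moving away from, or overshooting, the cleared level.
    outside_range = not (low <= observed_fl <= high)
    # Implausibly large jump between updates, e.g. a garbled readout.
    too_fast = abs(observed_fl - last_fl) > max_step
    return outside_range or too_fast


# Climb from FL300 to FL340: a small step toward the cleared level is expected.
print(detect_mismatch(last_fl=300, observed_fl=305, cleared_fl=340))  # False
# Level flight at FL350: any deviation from the cleared level is a mismatch.
print(detect_mismatch(last_fl=350, observed_fl=352, cleared_fl=350))  # True
```

When the last observed and cleared levels coincide (level flight), the expected range collapses to a single value, so any deviation is flagged.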
In the case of a total failure of a particular function, it is easier to detect and diagnose
the significance of the change, since the failure is obvious. However, in the case of a
partial failure of a particular ATC function (e.g. corruption of tracks and squawks),
detection may be more challenging. In these circumstances, detection is based on the
controller's memory of aircraft's past positions and future trajectories, aided by
available tools (e.g. flight strips). An example of potential difficulties encountered by
controllers in detecting partial equipment failure is reported by Sampaio and Guerra
(2004). In this example, a sudden failure of the Radar Data Processing System (RDPS)
affected only one radar track and went unnoticed by the controller for 21 minutes (see
Chapter 4, section 4.2.1).
Detection is also closely connected to the time course of equipment failure
development, namely sudden, gradual, or latent failures (see Chapter 4, section 4.1.3).
Sudden failures do not allow any time to prepare, but are usually detected immediately.
On the other hand, detection of gradual failures may be extremely difficult and delayed.
Persistent (latent) failures are almost impossible to detect. They might exist in the ATC
system for a long period of time before they are detected. This is confirmed by
interviews conducted during this research with the aim of augmenting the theoretical
sources of information. Engineers from three European ATC Centres confirmed that
latent failures (mostly software failures) tend to go unnoticed until some other event or
failure reveals their existence (for evidence see Appendix II).
There are various other factors that can hinder failure detection, such as difficulties in
observing system feedback or remembering expectations about effects. Detection can
also be made difficult by inappropriate system design (e.g. poor human machine
interface, poor quality or position of alert), workplace layout, or controller working
strategy. As an example, an alert that is barely visible or audible may remain
undetected even by a highly alert controller.
Often, successful detection occurs as a consequence of a combination of design
qualities and mental resources. An example is taken from one of the European ATC
Centres where the label of the ATC function positioned in the general information
window changes its colour from white to yellow in the case of a failure. However, in the
training facility of the same ATC Centre, within the same window, one specific label is
designed to be colour-coded yellow regardless of its status (i.e. the label "Lines", which refers to
the status of the communication lines between a number of ATC Centres). Such a
training platform design feature has the potential to result in the missed detection of a
failure by a controller as a result of a continuous and consistent presence of the yellow
colour in the general information window.
Besides the quality of an alert, its onset also plays an important role. As previously
discussed in Chapter 4, alert onset (i.e. Time-To-Alert or TTA) is defined as the time
between a system's detection of a failure and the moment an alert is presented on the
Human Machine Interface (HMI) either by colour change or text message. Moreover,
the future concept of cognitively convenient alarm onset aims to circumvent the
limits of controller attention under high workload by providing an alert for a
system-detected failure at the moment when the level of controller workload allows its
detection (see Chapter 4, section 4.3.2).
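The TTA definition, and the idea of deferring an alert until workload permits its detection, can be illustrated with a short sketch. It is hypothetical throughout: the normalised workload scale, the `threshold` value, and the `max_delay` cap (added here so that an alert is never withheld indefinitely) are assumptions, not part of the concept as described in the source.

```python
from typing import List, Tuple


def time_to_alert(detected_at: float, presented_at: float) -> float:
    """TTA: time between the system's detection of a failure and the alert on the HMI."""
    if presented_at < detected_at:
        raise ValueError("an alert cannot be presented before the failure is detected")
    return presented_at - detected_at


def convenient_onset(detected_at: float,
                     workload: List[Tuple[float, float]],
                     threshold: float = 0.7,
                     max_delay: float = 30.0) -> float:
    """Pick an alert time at which controller workload allows detection.

    workload: (time, normalised workload in [0, 1]) samples sorted by time (hypothetical).
    Returns the first sample time after detection with workload <= threshold, or
    detected_at + max_delay if no such moment occurs (an assumed safeguard).
    """
    deadline = detected_at + max_delay
    for t, level in workload:
        if t < detected_at:
            continue  # samples taken before the failure was detected are irrelevant
        if t > deadline:
            break
        if level <= threshold:
            return t
    return deadline


samples = [(0.0, 0.9), (5.0, 0.8), (12.0, 0.5), (20.0, 0.4)]
onset = convenient_onset(detected_at=2.0, workload=samples)
print(onset, time_to_alert(2.0, onset))  # 12.0 10.0
```

The cap makes the trade-off explicit: a larger `max_delay` favours presenting the alert at a convenient moment, while a smaller one bounds the TTA.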
The above discussions have highlighted that detection can be either enhanced or
hindered by a combination of technical and human related factors. External stimulus,
past experience, appropriate design solutions, and sudden development of equipment
failures tend to enhance detection. However, inappropriate system design, high levels
of workload and fatigue may hinder failure detection. Similar conclusions are drawn
from the study on human recovery performance in nuclear power plants by Kaarstad
and Ludvigsen (2002). Based on a literature review, an experimental investigation, and
field studies, they identify the three most significant factors that affect the detection
phase. These are:
communication - interaction with colleagues can provide information to detect a
failure;
system feedback - cues directly found in the operational environment (e.g. alerts,
other non-usual system event); and
internal feedback - mismatch between operators' expectations of
determine the significance of that mismatch. Generally, the existing system output is
compared with the previously observed one, to determine whether the change is within
tolerance. For example, if an aircraft is in level flight no flight level change should occur
and any deviation from the cleared flight level should trigger the detection of an
unusual event (e.g. pilot error, radar garbling).
The detection phase is investigated further using data from a questionnaire survey and
an experiment in Chapters 6 and 10 respectively.
5.2.2 Diagnosis
Once detection occurs, the diagnosis phase (also known as explanation, localisation,
or identification phase) determines what the failure is, its cause, and what should be
done to correct it. A controller needs a good knowledge of a failure to determine what is
occurring and its effects (e.g. what to expect in the near future, whether the function is
still partially available or totally lost, any problem with data integrity and possible impact
on other tools). This is especially important in the ATC environment where the overall
system consists of highly integrated components and different failures may present
themselves to the controller in a similar manner. For example, a radio frequency failure
manifests itself in the same manner regardless of its cause (i.e. ground- vs. airborne-based failure). Therefore, it is up to the controller to identify the true failure by ruling out
alternatives. In this particular example, the controller will first try to establish radio
contact with other aircraft. If communication is established with the other aircraft it is
reasonable to assume that the failure is on the aircraft side. The controller will then try
to identify if it is a receiver or a transmitter failure by asking the aircraft to squawk
identification. If the aircraft squawks identification then the pilot clearly heard the
transmission. The controller then knows that the aircraft has experienced a transmitter
failure. By employing this procedure, the controller determines the precise element of
the equipment that failed, and thus implements the most appropriate recovery
procedure.
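The elimination procedure in the radio failure example can be written as a small decision function. This is an illustrative sketch: the function name and return labels are hypothetical, and the two branches the text does not describe (no contact with any aircraft, and no squawk) encode assumptions flagged in the comments.

```python
def diagnose_radio_failure(other_aircraft_respond: bool, squawked_ident: bool) -> str:
    """Rule out alternatives for a radio frequency failure affecting one aircraft.

    other_aircraft_respond: radio contact can be established with other aircraft.
    squawked_ident: the affected aircraft squawked identification when asked to.
    """
    if not other_aircraft_respond:
        # Assumption: if no aircraft can be contacted, ground equipment is suspected.
        return "ground equipment failure suspected"
    if squawked_ident:
        # The pilot heard the request, so the aircraft receiver works.
        return "aircraft transmitter failure"
    # Assumption: no squawk suggests the aircraft did not receive the transmission.
    return "aircraft receiver failure (or total aircraft radio failure)"


print(diagnose_radio_failure(other_aircraft_respond=True, squawked_ident=True))
```

Ordering the checks this way mirrors the text: the ground-versus-airborne question is settled first, and only then is the receiver separated from the transmitter.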
Past research in non-ATM industries has shown that in some cases, after the detection
of a failure, the corrective actions are immediately known and implemented. In these
cases, the diagnosis phase is omitted (e.g. in the nuclear industry - Kaarstad and
Ludvigsen, 2002). Similarly, a study from the chemical process industry has shown
that the order of the phases is not always the same. More precisely, the diagnosis
phase does not necessarily follow the detection phase, especially in time-critical
operations. Often a quick fix might be necessary or an initial correction might occur
even before the cause of a failure has been identified (Kanse, 2004).
The findings from non-ATM industries are not entirely applicable to the ATC/ATM
environment. It is difficult to see how the diagnosis phase could be omitted, because
proper recovery from ATC equipment failure is not possible without knowing the
true nature of the failure. However, the duration of the diagnosis phase and the
attention dedicated to it relate directly to the level of workload experienced by the
controller at the moment of failure occurrence and during the recovery process. Through
interviews, a EUROCONTROL study determined that controllers on most occasions do
not seek an explanation for the cause of a failure (EUROCONTROL, 2004e). They focus
only on identifying the system that failed, which is essential to implement an adequate
recovery strategy. An example could be the code-callsign conversion failure, where,
having detected a problem, the controller has to identify the pair of aircraft affected.
This tends to be a very time-consuming process leaving no time for the controller to
consider the cause of the failure. Another example is corruption of radar data. If the
controller doubts the quality of a particular radar source in the multi-radar coverage
airspace, it is possible to use information from other radar sources. If the same failure
occurs in the single-radar coverage airspace, the controller has to disregard radar data,
initiate procedural (non-radar) control, and pass the problem to the system control and
monitoring unit. In both cases, the controller has to determine what failed and what the
impact of that failure is, in order to implement an adequate recovery strategy. The
cause of the failure is left to the system control and monitoring unit to investigate.
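The radar data example reduces to a choice that depends on whether alternative radar sources exist. The sketch below is hypothetical: the function, the radar source identifiers, and the action wording are illustrative, and it captures only the two cases described in the text.

```python
from typing import List


def radar_corruption_actions(suspect_source: str, sources: List[str]) -> List[str]:
    """Controller actions when one radar source is suspected of corrupted data.

    sources: identifiers of the radar sources covering the airspace (hypothetical names).
    """
    alternatives = [s for s in sources if s != suspect_source]
    if alternatives:
        # Multi-radar coverage: continue with the remaining sources.
        actions = ["use radar data from " + ", ".join(alternatives)]
    else:
        # Single-radar coverage: radar data can no longer be trusted.
        actions = ["disregard radar data", "initiate procedural (non-radar) control"]
    # Investigating the cause is left to the system control and monitoring unit.
    actions.append("pass the problem to the system control and monitoring unit")
    return actions


print(radar_corruption_actions("R1", ["R1", "R2", "R3"]))
print(radar_corruption_actions("R1", ["R1"]))
```

In both branches the controller only needs to know what failed and its impact; the cause never enters the decision.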
From the discussion above, it is clear that the diagnosis phase is important to identify
the equipment that has failed. However, if the failure is identified and corrective actions
are immediately known, any further diagnosis (e.g. of the cause) is omitted in favour of
the subsequent correction phase. The diagnosis phase and the factors that may influence
it are addressed further in Chapter 10, which presents an experimental investigation.
Once the controller diagnoses the failure type and its impact on the ATC system, the
tasks shift to more action-based activities. In short,
the controller initiates the correction phase which is described below.
5.2.3 Correction
Failure recovery involves knowing how to undo or minimise the effect of failure and
achieve the desired system state (nominal or stable non-nominal system state,
respectively). The first priority is to minimise the effect on the air navigation service and
the exposure of the problem in terms of aircraft and time. Depending upon the
equipment failure type, recovery should follow available procedures (for details see
section 5.5). Some of them could be fairly simple like switching to another radar source
in multi-radar processing areas, changing to the secondary radio frequency (if the
primary one is blocked), changing unserviceable input devices (mouse or keyboard),
and switching to another console (if the current one is not operational). Other recovery
strategies could be very complex and both physically and mentally demanding. For
example, if an automated conflict detection tool fails to work properly (e.g. Short-Term
Conflict Alert (STCA) or Medium-Term Conflict Detection (MTCD)), an alert might
appear when there is no conflict, or conversely the controller might detect a conflict that
was not alerted automatically. In both instances, the controller will diagnose that the
conflict detection tool itself is not functioning properly. Immediate action would be
required to ensure the safety of all traffic. In other words, the controller will have to
detect all existing conflicts and resolve them in a timely and efficient manner without
the assistance of automated safety nets (e.g. STCA). The second priority would be to
test and restore the automated function, which would be the responsibility of the
system control and monitoring unit.
Past research in the nuclear industry has identified different types of decision events
that constitute the correction phase of recovery (Orsanu and Fischer, 1997; Kaarstad
and Ludvigsen, 2002). These are assessed for the ATC environment below:
ignoring the failure - the error/failure has been detected but is ignored by the operator
for one of two possible reasons: the error/failure is considered irrelevant (i.e. no impact
on operations) or the operator assumes that his/her intervention may make the
situation worse. In either case, the failure would have to be reported;
applying procedures - this seems to be the most common correction type.
Therefore, it is necessary to ensure that procedures exist and that they are
appropriate to a particular failure;
choosing a solution - in theory, this is applicable when procedures are not available
and the human operator has to apply more conscious resources to comprehend the
situation. In many situations it may seem that only one solution is possible to
resolve the failure. However, in retrospect, more than one solution may have been
available, while only one was considered at the time; and
creating a solution - in this case the operator has no experience with the failure
type. No procedures, training, or past experience are available for the human
operator to draw upon. A completely new solution or strategy has to be created.
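The four decision event types can be read as a simple classification of the situation the controller faces after diagnosis. The sketch below is a simplification under assumptions: treating "choosing a solution" as the case where no procedure exists but training or experience can be drawn on is an interpretation of the list above, and all names are hypothetical.

```python
from enum import Enum


class Correction(Enum):
    IGNORE = "ignoring the failure (but reporting it)"
    APPLY_PROCEDURE = "applying procedures"
    CHOOSE_SOLUTION = "choosing a solution"
    CREATE_SOLUTION = "creating a solution"


def correction_type(relevant: bool, intervention_may_worsen: bool,
                    procedure_available: bool, prior_experience: bool) -> Correction:
    """Map the post-diagnosis situation to one of the four decision event types.

    prior_experience covers training or past experience with the failure type.
    """
    if not relevant or intervention_may_worsen:
        return Correction.IGNORE
    if procedure_available:
        return Correction.APPLY_PROCEDURE
    if prior_experience:
        # No procedure, but the operator can draw on what is known to select an option.
        return Correction.CHOOSE_SOLUTION
    # No procedures, training, or past experience: a new strategy must be created.
    return Correction.CREATE_SOLUTION


print(correction_type(True, False, False, False).value)  # creating a solution
```

The order of the checks reflects the list: procedures take precedence whenever they exist, which is why the text stresses that they must exist and be appropriate.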
There are two types of human competence: epistemic and heuristic. Epistemic competence
refers to domain knowledge about the system which one seeks to control. It is the
context-dependent component of actual competence. Heuristic competence refers to a general
competence for handling complex dynamic tasks. It is context-independent, but it is developed
over many years through both training and experience. As a result, actions and decisions
become fast and automatic, occurring without apparent conscious awareness.
outcome phase, the human operator attempts to resolve the problem by implementing
a recovery strategy. This is followed in the outcome phase by post-correction
monitoring or post-recovery analysis to determine the actual outcome of the
implemented strategy. Therefore, the first task in this phase is the monitoring itself,
both by controllers and engineers. Proper design solutions could aid this phase by
providing post-recovery system status indicators.
[Figure: Equipment failure → Hazard → Recovery → Outcome; the outcome is either Recovery successful or Recovery not successful, the latter leading either to Recovery continues or to Incident with further consequences]
Figure 5-1 Analysis of the outcome phase (adapted from EUROCONTROL, 2004e)
It might be expected that at this stage human performance requirements are similar to
those of the detection phase. However, as observed by EUROCONTROL (2004e)
there is a crucial difference. Guided by implemented corrections (recovery strategies),
monitoring by both engineers and controllers is driven more by top-down processes,
primarily expectation. Since at this stage in the recovery process the operators have
knowledge of the failure and its cause, they also have expectations on how the system
might behave after a correction is implemented. For instance, if the system remains
unstable, operators may expect a recurrence of the same problem, other related
problems (common-mode or common-cause failures), or have a general suspicion that
the assessment of the problem was wrong or misleading.
Following the period of monitoring or active checks, the controller must decide whether
recovery is successful. Recovery is considered successful if the system returns to the
nominal (pre-failure) or intermediate, stable state (EUROCONTROL, 2004e).
Intermediate state represents a degraded operational state (e.g. loss of any function,
item of equipment, or a significant overload condition causing increased system
response time) which is detected and stabilised either by controllers or engineers. In
essence, the system is in the intermediate state if the consequences of failure are still
observable in the system performance while controllers are aware of the quality of
information they are receiving from the system and thus the quality of service they can
provide to traffic.
If recovery is unsuccessful, the controller will return to either diagnosis (to determine
the real cause of the problem) or correction phase to retry the previous strategy or
attempt a new one (Kanse and van der Schaaf, 2000; EUROCONTROL, 2004e). This
cycle of reapplied efforts continues as long as time is available for recovery.
Otherwise, if no time is available, the final outcome may be an incident with further
consequences (e.g. loss of separation).
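The cycle just described, in which an unsuccessful outcome sends the controller back to diagnosis or correction until time runs out, can be sketched as a loop over recovery attempts. This is a hypothetical simplification: each attempt is reduced to a duration and an outcome label, the time available is treated as a fixed budget, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Outcomes accepted as successful: nominal, or intermediate (stable, degraded).
SUCCESSFUL = {"nominal", "intermediate"}


@dataclass
class Attempt:
    duration: float  # time spent diagnosing, correcting, and monitoring (hypothetical: seconds)
    outcome: str     # "nominal", "intermediate", or "unstable"


def run_recovery(attempts: List[Attempt], time_available: float) -> Tuple[str, float]:
    """Cycle through recovery attempts while time remains."""
    elapsed = 0.0
    for attempt in attempts:
        if elapsed + attempt.duration > time_available:
            return "incident with further consequences", elapsed
        elapsed += attempt.duration
        if attempt.outcome in SUCCESSFUL:
            return "recovery successful (" + attempt.outcome + ")", elapsed
        # Unsuccessful: return to diagnosis or correction and try again.
    # No further attempts listed but time remains: recovery is still in progress.
    return "recovery continues", elapsed


tries = [Attempt(60.0, "unstable"), Attempt(90.0, "intermediate")]
print(run_recovery(tries, time_available=300.0))  # ('recovery successful (intermediate)', 150.0)
print(run_recovery(tries, time_available=100.0))  # ('incident with further consequences', 60.0)
```

The same sequence of attempts ends in success or in an incident depending only on the time available, which is the point made in the text.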
The next section reviews the existing models of failure and recovery process
developed to support the research on human recovery in ATM and non-ATM industries.
[Figure: Begin (problem situation arises as a result of one or more failures); D: Detection of deviation; C: Countermeasures; End of recovery process]
Figure 5-3 The Recovery from Automation Failure Tool Framework (EUROCONTROL, 2004e)
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an
arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node,
caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
The model also reflects the hypothetical function which relates recovery response time
to the level of automation (Figure 5-4). It is expected that recovery response time will
increase as the level of automation increases (shown as a dashed upward line on the
right side of Figure 5-4), due to increased complexity, skill degradation, and the overall
out-of-the-loop phenomenon. The solid downward line reflects the decrease in the
reaction time available to controllers as a result of the introduction of higher levels of
automation. Controllers will have far less time to respond safely to any loss of
separation and fewer opportunities for effective solutions. As a result, this model
represents Bainbridge's (1983) "ironies of automation" by overlaying two critical time
variables against each other and as a function of automation-related changes. These
variables are: the time required to establish safe separation, given a degraded ATC
service, and the time available to a controller (or a team) to react and safely recover
from a failure.
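The crossing of the two lines in Figure 5-4 can be made concrete with a toy calculation. Wickens et al. (1998) describe the relationship only qualitatively, so the straight-line shapes, the normalised automation scale from 0 to 1, and every numeric parameter below are invented for illustration.

```python
def required_time(level: float, base: float = 30.0, slope: float = 40.0) -> float:
    """Time needed to re-establish safe separation; rises with automation (dashed line)."""
    return base + slope * level


def available_time(level: float, base: float = 120.0, slope: float = 60.0) -> float:
    """Time available to react; falls with automation (solid line)."""
    return base - slope * level


def critical_level(req_base: float = 30.0, req_slope: float = 40.0,
                   av_base: float = 120.0, av_slope: float = 60.0) -> float:
    """Level where the lines cross: req_base + req_slope * a = av_base - av_slope * a."""
    return (av_base - req_base) / (req_slope + av_slope)


for level in (0.0, 0.5, 1.0):
    margin = available_time(level) - required_time(level)
    print(f"automation {level:.1f}: safety margin {margin:+.1f} s")
print("critical level:", critical_level())
```

With these invented values the lines cross at a normalised automation level of 0.9; beyond that point the time required to re-establish separation would exceed the time available to the controller.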
After describing the three models relevant to controller recovery from equipment failure
in ATC, Table 5-2 summarises their characteristics and identifies their limitations
addressed later in the thesis. In general, all three models are qualitative and based on
a principle of a sequence of phases that constitute the process of human recovery.
They are based on past research, whilst only one model is based on operational data.
The limitations identified in the last column of Table 5-2 guided the research presented
in this thesis and the main principles behind the framework for the assessment of
controller recovery. In short, the research in this thesis is verified in the simulated
environment (experimental investigation Chapter 10), based on operational
experience (from interviews with relevant ATM staff, operational data Chapter 4, and
the questionnaire survey - Chapter 6), and based upon detailed assessment of the
recovery context (Chapters 7 and 8).
Model | Context | Operational input | Assessment of recovery | Prediction of recovery | Limitations
Kanse (2004) | Chemical industry | Yes (interviews and data) | Qualitative and quantitative | No | No assessments of the recovery context; No prediction of the recovery process
SHAPEs RAFT tool | ATM | Yes (interviews) | Qualitative (expert-based) | Qualitative (expert-based) | Not verified in simulated/operational environment; Based only on interviews and no operational reports
Wickens et al. (1998) | ATM | No | No | Qualitative and potentially quantitative (based on the recovery reaction time) | Theoretical approach
As stated previously, there are three major factors that influence the quality of
controller recovery, namely past experience, procedures, and training. Whilst procedures
and training are regulated within the aviation community, operational experience is
accumulated over time and controllers may or may not experience equipment failures
during their career. For this reason, the next sections describe and discuss existing
regulations regarding recovery procedures and training. Operational experience,
extracted from the questionnaire survey, is investigated in the following Chapter.
In its guidance for recovery from four failure types, ICAO recommends necessary
steps to be taken by controllers and pilots, as well as ATC Centre watch managers or
supervisors. When necessary, ICAO also recommends collaboration with adjacent ATC
units. Therefore, the recovery process is not seen only as the responsibility of
controllers but all parties involved within the affected ATC Centre and region (including
the adjacent ATC unit which can provide valuable assistance in restricting or rerouting
the flow of traffic). All other failure types are left to national service providers to include
and define in their Manuals of Air Traffic Services (MATS).
5.5.1.2 European and national regulation
At European level, EUROCONTROL published guidance and recommendations for
controller training in the handling of unusual/emergency situations, known as the
ASSIST scheme (EUROCONTROL, 2003f). This scheme covers all procedures for
aircraft emergencies but paradoxically does not cover any type of ATC equipment
failure. The ASSIST programme, captured in a publicly available document, is intended
to represent only a framework to be further customised and adapted to the specific
requirements of each ATC Centre utilising local expertise. Thus, each ATC Centre is
required to assemble a team of experts, implement the current ASSIST programme,
and discuss other safety-critical events (e.g. ATC equipment failures) to be included in
emergency procedures, training, and/or aide-memoire.
5.5.1.3 Air navigation service provider regulation
National air traffic service providers may publish their own procedures for
emergency/unusual situations in the MATS. The MATS contains procedures,
instructions, and information which form the basis of air traffic services within a country.
It is published for the guidance of civil air traffic controllers, but may also be of general
interest to other associated parties within civil aviation. For example, the UK MATS is
arranged in two parts. Part 1 is published by the UK CAA (as CAP 493; UK CAA, 2006)
and consists of instructions which apply to all UK ATC units. Part 2 is published by the
UK National Air Traffic Service Provider (NATS) and consists of instructions which
apply to a particular air traffic control unit (e.g. the London Area Control Centre).
NATS publishes specific recovery or fallback procedures in their internal MATS Part 2
document. This document defines 33 failure types and relevant strategies for their
recovery (NATS, 2002) and thus reflects the particular ATC system characteristics of
the UK ATC Centres. No information regarding the methodology to compile these
recovery strategies is available. It can only be assumed that these recovery procedures
are a direct result of expert discussions, operational experience, and experience with
ATC system performance.
The manual advises that the planning controller should be the focal point in the sector
team for the duration of a failure, with the main objective of ensuring that the
tactical/executive controller is supported at all times. The recovery procedure for each
of the 33 defined failures consists of the following:
a short description of the failure (i.e. what a controller should expect and what the
potential effects on the ATC system are);
a description of the system-generated alert (e.g. brown border, text message);
and
a list of required recovery steps (these steps are separately defined for planner,
tactical/executive, assistant controllers, and watch supervisor).
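The three-part structure of each recovery procedure maps naturally onto a small record type, for example in an electronic aide-memoire. This is a hypothetical sketch: the class and field names are illustrative, and the example entry is invented and does not reproduce any actual NATS procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RecoveryProcedure:
    """One fallback procedure entry, following the three parts listed above."""
    failure: str
    description: str  # what the controller should expect; potential effects on the ATC system
    alert: str        # the system-generated alert, e.g. a brown border or a text message
    steps: dict[str, list[str]] = field(default_factory=dict)  # role -> ordered recovery steps

    def steps_for(self, role: str) -> list[str]:
        """Return the recovery steps defined for one role (empty if none)."""
        return self.steps.get(role, [])


# Invented illustrative entry; not an actual NATS procedure.
example = RecoveryProcedure(
    failure="Example display failure",
    description="Illustrative text describing what to expect",
    alert="Text message in the general information window",
    steps={
        "planner": ["Support the tactical/executive controller"],
        "tactical/executive": ["Illustrative step 1", "Illustrative step 2"],
        "watch supervisor": ["Inform the system control and monitoring unit"],
    },
)
print(example.steps_for("planner"))
print(example.steps_for("assistant"))  # []
```

Keying steps by role keeps the separate lists for planner, tactical/executive, assistant controllers, and watch supervisor explicit, as the manual requires.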
The New Zealand air navigation service provider (i.e. Airways New Zealand) publishes
MATS as required by the Civil Aviation Authority of New Zealand. This document
recommends the use of the recovery procedures for failures of significant components
(e.g. radar data processing, flight data processing, the overall communication system),
as these have the most severe effect on ATC operations. The recovery procedures are
published as a separate document designed to be readily available at each position
(Failure Modes Quick Reference Guide-FMQRG; Airways New Zealand, 2006a). The
main objective of this document is to provide ready and quick assistance to operational
staff for handling equipment failures (i.e. aide-memoire).
The German air traffic service provider (DFS) defines emergency checklists for various
aircraft-related as well as military-specific emergencies. This document formed the basis
for the development of EUROCONTROL's guidance for controller training in the
handling of unusual/emergency situations and the ASSIST scheme (EUROCONTROL,
2003f).
However, emergency checklists developed by DFS (same as the
the characteristics of the ATC Centre that participated in the experimental investigation
(presented in Chapters 9 and 10).
Finally, assuming that recovery procedures are available, their contents must be
accurate and kept up to date (i.e. reflecting all modifications/updates in the ATC system
architecture). They must be realistic, comprehensive, clear and easy to use, easily
accessible, and linked to regular emergency training.
After discussion on the recovery procedures and their key principles in ATC, the
following section discusses training for handling ATC equipment failures in a similar
manner.
Fourthly, in spite of the clear need for regular training, the lack of resources
(infrastructure and staff) makes it impossible to train controllers for all different types of
emergency/unusual situations and all equipment failure types. For this reason an
organised exchange of experience at the level of ATC Centres, countries, or regions
(e.g. ECAC states, EUROCONTROL member states) may provide valuable knowledge
and insight into various unusual situations and strategies to resolve them. As an
example, in 2003 an A300 was struck on the left wing by a surface-to-air missile, resulting
in a complete loss of hydraulics and therefore loss of all flight controls. Reacting
rapidly, the captain recalled a television documentary he had seen about a DC-10
crash at Sioux City, Iowa, and the thrust change technique employed by the captain
and crew of the DC-10 to control their aircraft. Although the A300 crew had never
practised this technique before, they quickly gained control despite the extreme stress
of the situation (IFALPA, 2005). This example shows the importance of exchanging
information on knowledge, performance, and strategy between human operators.
Similar experience could be achieved in the area of ATM by supporting workshops,
newsletters, and other forms of information exchange on best practices and handling of
unusual events.
Finally, the EUROCONTROL (2004e) report on managing technical failures in ATC
points out potential future problems identified through controller interviews. Firstly, it
suggests that the mental picture of the traffic situation will be more difficult to form in
the future ATC environment. Secondly, it suggests that in the future, the controller may
require more knowledge of the ATC system architecture when compared to today.
Thirdly, the report suggests that newly qualified controllers and fully established
controllers have different perceptions of one another: newly qualified controllers are
perceived by some fully established controllers to be more trusting of the reliability of
new equipment, having rarely experienced failures in the past, while established
controllers are perceived by some newly qualified controllers as less computer literate
and more suspicious of technology.
The previous sections of this Chapter revealed the complexity of controller recovery by
discussing its relevant phases, from failure detection to the outcome of the recovery
process. In addition, past research has identified factors that influence the quality of
controller recovery. The next section defines a set of variables that capture the
important characteristics of controller recovery. These are the context that surrounds
the controller recovery process, the recovery effectiveness, as well as the recovery
duration. These variables guide the design of the experiment to capture real data on
controller recovery later on in the thesis.
In addition, the review included the overview of the
focuses on the controller's first action that is observed on the ATC system (e.g.
communication regarding identified failure, interaction with HMI).
Apart from the moment of actual detection, the recovery duration variable may also
lack some aspects of the diagnosis phase. In other words, the cognitive processes
behind understanding the new situation and prioritisation of the recovery tasks to be
performed may also occur covertly. For example, the real cause of the communication
failure is not immediately obvious as the controller needs to investigate if the failure
affects ground ATC equipment or airborne radio equipment. Both of these features of
controller recovery are considered in the design of the experimental investigation
presented in Chapter 9.
5.8 Summary
As pointed out at the beginning of this Chapter, a good understanding of recovery
requires a detailed assessment of the recovery process from both the technical and
human perspectives. Whilst the previous Chapter discussed the technical recovery, this
Chapter focuses on controller recovery. The Chapter starts by distinguishing the
objectives of two separate groups of operators involved in recovery from equipment
failures, namely controllers and engineers. While this thesis focuses solely on controller
recovery from equipment failures, the reviewed theoretical background to human
recovery is applied to the controller recovery by identifying its major phases. As a
result, the main phases of controller recovery together with the outcome of the overall
recovery process have been described. Finally, various models of human recovery,
developed for both ATM and non-ATM industries, have been discussed with emphasis
on three of the most relevant ones to controller recovery. These are: the model by
Kanse derived for recovery performance in the chemical process industry, the RAFT
tool derived specifically for the ATC operational environment, and the model by
Wickens generally focusing on the impact of different levels of automation on the
recovery process.
Apart from identifying the main phases of the controller recovery process, the review of the theoretical background has also highlighted the factors that influence the quality of controller recovery, namely past experience, recovery procedures and training. While past experience is accumulated throughout a controller's operational career, the current status and quality of recovery procedures and training are regulated by international and national aviation authorities. Thus, the Chapter reviews and discusses the current status of regulation regarding recovery procedures and training, whilst the
Chapter 6
Questionnaire Survey
Chapter 5 showed that limited research has been carried out globally on human reliability in relation to controller recovery. Hence, this Chapter presents the details of a questionnaire survey scheme designed to address this gap in knowledge and to further support the research in this thesis. The specific objectives of the questionnaire survey are to investigate controller experience with equipment failures and identify the factors that affect their recovery, to extract further operational experience, to investigate the status and quality of recovery procedures and training, and to contribute to wider human reliability research by assessing controller recovery specifically. The Chapter starts with the definition of the target population and sampling. It proceeds by discussing the survey methodology adopted for the collection of questionnaire responses, the design of the questionnaire, and the refinements identified by a pilot survey. This is followed by the description of the full survey scheme (Figure 6-1). The Chapter concludes with the methodology for the questionnaire survey data analyses, structured in three segments: assessment of the sample characteristics, high-level frequency analyses, and in-depth assessment of interactions between recovery factors.
the most severe failures they have experienced. Thirdly, the survey helps determine the status and quality of recovery procedures and training in ATC Centres (and thus augments the findings from Chapter 5). Finally, the survey is designed to contribute to wider human reliability research by assessing controller recovery performance specifically.
Six key questions were formulated in order to achieve the four objectives. The
questions (below) address ATC equipment, controller recovery performance, and
status of recovery procedures and training:
How often do controllers experience equipment failures (Q1)?
What factors influence their recovery performance (Q2)?
What is the most unreliable ATC equipment (Q3)?
Is there any organised exchange of information on equipment failures and/or other types of unusual/emergency situations (Q4)?
Do recovery procedures exist (Q5)?
What do controllers feel about the quality of training currently available for recovery from equipment failures (Q6)?
Given the objectives of the questionnaire survey above, the next section defines the
target population and sample size.
6.2 Sampling
The population for this questionnaire survey should consist of controllers from various ATC Centres worldwide. The characteristics to be captured by the sample are ATC Centres with different levels of traffic, airspace complexity and ATC system automation, and controllers with a range of operational experience (i.e. years in service, rating).
Using United Nations (UN) statistics, which list 191 independent countries worldwide (United Nations, 2006), it would in principle be possible to estimate the total number of ATC Centres. However, data on the number of ATC Centres in each country were not available to this research¹. Therefore, another approach based on the distribution of global air traffic (Airbus, 2004) has been used. In other words, the ideal sample should have a regional distribution of sampled controllers that corresponds to the air traffic
[Figure 6-2: bar chart of world air traffic per region (Africa; Latin America and Caribbean; Asia and Pacific; Europe; North America; Middle East) for 2003 and 2023]
Figure 6-2 Distribution of world air traffic per region for the years 2003 and 2023 (adapted from Airbus, 2004)
size (in terms of number of controllers and ATC Centres sampled) would vary
according to the choice of data collection method and available resources.
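The idea above, that the regional make-up of the ideal sample should mirror the regional share of world air traffic, amounts to a proportional allocation. The Python below is a minimal sketch under stated assumptions: the function name `allocate_sample`, the target sample size of 200 and the traffic shares are hypothetical placeholders, not figures used in this research, and rounding uses the largest-remainder method so that the regional counts always add up to the target.

```python
from math import floor


def allocate_sample(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Split `total` respondents across regions in proportion to `shares`.

    Hypothetical helper: uses the largest-remainder method so that the
    rounded regional counts always add up to `total`.
    """
    weight_sum = sum(shares.values())
    exact = {region: total * share / weight_sum for region, share in shares.items()}
    alloc = {region: floor(quota) for region, quota in exact.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining respondents to the regions with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda r: exact[r] - alloc[r], reverse=True)
    for region in by_remainder[:leftover]:
        alloc[region] += 1
    return alloc


# Hypothetical traffic shares (percent) and target size, for illustration only.
traffic_shares = {"Europe": 30, "North America": 35, "Asia and Pacific": 25, "Other": 10}
print(allocate_sample(200, traffic_shares))
# {'Europe': 60, 'North America': 70, 'Asia and Pacific': 50, 'Other': 20}
```

The largest-remainder rule is chosen here only because plain rounding can make the regional counts drift away from the intended total.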
network and corresponding emails. However, accessing the email addresses of 2001000 controllers worldwide presents a significant obstacle to the distribution of
questionnaires.
Further problems with the self-completion method concern the number of responses and the quality of the survey sample obtained. The self-completion method depends entirely on the intention and willingness of the controller to participate in the survey, so the number of responses obtained is harder to control. Apart from the high likelihood of a low response rate, another drawback of a self-completion survey is that the quality of the answers cannot be controlled. Even with straightforward questions, respondents may misinterpret some of the questions or may need more information on the subject under investigation. The presence of the researcher while the respondent answers the questions has the advantage of ensuring that the respondent understands what is required from the survey.
After careful consideration of both the advantages and disadvantages of the two survey
methods (face-to-face and self-completion), both were adopted in this thesis. This
decision was based on the need to exploit the strong points of both methods
particularly given the timing and response rate constraints. In order to maximise the
benefit of the combined approach, the design of the questionnaire must account for
their unique characteristics.
the highest reliability and completeness of responses, it was decided to use one questionnaire design for both survey methods. The aim was to design the questionnaire to extract the maximum information whilst ensuring convenience for both face-to-face and self-completion respondents. This was achieved by following several design principles. Firstly, special attention was given to the clarity of questions to avoid any ambiguity in the self-completion survey. Secondly, emphasis was placed on closed questions, whose answers did not require the presence of the researcher. Closed-ended questions are answered by selecting one of the given answers, the simplest form being a yes/no answer; in general, these questions are restrictive and can be answered in a few words. Thirdly, all key terms were defined. Finally, for open questions, a list of potential answers was provided to guide the respondents (e.g. for the question on the most unreliable ATC equipment, a comprehensive list of ATC equipment was provided). Open-ended questions allow respondents to answer in their own words, providing a narrative; in general, these questions solicit additional information, as they require more than one- or two-word responses. Furthermore, the questionnaire was designed so that any inconsistencies in responses could be identified. This was achieved through the careful choice of questions and by having multiple questions assess a particular issue (e.g. recovery procedures).
The questionnaire has been structured around the main objective of the research
presented in this thesis. In other words, all the questions have been designed to
support the research on controller recovery from equipment failures in Air Traffic
Control (ATC). Based on the type of information obtained, the questionnaire is
structured in four distinct groups totalling 29 questions. The first group consists of general and specific questions. The former cover the respondent's overall operational experience, ratings, and the country/ATC Centre where the respondent works. The latter inquire specifically about experience with equipment failures, asking the respondent to list several examples in greater detail. This first group consists of five questions.
The second group of questions inquires about the factors that affect controller recovery
by asking the respondent to rate the importance of three factors. This is followed by the
question on the most unreliable ATC systems/components, as well as the
organisational issues relevant for recovery. In total, this second group consists of four
questions.
The third group of questions focuses on the existence and quality of recovery
procedures at the ATC Centre where the respondent works. This group consists of 11
questions.
The fourth group of questions focuses on the existence and quality of training for
recovery at the ATC Centre where the respondent works. This group has nine
questions. The final question gives the respondent an opportunity to add comments and suggestions related to the entire questionnaire.
The following is a one-page example of the questionnaire which was used during the
survey (Figure 6-3). It is the second page of the questionnaire. A complete
questionnaire is included in Appendix IV, while an example of a response to the
questionnaire is provided in Appendix V.
EUROCONTROL in-house controllers, two ATM specialists, and three
added value of the survey and how the results would be used. This information was
included in the introductory page of the questionnaire (i.e. the first page). Additionally,
the pilot survey revealed the need for some examples of ATC equipment/tools which
were added as a note after question 5. These changes were incorporated in the final
design of the questionnaire.
The following sections discuss how the survey methodology has been exploited to
achieve the target sample size.
responses were received from the controllers involved in the
were received from controllers on various courses run by
IANS provides regular courses to ATC staff from all EUROCONTROL Member States (i.e. 37
European countries).
related to recovery procedures are investigated (e.g. adequacy, completeness, currency). The final judgement is based on all answers that were provided in relation to recovery procedures (and not only the first one).
Misinterpretation was also noted in the question on the number of equipment failures experienced annually. In this case, the responses reflected a general misinterpretation of the term "equipment failure" and the consequent variation in the answers: while some controllers reported all equipment failures they experienced within one year regardless of severity, others reported only major failures, classified as infrequent, high-severity occurrences.
The possibility of errors arising from pre-processing of the responses was mitigated by extra care at the data input stage (i.e. double checking of each input). In the case of multiple-response questions or questions returning a range instead of a single value, a consistent approach was taken. For example, in response to question 4, "What is the average number of ATC equipment failures during one year that you experience?", respondents tended to provide a single numerical value, a range, or a textual answer. In the case of a range, the middle value was taken, and this rule was applied consistently to other questions where necessary. Textual answers were transformed into numerical values (e.g. "once in two years" was treated as 0.5 per year). However, some textual answers could not be transformed into numerical values and were therefore omitted (e.g. the segment of question 5 on the frequency and duration of failures was answered with "minutes", "very frequent", "very often", "rarely", "very rarely", or "once in career").
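The pre-processing rules described above (middle value for a range, conversion of textual answers where possible, omission otherwise) can be expressed as a small normalisation function. This Python sketch is illustrative only: `normalise_answer` and the `TEXT_TO_RATE` table are hypothetical, only the "once in two years" = 0.5 rule comes from the text, and the "once in five years" entry simply follows the same arithmetic.

```python
import re
from typing import Optional

# Hypothetical conversion table: only "once in two years" = 0.5 is stated in the text.
TEXT_TO_RATE = {
    "once in two years": 0.5,
    "once in five years": 0.2,
}

# A range such as "10-20" or "10 to 20".
RANGE = re.compile(r"^(\d+(?:\.\d+)?)\s*(?:-|to)\s*(\d+(?:\.\d+)?)$")


def normalise_answer(raw: str) -> Optional[float]:
    """Convert one free-form answer into failures per year, or None if unusable."""
    text = raw.strip().lower()
    match = RANGE.match(text)
    if match:  # range: take the middle value
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2
    try:
        return float(text)  # single numerical value
    except ValueError:
        pass
    # Textual answer: convert if a rule exists, otherwise omit it (None).
    return TEXT_TO_RATE.get(text)


answers = ["12", "10-20", "once in two years", "very often"]
usable = [v for v in map(normalise_answer, answers) if v is not None]
print(usable)  # [12.0, 15.0, 0.5]
```

Returning `None` rather than guessing a value keeps unconvertible answers such as "very often" out of the numerical analysis, mirroring the omission rule in the text.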
The next section describes the methodology behind the analysis of questionnaire
survey results.
designed to answer (see section 6.1) whilst the seventh sub-group presents other
findings captured in the survey (presented in Appendix VI). The final segment of the
questionnaire survey data analysis provides an in-depth investigation of the interaction
between recovery factors previously analysed. The following sections discuss the
results and findings generated using the process in Figure 6-4.
[Figure 6-4: flow chart of the questionnaire survey data analysis. Questionnaire survey data feed three segments: (1) characteristics of the sample (58 ATC Centres, 134 controllers); (2) high-level analyses: experience with equipment failures; factors that influence recovery performance; the most unreliable ATC systems; organised exchange of information on equipment failures; status of recovery procedures; status of training for recovery; other findings (reported in Appendix VI); (3) interaction analyses]
choices made in the questionnaire by each respondent were recorded under each
corresponding serial number.
During the process of data pre-processing and analysis, all available responses were
taken into account. A special scoring technique was used for questions that required
the ranking of choices (question 6). In this particular case, the controllers were asked to
score their reliance upon written procedures, situation-specific problem solving, and
other factors during the recovery process. This approach is explained in detail in
section 6.7.3.2.
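Number of responses per ATC Centre and per country

| Country | ATC Centre | Responses per ATC Centre | Responses per country |
|---|---|---|---|
| Ireland | Shannon | 7 | 16 |
| | Dublin | 4 | |
| | Cork | 5 | |
| Finland | Kemi | 1 | 1 |
| Serbia | Belgrade | 3 | 3 |
| Switzerland | Zurich | 8 | 16 |
| | Geneva | 8 | |
| United Kingdom | Bristol | 1 | 1 |
| Netherlands | Maastricht | 2 | 4 |
| | Nieuw Milligen | 1 | |
| | Amsterdam | 1 | |
| Germany | Karlsruhe | 1 | 3 |
| | Langen | 1 | |
| | Frankfurt | 1 | |
| Spain | Seville | 5 | 5 |
| Norway | Oslo | 5 | 8 |
| | Kirkenes | 1 | |
| | Stavanger | 1 | |
| | Bodo | 1 | |
| Italy | Rome | 2 | 7 |
| | Bologna | 1 | |
| | Naples | 2 | |
| | Venice | 1 | |
| | Milan | 1 | |
| France | Paris | 1 | 2 |
| | Nice | 1 | |
| Sweden | Stockholm | 3 | 8 |
| | Malmo | 3 | |
| | Gothenburg | 2 | |
| Slovenia | Ljubljana | 1 | 1 |
| Belgium | Brussels | 3 | 3 |
| Macedonia | Skopje | 1 | 1 |
| Croatia | Split | 1 | 4 |
| | Zagreb | 1 | |
| | Pula | 1 | |
| | Zadar | 1 | |
| Moldova | Chisinau | 1 | 1 |
| Iceland | Reykjavik | 2 | 2 |
| Denmark | Copenhagen | 3 | 3 |
| Portugal | Lisbon | 4 | 4 |
| South Africa | FAJS | 2 | 2 |
| Tanzania | Dar es Salaam | 1 | 1 |
| India | Mumbai | 3 | 7 |
| | Kolkata | 4 | |
| Singapore | Singapore | 2 | 2 |
| Tahiti | Papeete | 6 | 6 |
| Australia | Melbourne | 1 | 1 |
| Austria | Vienna | 2 | 2 |
| Romania | Bucharest | 2 | 2 |
| Malta | Malta | 2 | 3 |
| | Luqa airport | 1 | |
| Macau SAR | Macau | 3 | 3 |
| Kenya | Nairobi | 4 | 4 |
| New Zealand | Wellington | 1 | 5 |
| | Auckland | 2 | |
| | Christchurch | 2 | |
| China | Hong Kong | 1 | 1 |
| Malaysia | Subang | 2 | 2 |
| Total (34 countries) | 58 ATC Centres | 134 | 134 |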
[Figure: percentage of survey responses per region (Europe approximately 75 percent, Asia and Pacific approximately 20 percent, Africa approximately 5 percent; no responses from North America or the Middle East)]
However, given the characteristics of the population surveyed, the sample still captures diverse levels of traffic and airspace complexity, ATC system automation, and controllers with a range of operational experience (i.e. years in service, rating). For example, in the European region the responses from Paris, Frankfurt, Amsterdam, Zurich, Geneva, and Maastricht represent input from some of the busiest European ATC Centres. Likewise, in Asia, the responses from Mumbai, Hong Kong, and Singapore represent some of the busiest ATC Centres on the continent, as well as some that have experienced considerable growth in recent years. Finally, the sample also includes ATC Centres with technically advanced systems, e.g. Malmo ACC in Sweden, Maastricht ACC in the Netherlands, Shannon ATC in Ireland, and the Oceanic Control Centre in Auckland, New Zealand.
Although only five percent of responses were received from the African continent, the ATC Centres sampled there were considered carefully. Johannesburg and Nairobi airports are the leading airports in Africa for both passengers and cargo (Air Transport Action Group, 2005), and both regions are experiencing an increase in passenger movements, mostly as a result of growth in tourism. Failure of ATC equipment and the recovery response of controllers are of considerable importance in such busy ATC Centres, more so than in other African ATC Centres with considerably less traffic.
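The regional shares quoted in this section (e.g. about five percent of responses from Africa) can be checked by aggregating the per-country response counts. In the Python sketch below, each country count is the sum of the per-ATC-Centre counts reported in this section; the grouping of countries into regions (`NON_EUROPEAN`, `region_of`) is an assumption made for illustration.

```python
from collections import defaultdict

# Responses per country (sums of the per-ATC-Centre counts reported in this section).
responses = {
    "Ireland": 16, "Finland": 1, "Serbia": 3, "Switzerland": 16,
    "United Kingdom": 1, "Netherlands": 4, "Germany": 3, "Spain": 5,
    "Norway": 8, "Italy": 7, "France": 2, "Sweden": 8, "Slovenia": 1,
    "Belgium": 3, "Macedonia": 1, "Croatia": 4, "Moldova": 1,
    "Iceland": 2, "Denmark": 3, "Portugal": 4, "Austria": 2,
    "Romania": 2, "Malta": 3,
    "South Africa": 2, "Tanzania": 1, "Kenya": 4,
    "India": 7, "Singapore": 2, "Tahiti": 6, "Australia": 1,
    "Macau SAR": 3, "New Zealand": 5, "China": 1, "Malaysia": 2,
}

# Assumed grouping of the sampled countries; every other country is treated as European.
NON_EUROPEAN = {
    "Africa": {"South Africa", "Tanzania", "Kenya"},
    "Asia and Pacific": {"India", "Singapore", "Tahiti", "Australia",
                         "Macau SAR", "New Zealand", "China", "Malaysia"},
}


def region_of(country: str) -> str:
    for region, members in NON_EUROPEAN.items():
        if country in members:
            return region
    return "Europe"


totals = defaultdict(int)
for country, count in responses.items():
    totals[region_of(country)] += count

grand_total = sum(totals.values())
for region, count in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{region}: {count} responses ({100 * count / grand_total:.1f}%)")
```

Under this grouping the script prints Europe: 100 responses (74.6%), Asia and Pacific: 27 responses (20.1%) and Africa: 7 responses (5.2%), for 134 responses from 34 countries.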
Given the difficulties encountered in accessing ATC Centres and controllers worldwide (e.g. security, logistics, related costs) and the characteristics of the population surveyed, the obtained sample can reasonably be considered representative of the population.
The next section assesses the adequacy of sampling achieved within each ATC
Centre.
Furthermore, Figure 6-7 presents the distribution of the ratings of the controllers who participated in the survey. Most controllers have ACC ratings. As a result, the data analyses may be biased towards experience within the ACC environment, which tends to be better staffed and to have greater access to advanced equipment/tools (e.g. radar coverage fed by multiple radar sites rather than by a single radar site as in APP and TWR control, and investment in more automated systems).
[Figure 6-7: distribution of controller ratings (ACC, APP, TWR and combinations of these); y-axis Percentage, x-axis Rating]
and concluding with other findings on controller recovery (captured in question 5).
Therefore, the relevant sub-groups are: experience with equipment failures in the ATC
Centre, factors that influence the recovery performance, the most unreliable ATC
systems/tools, organised exchange of information on equipment failures, status and
quality of recovery procedures, status and quality of training for recovery, and other
findings. Each of the sub-groups is discussed below.
6.7.3.1 Experience with equipment failures (Q1)
In the sample obtained, 94.8 percent of controllers had experienced some kind of ATC equipment failure in their career. This group of controllers experienced on average 17 equipment failures annually, ranging from less than one per year up to 600, as reported by one ATC Centre. This dispersion of the results reflects the wide variation in how the term "equipment failure" was interpreted. Some controllers interpreted the question in terms of major (more severe) failures only; their answers ranged from less than one (e.g. once in two years, once in five years, once in a career) to one failure annually (34.6 percent of responses). Other controllers reported the total number of failures experienced annually regardless of severity, with responses ranging from dozens to hundreds. In short, the vast majority of controllers surveyed have experienced equipment failures.
6.7.3.2 Factors that influence controller recovery performance (Q2)
Controllers were asked to rate how much they relied upon written procedures, situation-specific strategies (i.e. context), and other factors (e.g. past experience) in handling equipment failures. The ratings ranged from one to five, where one stands for "very much", two for "much", three for "moderate", four for "minimal" and five for "not at all".
The results show that more than 45 percent of the controllers surveyed rely on written procedures in the event of an equipment failure at the level of either "much" or "very much" (see Figure 6-8). These controllers have on average more than 13 years of experience and operate in ATC Centres with recovery procedures (96.4 percent of controllers who rated written procedures "much" or "very much") and recovery training schemes (64.3 percent of controllers who rated written procedures "much" or "very much").
[Figure 6-8: bar chart of response frequency per reliance level (Very much, Much, Moderately, Minimal, Not at all) for written procedures]
Figure 6-8 Controllers' reliance on written procedures throughout the recovery process
[Figure 6-9: bar chart of response frequency per reliance level (Very much, Much, Moderately, Minimal, Not at all) for situation-specific problem solving]
Figure 6-9 Controllers' reliance on situation-specific problem solving throughout the recovery process
Finally, 64.08 percent of controllers rated other factors (e.g. past experience) at the level of either "much" or "very much" (see Figure 6-10). As for the previous factors, the controllers who rated this factor highest have on average more than 13 years of operational experience and operate in ATC Centres with recovery procedures (90.8 percent of controllers who rated other factors "much" or "very much") and recovery training schemes (58.5 percent of controllers who rated other factors "much" or "very much"). European controllers rely on other factors (e.g. past experience) more than Asian controllers when recovering from equipment failures (69.6 percent of responses from European controllers, compared with 42.1 percent of responses from Asian controllers). The sample of African controllers is too small for any comparison.
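The percentages reported for the reliance ratings (share of controllers answering "much" or "very much") can be computed as in the Python sketch below. The function name and data are hypothetical: the example ratings are illustrative counts chosen to be consistent with the 64.08 percent reported for other factors (66 of 103 valid ratings at levels 1 or 2), the split between individual levels is assumed, and excluding missing answers from the denominator is an assumption of this sketch.

```python
from collections import Counter
from typing import Iterable, Optional

# Rating scale used for the reliance questions: 1 = "very much" ... 5 = "not at all".
SCALE = {1: "very much", 2: "much", 3: "moderate", 4: "minimal", 5: "not at all"}


def high_reliance_share(ratings: Iterable[Optional[int]]) -> float:
    """Percentage of valid ratings at level 1 ("very much") or 2 ("much").

    Missing answers (None) and out-of-scale values are excluded from the
    denominator (an assumption of this sketch).
    """
    valid = [r for r in ratings if r in SCALE]
    if not valid:
        return 0.0
    counts = Counter(valid)
    return 100 * (counts[1] + counts[2]) / len(valid)


# Illustrative ratings: 66 of 103 valid answers at levels 1-2, plus 31 missing answers.
ratings = [1] * 34 + [2] * 32 + [3] * 30 + [4] * 4 + [5] * 3 + [None] * 31
print(round(high_reliance_share(ratings), 2))  # 64.08
```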
[Figure 6-10: bar chart of response frequency per reliance level (Very much, Much, Moderately, Minimal, Not at all) for past experience]
Figure 6-10 Controllers' reliance on other factors (e.g. past experience) throughout the recovery process
Figures 6-8 to 6-10 and the frequency analysis show that controllers rely mostly upon other factors (e.g. past experience) when dealing with equipment failures, followed by situation-specific problem solving and finally written procedures. Having investigated the factors that affect controller recovery, the next section turns to the survey objective of identifying the most unreliable ATC systems/tools.
6.7.3.3 The most unreliable ATC systems/tools (Q3)
The analysis of the most unreliable ATC equipment is based on two questions, 5 and 9. Question 5 asked for examples of equipment failures that severely affected the controller's work. Question 9 asked controllers to list the three most unreliable ATC systems/subsystems they have experienced. The data obtained from both questions were collated and pre-processed to remove any duplicate answers. This was necessary as controllers tended to give similar responses to both questions.
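The collation step described above can be sketched as follows, assuming (as the text suggests) that duplicates are removed per respondent across questions 5 and 9 before computing each equipment type's share of all examples provided. The function name `equipment_shares` and the two example respondents are hypothetical.

```python
from collections import Counter


def equipment_shares(responses):
    """Percentage of all examples provided, per equipment type.

    `responses` holds one (question_5_examples, question_9_examples) pair per
    respondent; an item named in both answers of the same respondent counts once.
    """
    counts = Counter()
    for q5_items, q9_items in responses:
        counts.update(set(q5_items) | set(q9_items))  # remove duplicates across Q5 and Q9
    total = sum(counts.values())
    return {item: 100 * n / total for item, n in counts.most_common()}


# Two hypothetical respondents, for illustration only.
survey = [
    (["air-ground communication", "primary surveillance radar"],
     ["air-ground communication", "flight data processing system"]),
    (["communication panel"],
     ["communication panel", "primary surveillance radar"]),
]
shares = equipment_shares(survey)
print(shares["primary surveillance radar"])  # 40.0
```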
The results of the analysis of questionnaire responses from 34 countries were found to
be similar to those obtained from the analysis of operational failure reports, presented
in Chapter 4. The questionnaire survey shows that the three most affected ATC
functionalities are: communication (37.2 percent of all examples provided), data
processing (24.6 percent), and surveillance (23 percent) (Figure 6-11). More precisely,
the following five equipment types are affected most:
air-ground communication (12.03 percent of all examples provided);
primary surveillance radar (9.1 percent);
flight data processing system (7.75 percent);
communication panel (7.49 percent); and
ground to ground communication (6.68 percent).
Figure 6-11 Distribution of affected ATC functionalities as reported in the questionnaire survey
Table 6-2 establishes the link between the most unreliable ATC functionalities and existing recovery procedures, as reported by 134 controllers from 34 countries representing various regions of the world. The link is established based on responses to questions 5, 9, 10, and 11. In addition, the analysis was conducted at the country level rather than the ATC Centre level to avoid direct reference to sensitive information specific to ATC Centres. It should be noted that, as a result, inaccuracies are possible only where controllers were not fully aware of the recovery procedures available in their ATC Centres.
159
Chapter 6
Questionnaire Survey
Table 6-2 Mapping between most unreliable ATC functionalities and existing recovery
procedures for the countries sampled
Country
Ireland
Most unreliable
ATC functionalities
Communication
Navigation
Surveillance
Data processing
Pointing/input
devices
Finland
Serbia
Communication
Surveillance
Data processing
Communication
Surveillance
Data processing
Switzerland
Communication
Navigation
Surveillance
Data processing
Pointing/input
devices
Surveillance
Communication
Surveillance
Netherlands
Data processing
Pointing/input
devices
Spain
Communication
Surveillance
Data processing
Communication
Surveillance
Data processing
Communication
Norway
Italy
France
Surveillance
Data processing
Pointing/input
devices
Communication
Navigation
Surveillance
Data processing
Communication
Surveillance
Data processing
Radar failure
Total system failure
Frequency failure
Total radar failure
Fire contingencies
Frequency failure, on-line data interchange (OLDI) link
failure, communication panel failure, telephone failure,
headset failure, intercom failure
Radar failure, failure of the radar display
FDPS failure
Frequency failure
Runway/taxiway lights failure
Radar failure
Frequency failure, telephone failure
Radar failure
FDPS failure, RDPS failure
Power outage, air conditioning failure, fire evacuation,
meteorological equipment failure, failure of navigation
Sweden
Slovenia
Belgium
Macedonia
Croatia
Communication
Surveillance
Data processing
Pointing/input
devices
Safety nets
Communication
Data processing
Communication
Surveillance
Communication
Data processing
Pointing/input
devices
Communication
Surveillance
Data processing
Denmark
Communication
Surveillance
Data processing
Communication
Data processing
Communication
Portugal
South Africa
Tahiti
FDPS failure
Frequency failure, telephone failure
Radar failure
Frequency failure, telephone failure, voice switching
and communication system (VSCS) failure
Radar failure, radar display failure
Strip printer failure
Communication
Radar failure
Frequency failure, telephone failure, FDPS failure,
power outage
Telephone failure, intercom failure
Failure of navigation equipment, instrument landing
system (ILS) failure
Radar failure
FDPS failure
Navigation
Singapore
Radar failure
Frequency failure, telephone failure
Radar failure
Navigation
Surveillance
Data processing
Communication
Tanzania
India
Power outage
Radar failure
Moldova
Iceland
aids
Frequency failure, telephone failure
Radar failures, surface movement radar failure
Surveillance
Data processing
Pointing/input
devices
Communication
Surveillance
Communication
Surveillance
Data processing
Safety nets
Frequency failure
Radar failures, failure of radar display
Frequency failure, failure of satellite communication
Communication
Surveillance
Surveillance
Data processing
Communication
Surveillance
Procedures for all failure types
Malta
Macau Special
Administrative
Region
Kenya
New Zealand
China
Malaysia
Communication
Surveillance
Data processing
Pointing/input
devices
Power supply
Communication
Navigation
Data processing
Communication
Navigation
Surveillance
Data processing
Communication
Surveillance
Data processing
Safety nets
Surveillance
Communication
Surveillance
Data processing
Safety nets
Radar failure
Frequency failure
Navigation aids failure
Procedures for all failure types, radar failure, SSR
failure
Frequency failure, telephone failure
The instances in which identified failures are not supported by existing recovery procedures are highlighted in grey: in these cases, controllers experienced ATC equipment failures for which no recovery procedures were available in their ATC Centre. Conversely, failure types for which procedures exist but which the sampled controllers have not yet experienced are highlighted in yellow and listed in the last row for each country. For example, if the communication function was affected specifically by frequency failure and no recovery procedure existed for this particular failure type, the mapping is not established (coloured grey). In several cases, controllers reported that their ATC Centre has procedures for all failure types. Clearly, it is not possible to cover every failure type individually; rather, generic procedures or guidelines can be designed to be followed in the event of an equipment failure.
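The grey/yellow classification used in Table 6-2 amounts to simple set comparisons per country. The Python sketch below assumes each country is described by the set of failure types its controllers experienced and the set covered by recovery procedures; the function name `map_procedures` and the example data are hypothetical, and the "procedures for all failure types" case is not modelled.

```python
def map_procedures(experienced: set[str], procedures: set[str]) -> dict[str, set[str]]:
    """Classify one country's failure types against its recovery procedures."""
    return {
        "covered": experienced & procedures,              # mapping established
        "no_procedure": experienced - procedures,         # highlighted grey
        "not_yet_experienced": procedures - experienced,  # highlighted yellow
    }


# Hypothetical country, for illustration only.
result = map_procedures(
    experienced={"frequency failure", "radar failure", "FDPS failure"},
    procedures={"radar failure", "power outage"},
)
print(sorted(result["no_procedure"]))  # ['FDPS failure', 'frequency failure']
```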
It can be concluded that inadequate mapping between recovery procedures and the equipment failures experienced by controllers occurred in many cases. The most severe cases are those in which countries provide at best only one type of recovery procedure. This was identified in several European countries (i.e. Finland, Macedonia, Iceland, and Malta), in two African countries (i.e. South Africa and Kenya), and in two Asia/Pacific countries (i.e. Tahiti and Malaysia). The most neglected ATC functionality was found to be data processing, followed by surveillance and communication. The paradox is that the qualitative equipment failure impact assessment tool (Chapter 4) identified exactly these three ATC functionalities as the most challenging for controller recovery.
6.7.3.4 Organised exchange of information on equipment failures (Q4)
Of the controllers surveyed, 40.3 percent reported that their ATC Centres have an organised exchange of information on equipment failures between colleagues, 49.3 percent reported a lack of such exchange, and 10.4 percent did not answer this question.
Contradictory responses were obtained from 14 ATC Centres; these were investigated further using the responses to the subsequent question, i.e. whether the organised exchange of experience is supported by management as good working practice. Of the ATC Centres that have an exchange of experience, 76 percent have formal processes approved by management, as opposed to informal word-of-mouth practice that reaches only a small proportion of controllers. The question was intended to capture management initiatives that provide the means to share experience on equipment failures in an organised manner. This may be achieved using different methods, such as seminars, company newsletters, safety bulletins, memorandums, and workshops. In these ways, the lessons learnt are disseminated not only among the controllers directly experiencing the effects of the failure, but within the entire ATC Centre and often within the same country.
Based on this additional assessment, the following countries and ATC Centres have neither formal nor informal processes for the exchange of experience on equipment failures: Italy, Ireland, Croatia, India, Slovenia, Maastricht ATC Centre (as opposed to Amsterdam Centre), Switzerland, Macau SAR, and Kenya.
The data indicate that there is room for improvement. There is a clear need to implement formal processes for the exchange of experience on equipment failures, including failure modes and recovery processes. This should form part of a wider safety culture within ATC Centres, which is the responsibility of management. Past experience has shown this type of indirect training to have a beneficial safety impact in a similar way to
Table 6-3 Existence of recovery procedures, recovery training, and recurrent training as
reported in the questionnaire survey
Country
Ireland
Finland
Serbia
Switzerland
United
Kingdom
Netherlands
Germany
Spain
Norway
Italy
France
Sweden
Slovenia
Belgium
Macedonia
Croatia
Moldova
Iceland
Denmark
Portugal
South Africa
Tanzania
India
Singapore
Tahiti
Australia
Shannon
Dublin
Cork
Kemi
Belgrade
Zurich
Geneva
Existence of
recovery
procedure
?
Yes
?
No
Yes
Yes
Yes
Bristol
Yes
Yes
No
Maastricht
Nieuw Milligen
Amsterdam
Karlsruhe
Langen
Frankfurt
Seville
Oslo
Kirkenes
Stavanger
Bodo
Rome
Bologna
Naples
Venice
Milan
Paris
Nice
Stockholm
Malmo
Gothenburg
Ljubljana
Brussels
Skopje
Split
Zagreb
Pula
Zadar
Chisinau
Reykjavik
Copenhagen
Lisbon
FAJS
Dar es Salaam
Mumbai
Kolkata
Singapore
Papeete
Melbourne
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
Yes
No
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
No
Yes
No
No
No
Yes
No
Yes
No
No
Yes
Yes
Yes
No
No
No
No
No
No
Yes
No
Yes
?
Yes
Yes
?
?
Yes
?
No
Yes
No
Yes
No
No
Yes
No
Yes
No
Yes
Yes
?
No
No
No
No
No
No
No
Yes
Yes
Yes
No
No
Yes
No
Missing data
Missing data
Yes
?
Yes
?
Yes
No
Yes
No
Yes
?
No
ATC Centre
Existence of training
for equipment failures
Existence of
recurrent training
Yes
No
?
Yes
No
Yes
Yes
?
?
?
Yes
No
?
?
Austria
Romania
Malta
Macau SAR
Kenya
New Zealand
China
Malaysia
Vienna
Bucharest
Malta
Luqa airport
Macau
Nairobi
Wellington
Auckland
Christchurch
Hong Kong
Subang
Yes
Yes
No
Yes
Yes
?
Yes
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
?
Yes
Yes
Yes
?
Yes
?
Yes
Yes
No
Yes
?
No
No
Yes
Yes
No
No
Table 6-2 shows that 93.1 percent of sampled ATC Centres have some form of
recovery procedure in place (i.e. 54 ATC Centres). The types of equipment failure
most commonly covered by recovery procedures in the sampled ATC Centres are:
radar failure (reported by 40.2 percent of controllers surveyed);
failure of communication function: radio telephony, ground to ground
the requirement for recovery procedures from radar failure, communication systems
failure, the need for back-up systems, and procedures for handling outages at ATC
Centre level. Furthermore, 88 percent of controllers rate the available recovery
procedures as clear and understandable, while 72 percent rate them as realistic and
feasible to perform.
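The centre-level percentages quoted in this section are shares of the 58 sampled ATC Centres. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical, and the second function only back-calculates a whole number of centres from a rounded percentage, so its result is an inference rather than a figure reported in the survey.

```python
TOTAL_CENTRES = 58  # ATC Centres in the survey sample


def percent(count: int, total: int = TOTAL_CENTRES, ndigits: int = 1) -> float:
    """Share of `count` in `total`, as a percentage rounded to `ndigits` places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, ndigits)


def implied_count(pct: float, total: int = TOTAL_CENTRES) -> int:
    """Back-calculate the whole number of centres behind a rounded percentage."""
    return round(pct * total / 100.0)


print(percent(54))          # 93.1 (centres with some form of recovery procedure)
print(implied_count(36.2))  # 21 (inferred number of centres with recurrent training)
```

Rounding to one decimal place mirrors the precision used in the text; the inferred count of 21 is consistent with the 36.2 percent reported for recurrent training later in this section.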
Sixty-nine percent of controllers surveyed reported that the recovery procedure
documentation is easily accessible, i.e. it is placed in close proximity to controller
working positions.
4
The discussion presented in Chapter 5 showed that ICAO provides recovery procedures for
the communication and surveillance functionalities but not for the data processing functionality.
Finally, 77 percent of controllers reported that the available recovery procedures are
linked to, or harmonised with, other procedures specified in the Manual of Air Traffic
Services (MATS), e.g. on the allocation of tasks within a suite (separation of
responsibilities between the executive and planner controllers), and on the duties of
staff such as the approach controller, the ground controller, or the watch manager.
From the survey data and subsequent analyses, it can be concluded that the majority of
sampled ATC Centres have some form of recovery procedure. Most controllers
reported that these procedures are up-to-date, comprehensive, easily accessible, and
compatible with other procedures. Moreover, controllers emphasise the need for
procedures on radar and communication failures.
6.7.3.5.1 Other findings regarding the recovery procedures
In addition to the findings in the previous section, the narrative section of the
questionnaires highlighted interesting safety-relevant issues regarding recovery
procedures. These are individual comments rather than findings representative of the
entire sample. The reported issues are categorised into three groups, namely
equipment-specific, teamwork-specific, and generic recovery-related issues. These are
discussed in the following paragraphs.
The equipment-related issues highlighted major problems with the flight data
processing system that are not covered in the operational manuals. In addition,
controllers reported a lack of back-up facilities. In one example, during a radio
communication system failure, a particular ATC Centre had only ten emergency radio
devices for an operations room with a 20-seat configuration.
On teamwork-related issues, the controllers mostly reported inadequate familiarisation
with contingency procedures on the part of technical staff and controllers in
neighbouring sectors. In general, the controllers highlighted the important role of
teamwork and the need for an experienced planning controller in the event of
equipment failure. Another example drew attention to the unavailability of technical staff
during night shifts to immediately provide assistance in the case of equipment failure.
In short, controllers feel that teamwork is important in dealing with failures and that
Team Resource Management (TRM) training, aimed at enhancing teamwork efficiency,
should be mandatory for all ATC Centres.
Finally, many individual recovery-related issues, such as context, procedures, and
working practice, are also highlighted in the narrative part of the questionnaires. These
are as follows:
Situation-specific problem solving plays a major role as all equipment failures
occur within a specific context (e.g. bad weather, frequency jamming, high/low
traffic levels);
There is a need for an approach to recovery procedures similar to that available to
pilots. In other words, a comprehensive manual with all possible failures and
corresponding recovery steps is needed during controller training. For the
operational environment, it would be necessary to design an abbreviated version
of the contingency manual, available at each controller working position (e.g. an
aide-mémoire in the form of a checklist, see Appendix III); and
Accurate and efficient strip marking is seen as the most reliable recovery tool in
the case of radar or flight data processing failure.
same ATC Centre gave opposite responses to the questions on the existence of recovery
training. All these inconsistencies are further investigated using the subsequent
questions related to recovery training (i.e. questions 25 to 28). Although controllers
from these ATC Centres reported contradictory responses on the existence of recovery
training (i.e. question 21), their answers to the subsequent training-related questions did
not reveal any further information. Therefore, a conservative approach has been taken
and these 10 ATC Centres are considered not to have recovery training in place.
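The conservative rule described above can be sketched in Python as follows. This is an illustrative assumption about how centre-level values might be derived from individual controller answers, not the procedure documented in the thesis; the function names and the example centres are hypothetical.

```python
from collections import defaultdict


def group_answers(responses):
    """Collect the set of distinct answers given by the controllers of each centre."""
    answers_by_centre = defaultdict(set)
    for centre, answer in responses:
        answers_by_centre[centre].add(answer)
    return answers_by_centre


def classify_centres(responses):
    """Collapse per-controller answers into one True/False value per centre.

    Assumed conservative rule: a centre counts as having recovery training
    only if every answer from that centre is "Yes"; contradictory ("Yes" and
    "No") or uncertain ("?") answers are treated as no training.
    """
    return {centre: answers == {"Yes"}
            for centre, answers in group_answers(responses).items()}


def contradictory_centres(responses):
    """Return, sorted, the centres whose controllers answered both "Yes" and "No"."""
    return sorted(centre for centre, answers in group_answers(responses).items()
                  if {"Yes", "No"} <= answers)


# Hypothetical example data, not survey results
responses = [
    ("Centre A", "Yes"), ("Centre A", "Yes"),
    ("Centre B", "Yes"), ("Centre B", "No"),
    ("Centre C", "No"),
]
print(classify_centres(responses))
# {'Centre A': True, 'Centre B': False, 'Centre C': False}
print(contradictory_centres(responses))  # ['Centre B']
```

Treating any disagreement as "no training" errs on the side of under-reporting provision, which matches the cautious reading adopted in the text.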
In the case of recurrent training, the analysis shows that only 36.2 percent of the whole
sample of ATC Centres have recurrent training and 43 percent do not, while the rest of
the data is either inconsistent or missing. Recurrent training is provided once a year in
25 ATC Centres and twice a year in three ATC Centres (Oslo-Norway,
Bucharest-Romania, Auckland-New Zealand). In addition, the Geneva and Melbourne
ATC Centres provide recurrent training three times per year, while the Frankfurt ATC
Centre provides recurrent training 20 times per year; in the latter, a contingency system
is used every weekend to train controllers.
Further analysis of the ATC Centres with a recurrent training frequency higher than once
a year shows that all have recovery procedures in place, while the majority (i.e. 64
percent) have an organised exchange of information on equipment failures. The
Auckland ATC Centre emphasised that recovery performance was difficult before the
introduction of clear and easy-to-follow procedures. Moreover, this ATC Centre
highlighted that operations feed into recovery training, as recent failure types are
included in the recurrent training. Although the Oslo ATC Centre has recovery
procedures, its controllers report the need for more comprehensive and easily available
procedures (e.g. checklist-type procedures at each console). These controllers
expressed a need to move away from an increasing dependency on experience when
handling equipment failures.
From the subset of controllers who have recurrent training once a year, 55 percent
believe that this is adequate, while the rest expressed the need for a higher frequency in
order to build competency in handling unexpected equipment failures. When asked if
the training covers all important equipment failures, the majority of controllers (i.e. 63
percent) answered negatively. The items most frequently suggested for addition to the
current training syllabus are:
complete radar failure simulated in a comprehensive and realistic way;
total power failure;
facility evacuation;
team resource management (TRM);
different types of aircraft problems (e.g. communication failure, engine failure,
landing gear problem);
hot standby procedures (system running in the background ready for immediate
use); and
radar bypass (radar information is presented directly at the radar display without
having been processed, resulting in the presentation of uncorrelated tracks only).
Sixty-one percent of controllers believe that the training methods used in their ATC
Centres are suitable or, more precisely, realistic and varied. Furthermore, according to
63 percent of the controllers surveyed, the recovery training is compatible (i.e. linked to
other training schemes). In general, it is essential to harmonise recovery training within
the overall training syllabus. One option is to include recovery training within each
training course, such as ab-initio training, conversion courses, continuity or recurrent
training, training for unusual situations, and TRM training. The other option is to
provide separate recovery training sessions on a regular basis. Regardless of the
approach, ATC management has to ensure an inclusive, regular, and consistent
approach to recovery training for its entire population of controllers.
From the survey data and subsequent analyses, it can be concluded that the majority
of the ATC Centres surveyed have some form of recovery training, although it is not
necessarily provided consistently throughout the Centre. The situation with recurrent
training is worse, as in the majority of cases this type of training is not provided
regularly. This results in extensive reliance on experience in dealing with equipment
failures, which may pose a significant safety threat in ATC Centres with a large
percentage of newly qualified, and thus less experienced, controllers. In general, the
controllers surveyed want to move away from over-reliance on experience and to be
trained as regularly as possible.
6.7.3.6.1 Other findings on training for recovery
In addition to the findings in the previous section, the narrative section of the
questionnaires highlighted interesting safety-relevant issues regarding recovery training.
These are individual comments rather than findings representative of the entire sample.
The reported issues focus on the quality and frequency of recovery training.
According to the controllers surveyed, the main problem is the overall lack of training
for supervisors, engineers, and controllers. The controllers believe that a couple of
hours of training per year is far too little practice, and some of them feel that recurrent
training is necessary at least twice a year. In the event of more critical equipment
failures (e.g. radar) with high traffic levels, there may be occasions when there is no
time to act upon the recovery procedures. On these occasions, training and teamwork
become much more important.
The controllers are aware that it is almost impossible to include everything that can go
wrong within the training syllabus, but emphasise that more training and guidance
should be given. They also highlight that training sessions should be as realistic as
possible in the simulated environment (e.g. higher traffic levels and the need to use the
radar fallback system regularly). Currently, in some ATC Centres, the training only
focuses on outages (i.e. failure of the entire ATC system) and not on everyday failures.
An example of an ATC Centre where recurrent training takes place only on a night shift
highlighted inconsistent provision of training throughout the ATC Centre, as only those
controllers on a night shift get recovery training.
6.7.3.7 Other findings on recovery performance
This section deals with additional findings extracted specifically from question 5. This
question gave controllers an opportunity to discuss their past experience with
equipment failures that seriously affected their work. The findings extracted from
question 5 are presented in Appendix VI.
While section 6.7.3 has provided a high-level analysis and the results of the survey, the
following section carries out a more rigorous analysis of the data.
[Figure: variables considered in the statistical analysis: existence of recovery training; existence of recovery procedures; formal exchange of information; factors that influence recovery performance; experience with equipment failures; rating; operational experience]
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As described in this Chapter, three variables are
categorical (rating, factors that influence recovery performance, and formal or
management-supported exchange of information on equipment failures), whilst two
are continuous or ratio-scale variables (operational experience, i.e. length of service,
and experience with equipment failures, i.e. frequency per year). As the data differ
significantly from the normal distribution, several non-parametric tests have been used
at the 5 percent significance level (i.e. 95 percent confidence level). As previously
explained in Chapter 4 (section 4.4.1), chi-square tests are used to test the relationships
between two categorical variables. Furthermore, Cramér's V is used to measure the
association for nominal data (i.e. the interactions of the factors that influence recovery
performance with rating and with the existence of formal exchange of information on
equipment failures), whilst Kendall's tau is used for ordinal data (i.e. factors that
influence recovery performance). The relationship between two ratio variables is tested
via non-parametric correlation, namely Kendall's tau statistic, which uses the ranks of
the data to calculate a correlation coefficient. The correlation coefficient ranges
between -1 and 1; its sign indicates the direction of the relationship (positive or
negative), whilst its absolute value indicates the strength of the relationship.
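The association measures named above can be illustrated with a short, dependency-free Python sketch: Pearson's chi-square statistic and Cramér's V for a contingency table, and Kendall's tau for paired observations. It assumes the tie-corrected tau-b variant (the text does not state which variant was used) and computes test statistics only; p-values would normally be obtained from a statistics package such as SciPy (`scipy.stats.chi2_contingency`, `scipy.stats.kendalltau`). The function names are illustrative.

```python
import math
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Assumes every row and column total is non-zero, so that no expected
    count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), between 0 and 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, corrected for ties, in [-1, 1].

    Undefined (division by zero) if all x or all y values are tied.
    """
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1  # pair tied on x (including pairs tied on both)
        if dy == 0:
            tied_y += 1  # pair tied on y (including pairs tied on both)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
```

For example, `cramers_v([[10, 0], [0, 10]])` returns 1.0 (perfect association), and `kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1])` returns -1.0 (perfect negative rank correlation), illustrating how the sign and magnitude of the coefficient are read.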
Finally, the relationship between a ratio variable and a categorical variable is tested
using the non-parametric Mann-Whitney test. The test is used to assess whether two
samples of observations come from the same distribution (Shier, 2004). The test
involves the calculation of a statistic, referred to as U (see equation 6-1).
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \qquad (6\text{-}1)$$
where n1 and n2 are the two sample sizes, and R1 is the sum of the ranks of all the
observations in sample 1. For samples larger than 20, the distribution of the U statistic
is approximately normal, so U is converted to a Z score using the formula in equation
6-2 (Shier, 2004):
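$$Z = \frac{U_{\text{largest}} - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \qquad (6\text{-}2)$$
where $U_{\text{largest}}$ is the largest of the two U values.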
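Equations 6-1 and 6-2 can be traced through with the Python sketch below. It assumes mid-ranks for tied observations and applies no tie correction to the variance in the z approximation; the function names are hypothetical. Because the two U values always sum to n1n2, using the larger U gives a non-negative z, whereas software that works with the smaller U reports the same magnitude with a negative sign.

```python
import math


def midranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (U from equation 6-1, the complementary U, z from equation 6-2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of sample 1 within the pooled ranking
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1  # equation 6-1
    u_other = n1 * n2 - u                  # the two U values sum to n1 * n2
    u_largest = max(u, u_other)
    z = (u_largest - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # equation 6-2
    return u, u_other, z


u, u_other, z = mann_whitney([1, 2, 3], [4, 5, 6])
print(u, u_other, round(z, 2))  # 9.0 0.0 1.96
```

Samples this small are used only to make the arithmetic easy to follow; the text applies the normal approximation to samples larger than 20.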
Variable 2
Test
ACC
Operational experience
(length of service)
Mann-Whitney
non-parametric
test
APP
TWR
Operational experience
(length of service)
Experience with
equipment failures
(frequency per year)
Written procedures
Situation-specific
problem solving
Other
Non-parametric
test (Kendall's
tau)
Mann-Whitney
non-parametric
test
Statistical
significance at 95
percent confidence
level
p>0.05
p<0.001 (U=1382.5,
z=-3.56)
p=0.014 (U=3387.5,
z=-2.46)
p>0.05
p>0.05
p>0.05
p>0.05
Rating
ACC
APP
TWR
ACC
APP
TWR
Factors that
influence
recovery
performance
Factors that
influence
recovery
performance
Written
procedures
Situation-specific
problem
solving
Written
procedures
Situation-specific
problem
solving
Other
Number of equipment
failures experienced
annually (Q4)
as above
Non-parametric
test (Cramer's V)
Written procedures
Situation-specific
problem solving
Other
Situation-specific
problem solving
Other
Other
Mann-Whitney
non-parametric
test
p>0.05
p>0.05
p>0.05
p=0.0086
p>0.05
p>0.05
p>0.05
p>0.05
p>0.05
p>0.05
Non-parametric
test (Kendall's
tau)
p>0.05
p<0.001
p>0.05
Formal exchange of
information (Q7)
Non-parametric
test
(Cramer's V)
p>0.05
p=0.029
Statistical tests performed indicated five significant relationships (Table 6-5). Firstly,
significant relationships are found between APP and TWR ratings and years of
operational experience (i.e. years in service). In the sample surveyed, controllers with
an APP rating have more operational experience than those without this rating.
Similarly, controllers with a TWR rating have more operational experience than those
without it. Secondly, a significant relationship is identified between other factors that
influence recovery performance and ACC rating. The data indicate that controllers with
an ACC rating tend to rely upon other factors (e.g. past experience) more than those
without an ACC rating. This is expected, as controllers with an ACC rating in the
available sample have more operational experience than those without it. Thirdly, a
significant relationship is identified between controller reliance on situation-specific
problem solving and on other factors (e.g. past experience) when recovering from
equipment failures. This is expected, as past experience is one of the factors that
define the situation surrounding (i.e. the context of) an equipment failure. Finally, a
significant relationship is identified between controller reliance on other factors (e.g.
past experience) when recovering from equipment failures and management-supported
exchange of information regarding equipment failures (Table 6-5). It may be that
controllers regard the exchange of information on equipment failures as a type of past
experience.
Relationship between other factors that influence recovery performance and ACC rating.
On the other hand, no relationship is identified between the factors that influence the
recovery process and operational experience (i.e. number of years active as a
controller). Although it was expected that less experienced controllers might rely more
on written procedures and more experienced controllers more on past experience,
statistical testing did not support these expectations. Years in service do not
differentiate between reliance upon written procedures, context, or other factors (e.g.
past experience). It may be the case that the overall safety culture built in the ATC
Centre determines what a controller uses as the main resource in recovering from
equipment failures. For example, if procedures are not available, controllers will rely
more on situation-specific problem solving; this decision would therefore be based on
organisational issues more than on their own experience.
6.8 Summary
This Chapter has discussed in detail the questionnaire survey that sampled 134
controllers in 58 ATC Centres from 34 countries. The survey was designed to achieve
four main objectives: firstly, to build on the literature review to further investigate
equipment failures and the factors that influence controller recovery by introducing
operational experience; secondly, to support the information obtained from operational
failure reports (as presented in Chapter 4), which lacked input on controller recovery;
thirdly, to assess the status and quality of recovery procedures and training in the
sampled set of ATC Centres; and finally, to contribute to wider human reliability
research with a particular focus on controller recovery from equipment failures.
The results of the analyses conducted on the data consist of several interesting
findings. These are structured around six key questions that this survey addresses.
prescribed steps and thus to recover. An example of a concise checklist-type recovery
procedure developed in this thesis for a specific European ATC Centre is presented in
Appendix III. It is based on a format previously used by the German air traffic service
provider (DFS), which was accepted and published by EUROCONTROL (2003f).
What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Assessment of the existence and quality of training for recovery shows that only half of
the ATC Centres surveyed have established training for recovery from equipment
failures. The situation with recurrent training is even worse as only 36 percent of ATC
Centres surveyed organise regular recurrent training. In most cases, recurrent training
is provided only once a year, while in nine ATC Centres it is provided twice a year. On
the other hand, controllers support the idea of very frequent recurrent training. Almost
half of the respondents (i.e. 45 percent) feel that an annual training session of a couple
of hours is simply not enough to keep them proficient and ready to deal with unexpected
equipment failures.
The process of identifying factors that affect controller recovery started in the
previous Chapter with an overall assessment of past research relevant to controller
recovery. It has continued in this Chapter by expanding these findings with the
questionnaire survey results and operational experience of controllers worldwide.
Based on these findings, the next Chapter finalises this rigorous process by identifying
factors that affect controller recovery, referred to as Recovery Influencing Factors
(RIFs).
Chapter 7
This Chapter builds on the findings from past research of relevance to controller
recovery (Chapter 5), further augmented by the operational experience extracted from
the questionnaire survey (Chapter 6), to develop a detailed understanding of the context
that surrounds a controller during the occurrence of an unexpected equipment failure.
The Chapter starts by illustrating the impact that contextual factors have on controller
recovery from equipment failures in Air Traffic Control (ATC). It reviews both Air Traffic
Management (ATM) and non-ATM Human Reliability Assessment (HRA) techniques to
ensure a comprehensive investigation of contextual factors relevant to controller
recovery from equipment failures in ATC. This initial selection is augmented by the
findings from the equipment reliability literature, operational failure reports, human
reliability research, and interviews with ATM specialists. The Chapter concludes by
identifying a set of relevant contextual factors, referred to as Recovery Influencing
Factors (RIFs), and their qualitative descriptors, or the levels of their influence on
controller recovery performance.
reducing specific types of erroneous actions by means of technical recovery (i.e. built-in
defences) and human recovery.
It is also necessary to take into consideration contextual factors that traditionally may
not be recorded by investigating bodies, but which can have a significant impact on the
outcome of an accident. In support of this, Dekker et al. (2004) note that it is
necessary to capture both the situation in which an action takes place and the action
itself. Similar arguments were presented by researchers at the National Aeronautics
and Space Administration (NASA) Ames Research Centre, who pointed out that "we
must move beyond trying to pin the blame for accidents on a culprit but seek instead to
understand the systemic causes underlying the outcomes" (cited in Cox, 2005). The
research presented in this thesis expands the analysis of equipment-related incidents
to include the context in which controller recovery unfolds. Therefore, the objective of
this Chapter is to determine the relevant contextual factors that affect the process of
controller recovery from equipment failures in ATC.
In Air Traffic Management (ATM), the contextual factors relevant to controllers are
defined as internal or external factors which influence the controller's performance of
ATM tasks (EUROCONTROL, 2002b). It is notable that this definition is generic and
thus gives no indication of when it is appropriate to stop looking further for contextual
factors. This so-called stopping rule is taken to be directly linked to the overall
investigation process, in which the assessment of contextual factors represents only
one segment. In other words, it is the role of the investigator to determine the chain of
events that constitute a safety-relevant occurrence. In this respect, the analysis of
contextual factors should cover the entire chain and assess the relevant context for
each link in the chain. The research presented in this thesis adapts the
EUROCONTROL definition of contextual factors. Hence, the contextual factors in this
research, referred to as Recovery Influencing Factors (RIFs), are defined as internal or
external factors that influence the controller's recovery from unexpected equipment
failures in ATC.
The factors extracted from the various techniques are known in the HRA literature as
Contextual Conditions (CCs) (EUROCONTROL, 2002b), Performance Shaping
Factors (PSFs) (Shorrock, 1992; Shorrock and Kirwan, 2002; EUROCONTROL, 2004e;
THEMES, 2001; Swain and Guttman, 1983), Error Producing Conditions (EPCs)
(EUROCONTROL, 2004d; Williams, 1986), Common Performance Modes (CPMs)
time course of failure development (i.e. sudden failure), and complexity of failure type
(i.e. multiple failure: several workstations, clock, and simulation platform affected).
The second report contained the following: "The loss of radar display and VSCS at a
time of moderate traffic (approximately 10 aircraft on frequency) created substantial
workload on the controller. Thankfully, there were two controllers in the near vicinity
who were able to assist with a transition to a nearby controller working position and to
help maintain situational awareness and communications with the various aircraft via
air-ground (AG) bypass." This report highlighted the impact of traffic complexity at the
moment of failure occurrence (i.e. ten aircraft in simultaneous communication with the
controller), personal factors (i.e. substantial workload), communication for recovery
within a team (i.e. assistance with handling the traffic and maintaining traffic awareness
in spite of the loss of all critical systems: visual representation of traffic on display and
direct communication with relevant aircraft), adequacy of organisation (i.e. availability
of additional support), number of workstations affected (i.e. one workstation), and
complexity of failure type (i.e. multiple systems affected: radar display and
communication system).
The two brief cases above, taken from an incident database, illustrate the important
relationship between failure, recovery, and relevant contextual factors. In other words,
these equipment failure examples show that the context in which human performance
takes place is important in understanding human reliability. Although the examples do
not convey the complete picture of the equipment failure occurrence (e.g. there is no
mention of personal issues in the first example, nor of weather), several contextual
factors have been captured. As a result, research on controller recovery
from equipment failures in ATC requires a precise definition of the context surrounding
any failure type. In order to achieve this objective, it is necessary to review the specific
contextual factors defined in various HRA techniques. This is used together with
information from equipment reliability literature to identify the Recovery Influencing
Factors (RIFs).
on human error per se or the underlying human information processing theory. The
literature on human error has been used simply to investigate the relevant factors that
influence the human performance in unusual/unexpected events (i.e. contextual
factors). As a result, the human information processing theories used in the assessed
HRA techniques are outside the scope of this thesis.
It is also important to note that although there are currently three HRA techniques used
in the ATM sector, the review presented here has also considered HRA approaches
employed in other domains to ensure a complete set of RIFs. Furthermore, a review of
relevant equipment-failure characteristics and dynamic situational factors has been
conducted in order to augment the results from the review of the HRA techniques. This
is to ensure a complete and reliable determination of the RIFs. The
RIFs are then verified by interviews with ATM specialists. Figure 7-1 presents the
methodology used in this thesis to extract a candidate set of contextual factors relevant
to controller recovery from ATC equipment failures.
Figure 7-1 Methodology to extract a candidate set of Recovery Influencing Factors (RIFs): ATM-related HRA techniques → augmentation with findings from other HRA techniques → augmentation with equipment-failure-related characteristics → augmentation with dynamic situational factors → verification of selected RIFs by two ATM specialists (each stage produces an output and identified gaps that feed the next stage)
2002b;
EUROCONTROL,
The US Federal Aviation Administration (FAA) developed the Human Factors Analysis and
Classification System (HFACS) tool.
The main difference between TRACEr and HERA is that the former does not include
pilot actions and weather (see Appendix VII). Thus, no additional candidate factors
could be extracted from TRACEr.
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
As previously discussed in Chapter 5, this tool was developed as part of the
Solutions for the Human-Automation Partnerships in European ATM (SHAPE) project,
managed by the Human Factors Division of EUROCONTROL. The SHAPE project
defines context as any aspect of the operating environment that can influence a failure
or recovery process (EUROCONTROL, 2004e). The project focused on the contextual
factors affecting recovery, which is in line with the objective of this thesis. The relevant
contextual factors or PSF categories recognised in RAFT are: task load and system
complexity, pilot-controller communication, procedures and documentation, training
and experience, human-machine interaction, personal factors, social and team factors,
logistical factors, and other organisational factors.
A review of the RAFT PSFs shows that task load and system complexity represent the
workload facing the controller as a result of task performance and overall system
complexity. Therefore, this factor has the potential to be included as a RIF. Compared
to HERA, RAFT disregards pilot action, weather, and environment as relevant
contextual factors for human recovery from equipment failure in ATC. Whilst pilot
actions do not have much impact, as explained in section 7.2.1.1, weather can bring
additional complexity to the occurrence of equipment failure. At the same time, RAFT
includes a new category called logistical factors, which covers maintenance and
staffing issues.
Environmental issues (e.g. noise, temperature, and lighting) are excluded, because
controllers become accustomed to the ambient characteristics of the specific ATC
Centre in which they work. On the other hand, logistical factors are assigned to the
existing organisational factors category, because staffing and maintenance issues
should be anticipated and pre-planned at organisational or managerial level (e.g.
maintenance scheduling, availability and assignment of personnel, stock of equipment
and spare parts, on-the-job training aids). The management of any ATC Centre should
anticipate unscheduled technical disturbances as far as possible and provide the
necessary defences for their prevention.
The three techniques above (HERA, TRACEr, and the SHAPE/RAFT tool) were developed
specifically for the ATC/ATM environment. In general, they define context and
contextual factors in a similar way to this thesis. The assessment of
these three models identifies a total of nine candidate RIFs. These are: communication,
traffic and airspace, weather, procedures, training and experience, HMI, personal,
organisational factors, and task complexity.
Whilst the review of ATM-related HRA techniques yields many relevant contextual factors, it is worth examining relevant non-ATM HRA techniques to establish whether other factors exist. The following sections summarise the relevant findings.
7.2.1.4 Recovery from failures: understanding the positive role of human operators during incidents
This research attempted to emphasise the positive role of human operators in the
overall system performance. In addition, it proposed a preliminary failure compensation
process model (or recovery model) derived initially for the chemical process industry.
Furthermore, the importance of a taxonomy used to describe the factors influencing
recovery was recognised. Based on the experience gained from field studies and the
relevant literature, Kanse and van der Schaaf (2000) developed a list of RIFs. In their
research the recovery factors were defined as factors that contribute to human
recovery performance once an error or failure has occurred. This definition
corresponds to the definition of RIFs adopted in this thesis. A categorisation into six
groups of RIFs adopted by Kanse and van der Schaaf (2000) from the power plant
industry is presented in Table 7-1.
Table 7-1 Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Categories of factors:
Prioritisation of recovery-related tasks
Occurrence-related
Human (person) related
Social
Organisational
Technical/workplace/situational
The majority of the identified factors are relevant to equipment failures in ATC and should be considered as potential RIFs. For example, available and applicable barriers/defences are important for the detection, diagnosis, and correction of equipment failure. Time pressure is recognised under the prioritisation of recovery-related tasks. Equipment failures in ATC are unexpected events that degrade the ATC service offered, yet controllers are still required to provide a service that ensures a safe flow of traffic. As a result, controller workload increases rapidly, potentially compromising controller performance. Therefore, this factor should be analysed for potential inclusion among the RIFs. Occurrence-related factors are mostly specific to the power plant environment and as such cannot be applied directly to ATC. However, if adapted to the characteristics of the ATC environment, these factors may be relevant to equipment failure occurrence.
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
The CORE-DATA database was developed at the University of Birmingham to assist UK personnel involved in the assessment of hazardous systems such as nuclear, chemical, and offshore systems (Kirwan, Basra, and Taylor-Adam, 1997;
Stressors: psychological stressors; physiological stressors
A review of the contextual factors relevant to THERP reveals that most can be allocated to the RIFs identified by the first three ATM-related techniques. Several other factors, such as decision-making, short-term memory, and long-term memory (external PSFs), may be categorised as personal factors. These factors may become increasingly important with the planned modernisation of ATM (e.g. datalink, electronic strips, or a stripless environment). Finally, the suddenness of occurrence factor identified in THERP cannot be categorised within the existing RIF groups. This factor is relevant to the occurrence of equipment failure in the ATC environment as it strongly affects the controller's detection of the failure. Hence it should be treated as an additional potential RIF.
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
The HEART technique was developed by Jeremy Williams, a British ergonomist, in
1985. Reviews of this technique are available in EUROCONTROL (2004d) and
Williams (1986). It is one of the most popular human error quantification techniques
due to its ease of implementation and is still used extensively in the nuclear, chemical,
petrochemical, railway, and defence industries.
HEART was derived from a wide range of findings in ergonomics literature. The
technique defines a set of generic error probabilities for the tasks considered, and
identifies the Error Producing Conditions (EPC) associated with these. EPCs include
particular ergonomic, task (e.g. inactivity, repetitious, or low mental workload tasks,
additional team members necessary to perform task normally), and environmental
factors that could each have a negative effect on human performance. In other words,
the definition of contextual factors or EPCs emphasises purely their negative impact on
human performance. The extent to which each EPC factor affects performance is
quantified and the human error probability is calculated as a function of the precise
effect of each EPC on a particular task. HEART assumes that basic human reliability depends upon the generic nature of the task to be performed and that, under nominal conditions, this level of reliability will tend to be consistent (Williams, 1986).
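The HEART calculation described above can be illustrated with a minimal sketch. It assumes the commonly quoted HEART formulation, in which the assessed effect of each EPC is ((maximum effect - 1) x assessed proportion of affect) + 1, and the human error probability (HEP) is the generic task's nominal value multiplied by all assessed effects. The EPC maxima used below are the values usually tabulated for three EPCs relevant to ATC (unfamiliarity, shortage of time, channel capacity overload) and should be checked against Williams (1986) before use; the nominal HEP, the proportions of affect, and the function names are hypothetical.

```python
from math import prod

def assessed_effect(max_effect: float, proportion: float) -> float:
    """HEART assessed effect of one EPC: ((max_effect - 1) * proportion) + 1."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("assessed proportion of affect must be in [0, 1]")
    return (max_effect - 1.0) * proportion + 1.0

def heart_hep(nominal_hep: float, epcs: list[tuple[float, float]]) -> float:
    """Multiply the nominal HEP by every EPC's assessed effect.

    The result is capped at 1.0 because it is a probability.
    """
    hep = nominal_hep * prod(assessed_effect(m, p) for m, p in epcs)
    return min(hep, 1.0)

# Hypothetical scenario: the nominal HEP and the proportions are illustrative only.
epcs = [
    (17.0, 0.2),  # unfamiliarity with a rare but important situation
    (11.0, 0.5),  # shortage of time for error detection and correction
    (6.0, 0.4),   # channel capacity overload
]
hep = heart_hep(0.003, epcs)
print(round(hep, 4))  # 0.2268
```

The multiplicative form is what makes HEART sensitive to combinations of adverse conditions: three moderate EPCs here raise the nominal value by a factor of about 75.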
This technique identified 38 different Error Producing Conditions (EPC). These can be
categorised into two groups, those directly transferable to ATC and those that are not.
The EPCs relevant to ATC can be further sub-divided into those that fit within existing
RIF categories and those that do not. The former are, for example, unfamiliarity with a
situation which is potentially important but which only occurs infrequently or which is
new, a shortage of time available for error detection and correction, and a channel
capacity overload. The EPC concerned with unfamiliarity with a situation may be captured through two RIFs, i.e. training and experience. Unusual or emergency situations (such as ATC equipment failures) are rare but highly demanding events that require an efficient and effective response from each controller. Regular and comprehensive training plays a key role in building the skills and experience necessary to cope with such unusual situations. Shortage of time available has
already been discussed and recommended to be included as a candidate RIF (see
section 7.2.1.5). Finally, channel capacity overload is a term used for the workload
caused by simultaneous presentation of critical information to the human operator. As
such it can be classified under personal factors.
The EPCs not relevant to ATC include several factors. For example, the EPC concerning a mismatch between educational level and the requirements of the task is not applicable to controllers, because the level of education and training for an ATC licence is standardised and reflects the knowledge controllers should acquire. Furthermore, the EPC concerning an incentive to use more dangerous procedures is also not applicable to ATC, as dangerous procedures or working practices are direct violations of the rules.
7.2.1.8 The Contextual Control Model (COCOM)
The COCOM model, developed by Hollnagel (1993), describes how human
performance is dynamically determined by the current context, as an alternative to the
common information processing models. This is a generic HRA approach not related to
any specific industry.
COCOM represents a control model of cognition focusing on two important aspects:
the conditions under which a person changes from one mode to another and the
characteristics of human performance in a given mode. COCOM recognises four
control modes: scrambled, opportunistic, tactical, and strategic. According to this
approach human actions are determined by the context as well as specific
characteristics and mechanisms of human cognition. In Hollnagel's view, humans do not passively react to events; they actively look for information and act on the basis of intentions as well as external developments. It follows that human actions are only meaningful when considered in the appropriate context.
In this regard, COCOM defines Common Performance Modes (CPM) as the conditions
under which the human performance takes place. Hollnagel (1993) divides them into
CPMs that may increase or decrease human reliability. The former include sufficient
available time, available plans, adequate Man Machine Interface (MMI) and support,
few simultaneous goals, normal/familiar process state, and adequate organisation. The
CPMs that may reduce reliability include insufficient available time, plans not available,
inadequate MMI and support, many simultaneous goals, abnormal process state, and
inadequate organisation.
According to Hollnagel (1993), the objective is not to find a precise probability of a specific action but rather to identify the specific steps that are particularly prone to produce hazardous consequences. This knowledge can then be used to change the design of the system, to introduce specific compensation measures, and to construct defences and recovery options. Generally, the objective of recovery performance assessment should be to identify the context that is likely to result in inadequate recovery performance. The characteristics of such a context would be used to define the necessary changes to the ATC
from the HRA discipline. Hollnagel used the general principle that advantageous
performance conditions improve reliability, whereas disadvantageous conditions are
likely to reduce it. If reliability is improved, operators are expected to fail less often in
their tasks and perform better in general. He proposed an expected effect of each CPC
on performance reliability at three levels: improved, not significant, or reduced. The
advantages of this approach can be seen in the direct link between the descriptors
used for CPCs and expected effect on human performance reliability. As such, the
research presented in this thesis adopted this approach (further explained in section
7.3).
In order to determine the overall effect of the context on human performance, the CREAM technique relies on expert judgement of the relevance of each CPC to the particular event under investigation and of its impact on the probability of failure (no impact, improves, or reduces). The resulting score is used to determine the expected control mode, which, as previously mentioned, is one of scrambled, opportunistic, tactical, or strategic control.
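The CREAM scoring step can be sketched as follows. Each CPC is rated as improving, not significantly affecting, or reducing reliability, and the counts of "improved" and "reduced" ratings locate the event in a control mode. Hollnagel (1998) defines the control-mode regions graphically; the thresholds in the hypothetical `control_mode` function below are placeholders chosen only to show the shape of the mapping, not the published boundaries. The CPC names follow CREAM's list of nine, but the ratings are an invented example.

```python
from collections import Counter
from enum import Enum

class Effect(Enum):
    IMPROVED = "improved"
    NOT_SIGNIFICANT = "not significant"
    REDUCED = "reduced"

def control_mode(ratings: dict[str, Effect]) -> str:
    """Map CPC ratings to a control mode.

    HYPOTHETICAL thresholds: they only illustrate that more 'reduced'
    ratings push towards scrambled control and more 'improved' ratings
    towards strategic control. They are not CREAM's published regions.
    """
    counts = Counter(ratings.values())
    improved = counts[Effect.IMPROVED]
    reduced = counts[Effect.REDUCED]
    if reduced >= 5 and improved <= 1:
        return "scrambled"
    if reduced > improved:
        return "opportunistic"
    if improved >= 4 and reduced == 0:
        return "strategic"
    return "tactical"

# Invented ratings for the nine CREAM CPCs during an equipment failure.
ratings = {
    "adequacy of organisation": Effect.NOT_SIGNIFICANT,
    "working conditions": Effect.NOT_SIGNIFICANT,
    "adequacy of MMI and operational support": Effect.REDUCED,
    "availability of procedures/plans": Effect.IMPROVED,
    "number of simultaneous goals": Effect.REDUCED,
    "available time": Effect.REDUCED,
    "time of day": Effect.NOT_SIGNIFICANT,
    "adequacy of training and experience": Effect.IMPROVED,
    "crew collaboration quality": Effect.NOT_SIGNIFICANT,
}
print(control_mode(ratings))  # opportunistic
```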
Taking account of the review of both the CPMs (COCOM) and CPCs (CREAM), the
majority of the factors identified are directly transferable to ATC. The exceptions are
the number of simultaneous goals and normal/familiar process state (see Appendix VII).
Regarding the number of simultaneous goals, it is important to highlight that air traffic control implies the simultaneous processing of multiple tasks. For example, a controller may be in radio contact with 10-20 aircraft while simultaneously performing computer-related tasks (e.g. entering assigned altitude information, handing off flights to another controller). High levels of multitasking are therefore an inherent characteristic of ATC (Wickens, 1992), and this factor is excluded from the list of RIFs. The other factor (normal/familiar process state) is highly relevant to recovery performance but has to be mapped indirectly onto training and experience.
7.2.1.10 Human Reliability Management System (HRMS)
The HRMS technique was developed to derive a comprehensive and accurate
assessment of human contribution to risk in the nuclear industry, through a detailed
task and error analysis, quantification, and practical error reduction scheme. Because this technique was too resource-intensive, a fast screening technique also had to be developed. This lighter version required a detailed approach only for those scenarios that showed critical human involvement. This led to a subsequent technique, the Justification of Human Error Data Information (JHEDI). Six PSFs were
identified based on the assessment of several HRA techniques (Kirwan, 1997): time, quality of information and interface, training/expertise/experience/competence,
Technique | Industry | Term used for contextual factors | Definition relative to this research | Additional factors
Recovery from failures | Chemical | Recovery Influencing Factors (RIFs) | Corresponds to the definition in this research | Occurrence-related factors (available and applicable defences such as alarms); group of factors relevant for prioritisation of recovery-related factors (time available/time pressure)
CORE-DATA | Nuclear, chemical, offshore | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | As above
THERP | Nuclear | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | Suddenness of occurrence (or time course of failure development)
HEART | Nuclear, chemical, petrochemical, railway, defence | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | As above
COCOM | Generic | Common Performance Modes (CPMs) | More generic definition | As above
CREAM | Generic | Common Performance Conditions (CPCs) | More generic definition | As above
HRMS | Nuclear | Performance Shaping Factors (PSFs) | Additionally includes a myriad of other factors | As above
ATHEANA | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | As above
CAHR | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | As above
NARA | Nuclear | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | As above
HPDB | Nuclear | Factors | No definition is provided | As above
The assessed HRA techniques and their related factors are presented in tabular form
in Appendix VII. Factors from all techniques are compared to HERA, as the most
recent HRA technique in the ATC/ATM domain. In most cases, the comparison was straightforward since certain factors (e.g. procedures) were identified in almost all techniques. However, a number of factors could not be identified as belonging
to any of the HERA categories and were thus categorised separately (shown as
dashed boxes in Appendix VII). Although these did not specifically fit any of the HERA
categories, they were retained because of their relevance to the recovery from
equipment failures in ATC. Table 7-3 gives an overview of the RIFs that are taken
forward for further analysis in the next section.
Communication
Weather
Procedures
Training and experience
HMI
Personal factors
Organisational factors
Task complexity
Time available & time pressure
Available and applicable defences and barriers & alarms
Complexity of failure
Suddenness of occurrence & time course of failure development
Duration of failure type
Impact on operational room (i.e. number of workstations/sectors affected)
Experience with system performance (reliance)
definition of these 20 RIFs assumes that an equipment failure has occurred (i.e. the probability of equipment failure is 1). For any other type of event, these 20 RIFs would have to be renamed and redefined to allow an analysis of the context surrounding that event. Table 7-5 presents the final set of factors relevant to recovery from equipment failures in ATC, together with their corresponding qualitative descriptors. It should be noted that these 20 RIFs represent high-level categories (e.g. personal factors), each consisting of several low-level factors (e.g. age, experience, stress, fatigue). The detailed definitions of these 20 RIFs are presented in Appendix VIII.
Table 7-5 Relevant recovery influencing factors and their corresponding qualitative descriptors
RIF name | Qualitative descriptor | Level
Training for recovery from ATC equipment failure | Suitable to the situation in question | 1
 | Tolerable to the situation in question | 2
 | Counterproductive to the situation in question | 3
Experience with equipment failures | Experienced a particular type of failure or any other type of ATC equipment failure | 1
 | No experience with ATC equipment failures | 2
Experience with the system performance (reliance) | Objective attitude toward the system | 2
 | Positive experience with the system or negative experience with the system | 3
Personal factors | Suitable for the recovery process | 1
 | Tolerable for the recovery process | 2
 | Counterproductive for the recovery process | 3
Communication for recovery within team/ATC Centre | Efficient | 1
 | Tolerable | 2
 | Inefficient | 3
Complexity of failure type | Single system affected | 2
 | Multiple systems affected | 3
Time course of failure development | Sudden failure | 1
 | Persistent or latent failure | 2
 | Gradual degradation of system | 3
Number of workstations/sectors affected | One workstation/one sector or all workstations in one sector | 2
 | Several workstations/couple of sectors or all workstations/all sectors | 3
Time necessary to recover | Adequate | 1
 | Inadequate | 3
Existence of recovery procedure | Suitable to the situation in question | 1
 | Tolerable to the situation in question | 2
 | Inappropriate | 3
Duration of failure | Short period of time | 2
 | Moderate or substantial period of time | 3
Adequacy of HMI and operational support | Suitable to the situation in question | 1
 | Tolerable to the situation in question | 2
 | Counterproductive to the situation in question | 3
[…] | External working environment matches the controller's internal mental model | 1
 | External working environment mismatches the controller's internal mental model | 3
[…] | Suitable to the situation in question | 1
 | Tolerable to the situation in question | 2
 | Counterproductive to the situation in question | 3
Adequacy of alarms/alerts | Information from the external world enters the processing loop at the right time | 1
 | Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) | 3
Adequacy of organisation | Efficient | 1
 | Tolerable | 2
 | Inefficient | 3
Traffic complexity during the recovery process | Average traffic complexity | 2
 | High or low traffic complexity | 3
Airspace characteristics during the recovery process | Adequate | 1
 | Tolerable | 2
 | Inappropriate | 3
Weather conditions during the recovery process | Improved | 2
 | Deteriorated | 3
Conflicting issues in the situation (task complexity) | Average complexity of the situation | 2
 | Conflicting, multiple tasks or extremely low complexity of the situation | 3
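The level scheme of Table 7-5 can be represented as a simple lookup from RIF and descriptor to level. This sketch encodes only a few RIFs from the table and assumes, following the CREAM-style three-level expected effect adopted in this thesis, that Level 1 corresponds to an improved, Level 2 to a not significant, and Level 3 to a reduced effect on recovery performance. The names `RIF_LEVELS`, `EXPECTED_EFFECT`, and `expected_effect` are hypothetical.

```python
# Subset of Table 7-5: RIF name -> {qualitative descriptor: level}.
RIF_LEVELS: dict[str, dict[str, int]] = {
    "Training for recovery from ATC equipment failure": {
        "Suitable to the situation in question": 1,
        "Tolerable to the situation in question": 2,
        "Counterproductive to the situation in question": 3,
    },
    "Complexity of failure type": {
        "Single system affected": 2,
        "Multiple systems affected": 3,
    },
    "Time necessary to recover": {
        "Adequate": 1,
        "Inadequate": 3,
    },
}

# Assumed correspondence between RIF level and expected effect (CREAM-style).
EXPECTED_EFFECT = {1: "improved", 2: "not significant", 3: "reduced"}

def expected_effect(rif: str, descriptor: str) -> str:
    """Return the assumed expected effect of one RIF descriptor on recovery."""
    return EXPECTED_EFFECT[RIF_LEVELS[rif][descriptor]]

print(expected_effect("Time necessary to recover", "Inadequate"))  # reduced
```

Two-level RIFs simply omit the level they do not use, so no special case is needed for them.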
In order to ensure that the list of relevant contextual factors is complete, a key step at this stage was the verification of the selected RIFs. An initial verification was provided by two ATM specialists (from one European ATC Centre) with extensive operational experience. They reviewed the candidate RIFs, their definitions, and the related qualitative descriptors (for evidence see Appendix II), and their feedback was valuable in confirming the selected RIFs. Further verification of the selected RIFs was conducted in the experiment (presented in Chapters 9 and 10). The process used to define the 20 RIFs probabilistically, to account for their interactions, and to quantify their influence on controller recovery is discussed in more detail in the following Chapter.
7.4 Summary
The objective of this Chapter was to define the recovery context via a set of contextual factors, known as Recovery Influencing Factors or RIFs. The Chapter built on a review of existing HRA techniques and their corresponding contextual factors to identify which factors are relevant to recovery from equipment failure in ATC. This initial selection of relevant contextual factors was augmented with specific equipment-failure-related factors and dynamic situational factors. The methodology resulted in a set of 20 controller RIFs. The Chapter concluded with a definition of the qualitative descriptors for each RIF, i.e. the levels of impact that each RIF has on controller recovery performance. All results were initially verified by two ATM specialists, who reviewed the choice of RIFs and their qualitative descriptors. The selected RIFs and their qualitative descriptors are taken forward to the next Chapter to develop the methodology for the quantitative assessment of the recovery context.
Chapter 8
CREAM (Hollnagel, 1998) and Connectionism Assessment of Human Reliability (CAHR) (Straeter, 2000). A discussion of the CREAM technique and its relevance to this thesis is presented in sections 7.2.1.9 and 7.3 of Chapter 7 and will not be repeated here. However, since the CREAM technique has been further developed in the work of Kim, Seong, and Hollnagel (2005) and of Fujita and Hollnagel (2004), both extensions have been assessed for their relevance to the research presented in this thesis.
 | CREAM by Hollnagel (1998) | Improvement of CREAM by Fujita and Hollnagel (2004) | Improvement of CREAM by Kim, Seong, and Hollnagel (2005) | CAHR by Straeter (2000)
Relevant area | Theoretical approach toward human erroneous action | Theoretical approach toward action failure rate based on contextual factors | Theoretical approach toward human erroneous action | Data driven approach defined within nuclear industry
Number of contextual factors | Nine | Ten | Nine | Thirty
Interaction between contextual factors | Included qualitatively | Included qualitatively (based on CREAM) | Included qualitatively (based on CREAM) | Included quantitatively using the available data
Output | Quantitative probabilistic range | Quantitative mean failure rate | Quantitative, probabilistic approach | Connectionism method facilitating qualitative and quantitative approach
The proposed methodology is generic as its aim is to present the framework for a
generic ATC Centre, as described in Chapter 2, section 2.4. Used operationally, this
methodology would have to be refined to reflect and incorporate all the characteristics
of the ATC Centre or event under investigation.
In general, this methodology consists of six steps (Figure 8-1). Firstly, it is necessary to review the twenty RIFs identified in the previous Chapter and their relevance to the ATC Centre or event under investigation. In the generic approach, all 20 factors are assessed and defined through their qualitative descriptors, i.e. their levels of impact on controller recovery performance (Step 1). Secondly, based on the available sources of information, each RIF is defined probabilistically (Step 2). As a result, it is possible to present the recovery context as a function of the identified RIFs and their corresponding levels. At this stage, interactions between RIFs are not considered, as the RIFs are assumed to be independent. To improve accuracy, Step 3 takes into account all interactions between RIFs, which are assessed both qualitatively and quantitatively. This results in a distribution of RIF levels. Having a distribution of RIF levels, as opposed to discrete Levels 1, 2, and 3, necessitates identification of the cut-off point between any two consecutive levels (Step 4). Once these cut-off points are identified and the RIF levels redefined, the next step quantifies the relationship between a particular RIF level and its impact on controller recovery performance. This relationship is expressed via correlation coefficients (Step 5). At this stage, the probabilities of each RIF level previously determined in Step 2 are recalculated to account for RIF interactions. The result is the definition of an aggregated indicator of the recovery context, referred to as the recovery context indicator Ic (Step 6).
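Under the Step 2 assumption that RIFs are independent, the probability of any particular recovery context (one level chosen for each RIF) is simply the product of the individual RIF level probabilities. The sketch below shows this for three RIFs. The distributions are hypothetical placeholders rather than the values derived in Appendix VIII, the names `rif_distributions` and `context_probability` are illustrative, and Step 3 would replace this product once interactions between RIFs are modelled.

```python
from math import prod

# Hypothetical level distributions P(level) for three RIFs (not thesis data).
rif_distributions: dict[str, dict[int, float]] = {
    "training for recovery": {1: 0.5, 2: 0.2, 3: 0.3},
    "complexity of failure type": {2: 0.8, 3: 0.2},
    "duration of failure": {2: 0.6, 3: 0.4},
}

def context_probability(context: dict[str, int]) -> float:
    """P(context) under independence: product of each RIF's level probability."""
    return prod(rif_distributions[rif][level] for rif, level in context.items())

# Every RIF at its least favourable level.
worst_case = {
    "training for recovery": 3,
    "complexity of failure type": 3,
    "duration of failure": 3,
}
print(round(context_probability(worst_case), 3))  # 0.024
```

Because each distribution sums to one, the probabilities of all possible contexts built this way also sum to one.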
Figure 8-1 presents the six-step framework for the quantitative assessment of the recovery context. Since the previous Chapter identified and discussed all 20 RIFs and their levels of impact (qualitative descriptors), the following section discusses the next step, namely the probabilistic assessment of RIFs (Step 2). This is followed by the remaining steps of the proposed methodology (Figure 8-1).
Figure 8-1 Framework for the quantitative assessment of the recovery context
For this reason, it should be noted that this Chapter captures the characteristics of a generic ATC Centre as a basis for further fine-tuning of the proposed methodology and its use as either a retrospective or a prospective/predictive tool. Each ATC Centre has unique characteristics that may be represented by different RIF probabilities. For example, the number of workstations/sectors affected and the complexity of failure type depend on the particular architecture of each ATC Centre, while training for recovery and adequacy of organisation depend on its particular safety culture. The framework developed in this Chapter is applied to a specific ATC Centre in Chapter 10.
Centre system control and monitoring database (referred to as Country D). Detailed
analyses of these reports are presented in Chapter 4.
The analyses of operational failure reports are used to inform two particular RIF
probabilities. The first one is complexity of failure type. The probabilities relevant to
this RIF are determined by comparing the number of reports describing a single failure with the number describing more than one failure. The second RIF is duration of failure, which is informed by the analysis of data from the Country D database, as it was the only database that captured the duration of failure. In both cases, the findings are further validated by the responses from the eight ATM specialists surveyed.
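The way report counts inform the complexity of failure type RIF can be sketched as a simple proportion: reports describing a single failure inform Level 2 (single system affected) and reports describing more than one failure inform Level 3 (multiple systems affected), as defined in Table 7-5. The counts below are hypothetical, and the function name is illustrative; the actual figures come from the analyses in Chapter 4.

```python
def complexity_of_failure_probabilities(n_single: int, n_multiple: int) -> dict[int, float]:
    """Estimate P(level) for the complexity of failure type RIF from report counts.

    Level 2 = single system affected, Level 3 = multiple systems affected.
    """
    total = n_single + n_multiple
    if total == 0:
        raise ValueError("no failure reports to estimate from")
    return {2: n_single / total, 3: n_multiple / total}

# Hypothetical report counts, for illustration only.
print(complexity_of_failure_probabilities(n_single=45, n_multiple=5))
# {2: 0.9, 3: 0.1}
```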
8.3.1.2 Questionnaire survey
The responses from the questionnaire survey, received from 34 different countries,
captured the experiences of more than one hundred air traffic controllers (average
controller experience is 13.8 years, ranging from 1 to 39 years). The detailed
assessment of this dataset is presented in Chapter 6. This source provided an input for
three RIF probabilities. These are: training for recovery from ATC equipment failure,
previous experience with a particular type of equipment failure, and existence of
recovery procedure.
The first RIF (training for recovery from ATC equipment failure) is more difficult to determine than the other two. The questionnaire survey determined that 51.7 percent of sampled ATC Centres have established training for recovery (informing the probability of RIF1 at Level 1) and that 31 percent have not (informing the probability of RIF1 at Level 3). The remaining 17.4 percent of sampled ATC Centres showed inconsistent responses, and this result is translated into the probability of RIF1 at Level 2 (the tolerable level). It is assumed that inconsistent responses on the existence of recovery training within the same ATC Centre suggest that training is not organised in a consistent manner.
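The mapping from questionnaire shares to RIF1 level probabilities can be written out directly. Because the three reported shares (51.7, 17.4, and 31 percent) sum to 100.1 percent, presumably through rounding, the sketch renormalises them so that the level probabilities sum to one. Renormalisation and the function name `shares_to_probabilities` are assumptions of this illustration rather than steps stated in the thesis.

```python
def shares_to_probabilities(shares_percent: dict[int, float]) -> dict[int, float]:
    """Convert percentage shares per RIF level into probabilities summing to 1."""
    total = sum(shares_percent.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return {level: share / total for level, share in shares_percent.items()}

# RIF1 (training for recovery from ATC equipment failure), from the survey:
# Level 1 = training established, Level 2 = inconsistent responses,
# Level 3 = no training.
rif1 = shares_to_probabilities({1: 51.7, 2: 17.4, 3: 31.0})
print({level: round(p, 3) for level, p in rif1.items()})
# {1: 0.516, 2: 0.174, 3: 0.31}
```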
8.3.1.3 Input by ATM specialists
Several probabilities are captured through the input from relevant ATM specialists from
eight similar ATC Centres. The ATM specialists from Ireland, Norway, Sweden, Austria,
New Zealand, Australia, and Japan participated in the small-scale survey. In two cases
the relevant probabilities are captured through face-to-face interviews (with ATM
specialists from Ireland and Norway), whilst in all other cases a predefined set of
1 Source: personal correspondence with Dr Arnab Majumdar, who visited all listed ATC Centres
2 Source: EUROCONTROL Performance Review Report (EUROCONTROL, 2006c)
3 Source: Airways New Zealand (2006b)
4 Source: Bureau of Transport and Regional Economics (2006). Australian Government
Christchurch | ACC/Oceanic | Latest generation | 555
Tokyo | ACC/Oceanic | Older generation | 2,250⁵
The responses from the ATM specialists surveyed are used to inform 12 RIFs. For three RIFs, their responses have been used either to supplement the findings from past research (for the experience with the system performance RIF) or to validate findings from the operational failure reports (for the complexity of failure type and duration of failure RIFs).
For the majority of RIFs, the responses from the ATM specialists surveyed were consistent. However, for six RIFs some ATM specialists gave different answers: personal factors, communication for recovery within team/ATC Centre, time course of failure development, adequacy of HMI and operational support, airspace characteristics, and conflicting issues in the situation (task complexity). For example, the majority of ATM specialists reported personal factors as suitable for the recovery process in 70 to 90 percent of failure occurrences, whereas the Oslo and Tokyo ATM specialists reported them as suitable in less than 15 percent of failure occurrences. These lower ratings of personal factors reflect the ATM specialists' perception of the readiness of air traffic controllers to face unusual/emergency situations, such as equipment failure.
Similarly, potential gaps are identified at the Melbourne and Christchurch ATC Centres, where the majority of failures seem to be latent (92 and 60 percent, respectively). This is contrary to the answers provided by the other ATC Centres. Finally, potential gaps regarding the adequacy of airspace are identified by the ATM specialists from the Auckland and Tokyo ATC Centres. They ranked airspace design and configuration as tolerable, highlighting the potential to improve airspace characteristics to enhance controller recovery performance.
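Divergent specialist responses of the kind described above can be flagged mechanically. The sketch compares each centre's reported share with the median across centres and flags those further away than a tolerance. The centre labels other than Oslo and Tokyo, all numeric values, the 0.3 tolerance, and the function name `flag_divergent` are hypothetical, chosen only to mirror the reported pattern of 70 to 90 percent versus less than 15 percent for personal factors.

```python
from statistics import median

def flag_divergent(responses: dict[str, float], tolerance: float = 0.3) -> list[str]:
    """Return centres whose response differs from the median by more than tolerance."""
    centre_median = median(responses.values())
    return sorted(c for c, v in responses.items() if abs(v - centre_median) > tolerance)

# Share of failure occurrences in which personal factors were rated "suitable".
# Values are illustrative, not the specialists' actual answers.
personal_factors_suitable = {
    "Centre A": 0.85, "Centre B": 0.80, "Centre C": 0.90, "Centre D": 0.75,
    "Centre E": 0.70, "Centre F": 0.80, "Oslo": 0.10, "Tokyo": 0.12,
}
print(flag_divergent(personal_factors_suitable))  # ['Oslo', 'Tokyo']
```

The median is used rather than the mean so that the divergent answers themselves do not drag the reference value towards them.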
It can be concluded that the ATM specialists from eight ATC Centres worldwide produced similar ratings for the majority of RIFs. The identified inconsistencies reflect differences between these ATC Centres in terms of ATC Centre culture (reflected in personal factors), airspace design, and ATC Centre architecture. These differences are to be expected, given the diversity that exists between ATC Centres within one country as well as worldwide. As a result, the responses from the ATM specialists surveyed have been taken to inform several RIFs. In future, a weighting scheme may be used to account for the variability between ATC Centres (e.g. safety culture, differences between ATC Centres, ATM specialists' experience).
5 Source: Air Traffic Activity at Area Control Centre (last available for 2003) from Ministry of Land, Infrastructure, and Transport (2006)
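The weighting scheme suggested for future work could, for instance, take the form of a weighted average of the centres' estimates for a given RIF level. This is only a sketch: the weights (which might reflect specialist experience or similarity to the centre under study), the estimates, and the function name `weighted_estimate` are all hypothetical.

```python
def weighted_estimate(estimates: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-centre probability estimates for one RIF level."""
    if estimates.keys() != weights.keys():
        raise ValueError("every centre needs exactly one estimate and one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(estimates[c] * weights[c] for c in estimates) / total_weight

# Hypothetical estimates of P(Level 1) and hypothetical weights for three centres.
estimates = {"Centre A": 0.8, "Centre B": 0.7, "Centre C": 0.2}
weights = {"Centre A": 2.0, "Centre B": 1.0, "Centre C": 1.0}
print(round(weighted_estimate(estimates, weights), 3))  # 0.625
```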
8.3.1.4 Past literature
Finally, the relevant data from past ATC research are used to inform probabilities for
the RIF experience with the system performance. The probabilities are determined
from the findings of Hilburn and Flynn (2001) and EUROCONTROL (2000b), in which 18 percent of controllers reported undertrust in technology. These findings are combined with the responses from the ATM specialists surveyed on the percentage of controllers with excessive trust in technology (i.e. overtrust). Both sources of information are therefore used to establish the final probability rating for this particular RIF (presented in Appendix VIII).
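Following Table 7-5, both undertrust and overtrust fall under Level 3 (positive or negative experience with the system), while an objective attitude is Level 2. Under that reading, the two sources could be combined as below. The 18 percent undertrust figure is from the literature cited above; the overtrust share is a hypothetical placeholder for the specialists' input, and treating the two groups as mutually exclusive is an assumption of this sketch, as is the function name.

```python
def reliance_rif_probabilities(p_undertrust: float, p_overtrust: float) -> dict[int, float]:
    """P(level) for the 'experience with the system performance (reliance)' RIF.

    Level 3: undertrust or overtrust (assumed mutually exclusive groups).
    Level 2: objective attitude toward the system (the remainder).
    """
    p_level3 = p_undertrust + p_overtrust
    if not 0.0 <= p_level3 <= 1.0:
        raise ValueError("undertrust + overtrust must lie in [0, 1]")
    return {2: 1.0 - p_level3, 3: p_level3}

# 0.18 from the literature; 0.12 is a hypothetical overtrust share.
probs = reliance_rif_probabilities(p_undertrust=0.18, p_overtrust=0.12)
print({level: round(p, 2) for level, p in probs.items()})  # {2: 0.7, 3: 0.3}
```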
8.3.1.5 Aggregation of data
The previous sections have described four different sources of information used to
determine RIF probabilities. These are: operational failure reports, responses from a
questionnaire survey, responses from the ATM specialists surveyed, and past literature.
Table 8-4 reviews all four sources of information with respect to their level of confidence, which provides the rationale behind the aggregation of data. Three data sources are rated with a high level of confidence (questionnaire survey, responses from the ATM specialists surveyed, and past literature). Only one source is rated with medium confidence: the confidence level for operational failure reports from the CAA databases is not defined as high due to the lack of information on the reliability of the available reporting schemes. Reliability issues regarding the reporting of safety occurrences are recognised by CAAs⁶. However, none of the CAAs has a methodology in place to assess the reliability of its reporting scheme and, therefore, the completeness of its occurrence database. The medium ranking for the confidence level is thus an assumption informed by operational experience. As a result, the data from this source are validated against the findings from another source (i.e. ATM specialists' input) to ensure reliable RIF ratings.
216
Chapter 8
Table 8-4 Overview of the sources of information used to determine RIF probabilities
Source | Level of confidence (subjective) | Comment
Operational failure reports from the CAAs | Medium | The confidence level is not defined as high due to the lack of information on reliability of available reporting schemes
Operational failure reports from the engineering unit of particular ANSP | High | The confidence level is defined as high due to the fact that the engineering unit has to be aware of all equipment failures occurring in the ATC Centre as they are directly responsible for their maintenance and repair
Questionnaire survey | High | Responses from 134 air traffic controllers, from 58 ATC Centres, and 34 countries worldwide
ATM specialists | High | Conducted with ATC specialists from eight ATC Centres worldwide
Past literature | High | Hilburn and Flynn (2001) and EUROCONTROL (2000b)
In general, the above analyses employed the data from all four sources to define the
probabilities for 20 Recovery Influencing Factors (RIFs). These are presented in
Appendix VIII.
8.3.2 Summary
The preceding paragraphs have taken the qualitative levels of impact of each RIF (i.e. the qualitative descriptors) defined in Chapter 7 and assigned a probability to each. An overview of all 20 RIFs, their corresponding levels, and the designated probabilities is provided in detail in Appendix VIII and in tabular form in Appendix X.
Having defined all 20 relevant recovery factors in the previous sections, it is possible to define the recovery context. In general, the recovery context may be seen as a discrete function, since every possible context is defined by exactly 20 elements and each RIF has only two or three defined levels. In mathematical terms, the method can be expressed as a function f of the set of 20 RIFs that defines the recovery context indicator (Ic), as shown in equation 8-1:
$$I_c = f(\mathrm{RIF1}, \mathrm{RIF2}, \ldots, \mathrm{RIF20}) \tag{8-1}$$
The total number of possible recovery contexts is the number of combinations of the levels of the 20 RIFs, where nine RIFs have three levels of impact and eleven have only two. In total, this approach generates 3⁹ × 2¹¹ = 40,310,784 possible contexts, each with an equal probability of occurrence of 1/40,310,784 ≈ 2.48E-08. In mathematical terms, this is equivalent to finding all variations with repetition of the 20 RIFs and their corresponding levels. In addition, each recovery context has a specific value of the recovery context indicator (Ic). The methodology to calculate this variable is presented in the remainder of this Chapter.
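The count of recovery contexts can be checked with the minimal Python sketch below. It is illustrative only and rests on a simplifying assumption: every three-level RIF is given the levels (1, 2, 3) and every two-level RIF the levels (1, 2), whereas some two-level RIFs in this thesis actually use Levels 2 and 3; this choice does not affect the count. The name `LEVEL_SETS` is hypothetical.

```python
import math
from itertools import product

# Simplified level sets (assumption): nine three-level RIFs and eleven two-level
# RIFs. Some two-level RIFs actually use Levels 2 and 3; the count is unchanged.
LEVEL_SETS = [(1, 2, 3)] * 9 + [(1, 2)] * 11

n_contexts = math.prod(len(levels) for levels in LEVEL_SETS)  # 3**9 * 2**11
p_context = 1 / n_contexts  # every context assumed equally likely

# Contexts are generated lazily as 20-tuples, one level per RIF.
contexts = product(*LEVEL_SETS)
first_context = next(contexts)

print(n_contexts, f"{p_context:.3g}")  # 40310784 2.48e-08
print(first_context)
```

Generating the contexts lazily with `itertools.product` avoids holding all 40 million tuples in memory while still allowing every context to be visited in turn.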
Table 8-5 presents an example of a potential recovery context as a 20-digit array, where each digit corresponds by its position to a particular RIF and by its value to the precise impact of that RIF on controller performance. At this stage, all RIFs are considered independently, and their levels of influence on controller performance take integer values, i.e. 1, 2, or 3.
[Table 8-5: example recovery context as a 20-digit array; recoverable entries: RIF10 = 1, RIF20 = 3]
The following sections show how the existing RIF interactions may change the RIF levels in either direction, i.e. increase the level (corresponding to a deterioration in controller performance) or decrease it (corresponding to an improvement in controller performance).
traffic and airspace complexity. If a controller deals with increased levels of traffic, it is
reasonable to assume that stress levels will be higher.
In order to determine the effect of contextual factors on controller performance it is
therefore necessary to describe these interactions, in addition to describing how they
affect controller performance. The analysis of interactions makes it possible to gain a
more accurate picture of the context and thus a better understanding of the recovery
process. In other words, this permits a broader retrospective analysis as well as a more
precise prediction of the effectiveness of the improvement measures. As noted by
Straeter (2000), such interactions could also point to additional factors previously
omitted, such as potential organisational shortcomings.
Straeter (2000) tackles this problem in CAHR by looking at the co-occurrence of different factors in available databases. The analysis captures the observed interactions between reported contextual factors; however, it requires a detailed database as a prerequisite. Hollnagel (1998), on the other hand, establishes these interactions in CREAM by considering how each contextual condition generally influences the others (it is not stated whether expert judgement or operational expertise was used). Note also that CREAM assumes reciprocal interaction between the contextual conditions.
The interactions amongst the 20 predefined RIFs have been determined from relationships known from operational experience and are marked in Table 8-6. They represent the irreversible (one-directional) influence between two RIFs, i.e. how the RIFs in the first row affect the RIFs in the left-hand column. The reason for irreversible influence lies in the characteristics of the air traffic environment, where one factor may influence another without any reverse effect. For example, complex traffic can influence a controller's personal capabilities in terms of increased stress, anxiety, and workload, whereas the opposite influence (of personal capabilities on traffic complexity in the sector) is simply not logical.
[Table 8-6: 20 × 20 matrix of direct influences between RIFs; each marked cell shows the influence of a RIF in the first row on a RIF in the left-hand column, together with its validation code]
Table 8-6 Interactions matrix: (c) validation by CREAM, (h) validation by CAHR, (a) validation by ATM specialists; and (x) not validated interactions
Several of the identified PSFs are specific to nuclear power plants (e.g. task preparation, precision, labelling, marking), whilst the majority are applicable to recovery from equipment failures in ATC (e.g. time pressure, procedures, HMI).
Straeter (2000)
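$$R_x = \mathrm{RIFY}_{j} + k_{xy} \tag{8-2}$$
where
RIFYj represents level j of RIFY before the interaction is taken into account;
kxy represents the interaction factor of RIFX on RIFY;
Rx represents the resulting level of RIFY after the influence of RIFX is taken into account.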
In other words, kxy is the numerical representation of the direct influence that RIFX has on RIFY. Note that the interaction factor represents an irreversible interaction (i.e. kxy ≠ kyx).
Taking into account the overall lack of quantitative assessment of context in the area of
occurrence of any context, with or without the incorporation of RIF interactions, is the same (1/40,310,784 ≈ 2.48E-08, as previously reported in section 8.3.2).
Table 8-8 Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
RIF ID | Level | RIF ID | Level
RIF1 | 1.00 | RIF11 | 1.95
RIF2 | 0.95 | RIF12 | 2.00
RIF3 | 1.95 | RIF13 | 0.89
RIF4 | 0.84 | RIF14 | 1.05
RIF5 | 0.89 | RIF15 | 2.95
RIF6 | 2.05 | RIF16 | 2.89
RIF7 | 1.05 | RIF17 | 2.95
RIF8 | 2.05 | RIF18 | 1.11
RIF9 | 0.74 | RIF19 | 3.00
RIF10 | 1.05 | RIF20 | 2.74
In short, a change (increase or decrease) in the value of a particular RIF represents the net outcome of all interactions with that RIF. For example, the RIF5 level changes from 1 to 0.89 as a result of the influence of 15 different RIFs, as seen from the matrix in Table 8-6 (see row 5).
In this particular example, RIF1, RIF2, RIF4, RIF9, RIF10, RIF13, and RIF14 influence RIF5 in a positive way, as they are defined via Level 1. As a result, each of these seven RIFs decreases the RIF5 level by 1/19 ≈ 0.053. However, RIF15, RIF16, RIF17, RIF19, and RIF20 influence RIF5 in a negative way, as they are defined via Level 3. As a result, each of these five RIFs increases the RIF5 level by 0.053. The remaining influencing RIFs, namely RIF3, RIF6, and RIF12, do not change RIF5, as their level is 2, which assumes no significant influence on human performance. Furthermore, RIF7, RIF8, RIF11, and RIF18 have no impact on RIF5 and are therefore not considered. The result is an overall decrease in the RIF5 level, as follows (equation 8-3):
$$\mathrm{RIF5} = 1 + 7\left(-\frac{1}{19}\right) + 5\left(+\frac{1}{19}\right) = 1 - \frac{2}{19} \approx 0.89 \tag{8-3}$$
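The arithmetic of equation 8-3 can be reproduced with the minimal Python sketch below. It assumes, as in the worked example, that each influencing RIF shifts the target level by (level - 2)/19, i.e. -1/19 at Level 1, 0 at Level 2, and +1/19 at Level 3. The dictionary `INFLUENCERS_OF_RIF5` and the function `adjusted_level` are hypothetical names that encode only the influences on RIF5 listed above; exact fractions are used so that the result can be compared with 17/19 directly.

```python
from fractions import Fraction

N_OTHER_RIFS = 19  # a RIF can be influenced by at most the 19 other RIFs

# Initial levels of the RIFs that directly influence RIF5 in the example context.
# RIF7, RIF8, RIF11 and RIF18 are omitted because they do not influence RIF5.
INFLUENCERS_OF_RIF5 = {
    1: 1, 2: 1, 4: 1, 9: 1, 10: 1, 13: 1, 14: 1,  # Level 1: favourable
    3: 2, 6: 2, 12: 2,                            # Level 2: no significant effect
    15: 3, 16: 3, 17: 3, 19: 3, 20: 3,            # Level 3: unfavourable
}


def adjusted_level(initial_level, influencer_levels, n_other=N_OTHER_RIFS):
    """Add a shift of (level - 2) / n_other for every influencing RIF."""
    shift = sum(Fraction(level - 2, n_other) for level in influencer_levels)
    return initial_level + shift


rif5 = adjusted_level(1, INFLUENCERS_OF_RIF5.values())
print(rif5, round(float(rif5), 2))  # 17/19 0.89
```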
Applying all identified RIF interactions to all 40,310,784 recovery contexts made it possible to determine the distribution of levels for every RIF. Prior to the incorporation of RIF interactions, the levels of each RIF occur equally often. For example, Figure 8-2 shows the distribution of RIF5 without the incorporation of RIF interactions: the three levels of RIF5 are represented symmetrically, each accounting for exactly 13,436,928 contexts, i.e. one third of the 40,310,784 possible recovery contexts.
[Figure: histogram of Frequency against Level for RIF5]
Figure 8-2 Distribution of RIF5 levels amongst identified recovery contexts without interactions
However, once the identified interactions are incorporated, the distribution of RIF5 levels amongst all possible recovery contexts takes a different, more dispersed shape (Figure 8-3). Notably, the more interactions exist with a particular RIF, the more dispersed the distribution of its levels. The example used in this section (i.e. RIF5) is affected by a substantial number of other contextual factors, namely 15. In some cases, however, the number of identified interactions can be small (e.g. one or two), while in the case of RIF19 (weather conditions) there are no identified interactions, and thus this RIF has a distribution similar to that of RIF5 without interactions (Figure 8-2). In any case, the total number of recovery contexts in which RIF5 (or any other RIF) is defined via Level 1 remains the same whether or not RIF interactions are incorporated. The distribution of the levels for each of the 20 RIFs is presented in Appendix XII in tabular format.
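The effect of the number of interactions on dispersion can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not the thesis computation over all 40,310,784 contexts: the target RIF and each of its influencers take Levels 1 to 3, and each influencer shifts the target by (level - 2)/19, as in the RIF5 example. The function name `adjusted_level_histogram` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def adjusted_level_histogram(n_influencers, n_other=19):
    """Count adjusted levels of one target RIF over every context of a toy model.

    Toy assumptions: the target RIF and each of its influencers take Levels 1-3,
    and each influencer shifts the target by (level - 2) / n_other.
    """
    histogram = Counter()
    for target in (1, 2, 3):
        for levels in product((1, 2, 3), repeat=n_influencers):
            shift = sum(Fraction(level - 2, n_other) for level in levels)
            histogram[target + shift] += 1
    return histogram


for k in (0, 2, 4):
    hist = adjusted_level_histogram(k)
    # distinct adjusted levels grow with k; contexts per initial level stay 3**k
    print(k, len(hist), sum(n for level, n in hist.items() if level < 1.5))
```

With more influencers the number of distinct adjusted levels grows (3, 15, 27), while the number of contexts that started at Level 1 stays at 3**k, which mirrors the two observations made in the paragraph above.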
[Figure: histogram of Frequency against Level for RIF5]
Figure 8-3 Distribution of RIF5 levels amongst identified recovery contexts with interactions
Once the RIF interactions have been identified and their impact quantified, the next step is to re-calculate the existing RIF probabilities so that they more accurately reflect the newly determined RIF levels. To achieve this step, however, it is necessary to determine the cut-off points between any two consecutive levels of influence, i.e. the precise boundaries between Level 1, Level 2, and Level 3. Another option would be to consider each of the distributions separately, i.e. covering the entire spectrum (−∞, +∞). In this way, there is no cut-off point and all results remain coherent. However, both approaches yield similar results, as there is very little overlap between these distributions. The following section explains the method applied to determine the cut-off points between any two consecutive RIF levels.
[Figure: histogram of Frequency against Level for RIF1]
Figure 8-4 Distribution of RIF1 levels amongst identified recovery contexts with interactions
[Figure: histogram of Frequency against Level for RIF20]
Figure 8-5 Distribution of RIF20 levels amongst identified recovery contexts with interactions
The statistical method for determining the cut-off points between the levels of each RIF is based on the 95 percent confidence interval for each level. For example, a 95 percent confidence interval for Level 1 of RIF1 would cover 95 percent of the normal curve, so that the probability of observing a value of RIF1 Level 1 outside this area would be less than 0.05. Under the assumption of a normal distribution⁷, the interval (μ − 2σ, μ + 2σ) captures approximately 95 percent of the data.
$$\sigma = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(X_n - \mu\right)^2}, \qquad \mu = \frac{1}{N}\sum_{n=1}^{N} X_n \tag{8-4}$$
where
σ represents the standard deviation of RIF1 Level 1;
μ represents the population mean for RIF1 Level 1 (population of all possible recovery contexts where RIF1 is defined through Level 1);
N represents the total number of recovery contexts in which RIF1 is defined via Level 1;
Xn represents the n-th value of the variable RIF1 Level 1 (n = 1, 2, …, N).
To overcome this, three different interval values or three different cut-off points
(assumed based upon the initial distribution of data) are tested. For example, when
assessing the cut-off points between levels of RIF5, three different values between
Level 1 and Level 2 have been tested (namely Fit 1, Fit 2, and Fit 3 in Figure 8-6).
7 This assumption is supported by the symmetrical distribution of levels around the values of 1, 2, and 3, and by the large number of observations.
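The statistics used to compare candidate cut-off points (mean, standard deviation from equation 8-4, the μ ± 2σ interval, and the standard error on the mean reported in Table 8-9) can be sketched in Python as follows. This is a minimal illustration assuming that the adjusted levels are held as a histogram mapping each level to its number of recovery contexts; the function name `level_band_stats` and the toy histogram are hypothetical and are not thesis data.

```python
import math


def level_band_stats(histogram, cutoff, lower=0.0):
    """Mean, population SD, 2-sigma interval and SE of the mean for one level band.

    histogram maps an interaction-adjusted RIF level to the number of recovery
    contexts with that level; only levels in (lower, cutoff] are included.
    """
    band = {lvl: n for lvl, n in histogram.items() if lower < lvl <= cutoff}
    total = sum(band.values())
    mu = sum(lvl * n for lvl, n in band.items()) / total
    sigma = math.sqrt(sum(n * (lvl - mu) ** 2 for lvl, n in band.items()) / total)
    return {
        "mean": mu,
        "sd": sigma,
        "interval": (mu - 2 * sigma, mu + 2 * sigma),
        "se": sigma / math.sqrt(total),
    }


# Toy histogram (not thesis data), tested against three candidate cut-offs.
toy = {0.8: 10, 1.0: 40, 1.2: 10, 1.65: 3, 1.75: 2, 2.0: 50}
for cut in (1.6, 1.7, 1.8):
    print(cut, round(level_band_stats(toy, cut)["mean"], 3))
```

Working from counts rather than from a list of 40 million individual values keeps the calculation cheap while giving the same population statistics.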
Figure 8-6 Distribution fitting for the three cut-off points on the example of RIF5 Level 1
Table 8-9 Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
RIF5 Level 1 | Cut-off point used | Mean | Standard deviation | Standard error on the mean
Fit 1 | 1.6 | 1.18 | 0.17 | 4.59E-05
Fit 2 | 1.7 | 1.18 | 0.17 | 4.65E-05
Fit 3 | 1.8 | 1.19 | 0.19 | 5.11E-05
8 The probability density function approach represents distributions so that the areas of the rectangles sum to 1.
data. The calculation of the function minimum⁹ shows that, regardless of the type of polynomial function, the local minimum lies close to the cut-off point of 1.7 (Table 8-10). The fit of a cubic polynomial function to the RIF5 Level 1 data is presented in Figure 8-7. Since Table 8-9 shows that the choice of a cut-off at 1.6 or 1.7 makes no significant difference, and since the function minimum is closer to 1.7, this value is taken forward as the cut-off point between RIF5 Level 1 and Level 2.
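The location of the minimum of the cubic fitted in Figure 8-7 can be found analytically: the stationary points are the roots of the quadratic first derivative, and the minimum is the root at which the second derivative is positive. The Python sketch below applies this to the published coefficients; the function name `cubic_local_minimum` is hypothetical. The 1E07 scale factor and the constant term are omitted because neither moves the stationary points. The result, about 1.6653, matches one of the local minima listed in Table 8-10.

```python
import math

# Coefficients of the fitted cubic from Figure 8-7 (the 1E07 scale factor is
# omitted because it does not move the stationary points).
A, B, C = -0.5613, 4.2097, -9.3510


def cubic_local_minimum(a, b, c):
    """Local minimum of a*x**3 + b*x**2 + c*x + d, or None if there is none.

    The constant term d does not affect the location of the minimum.
    """
    if a == 0:
        raise ValueError("not a cubic: a must be non-zero")
    disc = (2 * b) ** 2 - 4 * (3 * a) * c  # discriminant of f'(x) = 3a x^2 + 2b x + c
    if disc <= 0:
        return None  # no two distinct stationary points, so no local minimum
    root = math.sqrt(disc)
    for x in ((-2 * b + root) / (6 * a), (-2 * b - root) / (6 * a)):
        if 6 * a * x + 2 * b > 0:  # f''(x) > 0 marks the minimum
            return x
    return None


x_min = cubic_local_minimum(A, B, C)
print(round(x_min, 4))  # 1.6653
```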
[Table 8-10: local minimum of each fitted polynomial function: 1.7016; 1.6653; 1.6756]
[Figure: histogram of Frequency against Level for RIF5 Level 1, with fitted curve f(x) = 1E07(-0.5613x³ + 4.2097x² - 9.3510x + 6.5076)]
Figure 8-7 Cubic polynomial function f(x) fitted for the RIF5 data to determine its minimum
Similarly, the value of 2.7 is taken as a cut-off point between Level 2 and Level 3 (see
Table 8-11). Using the same methodology, the cut-off points are determined for all
RIFs and their corresponding levels. The established values are reported in Table 8-11.
Table 8-11 Cut-off points between the levels for all RIFs
Cut-off point between Level 1 and
RIF ID
Level 2
1
1.5
2
1.5
3
N/A
9 In the case of quartic polynomial functions, it is necessary to specify which local minimum is meant (the first derivative of a quartic has up to three real roots, so the function can have two local minima).
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1.7
1.7
N/A
1.5
N/A
2.7
2.7
2.5
2.5
2.5
2.2
1.5
N/A
1.5
2.5
2.5
2.5
2.0
1.5
2.5
2.0
1.5
N/A
1.6
N/A
N/A
2.5
2.5
2.6
2.5
2.7
Table 8-12 Probabilities for the RIF5 and each of its levels (see Appendix X)
RIF5: Communication for recovery within team/ATC Centre
Level | Qualitative descriptor | p(L)
1 | Efficient | 0.73
2 | Tolerable | 0.24
3 | Inefficient | 0.04
The way to approach this problem is first to determine all recovery contexts in which RIF5 is represented via Level 1. In other words, it is necessary to determine the number of recovery contexts for which the RIF5 level is less than or equal to the cut-off point between Levels 1 and 2 (i.e. 1.7, Table 8-11). This is presented in equation 8-5 below:
$$f(\mathrm{RIFX}_1) = \sum_{0 < j' \le C_{1,2}} f(\mathrm{RIFX}_{j'}), \qquad f(\mathrm{RIFX}_2) = \sum_{C_{1,2} < j' \le C_{2,3}} f(\mathrm{RIFX}_{j'}), \qquad f(\mathrm{RIFX}_3) = \sum_{C_{2,3} < j' \le 4.0} f(\mathrm{RIFX}_{j'}) \tag{8-5}$$
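where
X represents the RIF identifier (X = 1, 2, …, 20);
Cj,j+1 represents the cut-off point between Level j and Level j+1 (Table 8-11);
f(RIFXj') represents the number of recovery contexts in which RIFX takes the interaction-adjusted level j';
f(RIFXj) represents the number of recovery contexts in which RIFX is defined via Level j.
$$p(\mathrm{RIF5}_{0.89}) = p(\mathrm{RIF5}_{1}) \times \frac{f(\mathrm{RIF5}_{0.89})}{f(\mathrm{RIF5}_{1})} = 0.73 \times \frac{1{,}008{,}576}{13{,}476{,}924} \approx 0.055 \tag{8-6}$$
where
j represents levels 1, 2, or 3;
p(RIF5j) represents the probability of RIF5 Level j (Table 8-12);
p(RIF5j') represents the re-calculated probability of the adjusted level j';
f(RIF5j') represents the number of recovery contexts in which RIF5 takes the adjusted level j';
f(RIF51) represents the number of recovery contexts whose RIF5 level corresponds to RIF5 Level 1 (i.e. 0.0 < j' ≤ 1.7).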
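Equations 8-5 and 8-6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the adjusted levels of a RIF are held as a dictionary mapping each level j' to its count f(RIFXj'), and equation 8-5 is read as summing these counts within each band. The function names `band_counts` and `adjusted_probability` are hypothetical; the final line reproduces the RIF5 example above.

```python
from collections import Counter


def band_counts(level_counts, c12, c23, upper=4.0):
    """Equation 8-5: number of contexts whose adjusted level falls in each band.

    level_counts maps an adjusted level j' to f(RIFX_j'); c12 and c23 are the
    cut-off points C_{1,2} and C_{2,3} from Table 8-11.
    """
    bands = Counter()
    for level, count in level_counts.items():
        if 0.0 < level <= c12:
            bands[1] += count
        elif c12 < level <= c23:
            bands[2] += count
        elif c23 < level <= upper:
            bands[3] += count
    return bands


def adjusted_probability(p_level, f_adjusted, f_band):
    """Equation 8-6: p(RIFX_j') = p(RIFX_j) * f(RIFX_j') / f(RIFX_j)."""
    return p_level * f_adjusted / f_band


# RIF5 example from the text: p(Level 1) = 0.73, C_{1,2} = 1.7
p = adjusted_probability(0.73, 1_008_576, 13_476_924)
print(round(p, 3))  # 0.055
```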
The new probability of occurrence (0.055) is low in magnitude, but it represents an occurrence with a high probability of recovery. In other words, in this particular context, RIF5 is enhanced by the influence of all the other RIFs that interact with it. The final output of this methodology is the indicator of a specific recovery context (Ic), as presented in equation 8-7. For example, if all 20 RIFs are defined via Level 1 with probability 1 and there are no interactions, Ic equals 1. Similarly, if all 20 RIFs are defined via Level 3 with probability 1 and there are no interactions, Ic equals -1.
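$$I_c = \frac{1}{N}\left[\sum_{\substack{i=1 \\ \text{3-level RIFs}}}^{20} \sum_{j=1}^{3} p(\mathrm{RIFX}_{j'})\,R_j + \sum_{\substack{i=1 \\ \text{2-level RIFs}}}^{20} \sum_{j} p(\mathrm{RIFX}_{j'})\,R_j\right] \tag{8-7}$$
where
p(RIFXj') is the probability of RIFX with level j', where X = 1, 2, 3, …, 20 and 0.0 ≤ j' ≤ 4.0; the level j' takes into account all interactions between RIFs, and for two-level RIFs j runs over the two levels defined for that RIF;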
Rj
p(RIFX j) x Rj
[Figure: histogram of Frequency (number of recovery contexts) against the value of Ic, ranging from about -0.07 to 0.13]
This distribution is slightly positively skewed (right-skewed), since its tail is longer in the positive direction than in the negative direction. This is also confirmed by the positive value of the skewness statistic, which indicates that values are concentrated on the left side of the distribution. The median, i.e. the value on the horizontal axis with exactly 50 percent of the data on each side, is -0.023. This positive skew may result from the initial inputs into the methodology for the quantitative (probabilistic) assessment of the recovery context surrounding equipment failure in ATC. For example, the probability values for each RIF and its levels show that 12 of the 20 RIFs are more likely to enhance recovery performance than to have no impact or a negative impact. In other words, for these 12 RIFs the probability of Level 1 is higher than that of the other level(s) (i.e. Level 2 and Level 3; see Appendix X for details on RIF probabilities). Therefore, it can be concluded that, for the generic ATC Centre, the recovery context indicator takes a value close to 0.027. This indicates a large potential for improvement, i.e. for shifting Ic values towards the positive side and thus enabling more appropriate contextual conditions.
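The skewness check described above can be illustrated with a minimal Python sketch. The thesis does not name the statistic used, so this assumes the common population moment coefficient of skewness, g1 = m3/m2^1.5; the function name `moment_skewness` and the toy values are hypothetical and are not thesis data. A positive g1, together with a median below the mean, is the signature of a right-skewed distribution.

```python
import statistics


def moment_skewness(values):
    """Population moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy Ic values with a long right tail (not thesis data).
ic_values = [-0.05, -0.03, -0.02, -0.02, -0.01, 0.0, 0.04, 0.12]
g1 = moment_skewness(ic_values)
print(g1 > 0, statistics.median(ic_values) < statistics.fmean(ic_values))  # True True
```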
In order to fully comprehend the characteristics of Ic, the next step is to calculate its extreme values, from the most negative to the most positive. In other words, it is necessary to determine the ideal recovery context, in which all RIFs are expressed via Level 1¹⁰, and the worst possible recovery context, in which all RIFs are expressed via Level 3¹¹. In these cases, where there is no uncertainty about the probability of each RIF level, it is possible to represent the most negative and the most positive recovery context.
Hence, the most negative value of Ic, calculated using equations 8-6 and 8-7, is -0.95. This value represents the worst possible recovery context for controller recovery performance in the generic ATC Centre. Similarly, the most positive value of Ic calculated using the same equations is 0.65. These two values are numerical representations of two mutually exclusive extreme recovery contexts. Nevertheless, they are a good indicator of the scale of change that can be achieved within the ATC environment.
10 RIF3, RIF6, RIF8, RIF11, RIF17, RIF19, and RIF20 do not have a Level 1 and thus take the next most desirable level, Level 2.
11 RIF2 does not have a Level 3 and thus takes the next most undesirable level, Level 2.
(-0.031, 0.085)
With suitable training for the situation in question (e.g. a particular failure type), there is no significant difference between the sample and baseline means, although the value of Ic is observed to shift towards a more positive value. Therefore, a second sample was taken, additionally assuming that RIF2 (previous experience with equipment failures) matches precisely the equipment failure in question. In other words, RIF2 is defined exactly via Level 1 with probability p = 1. The result of this analysis shows a significant change in the recovery context, since the obtained mean falls outside the 95 percent confidence interval determined for the baseline. Therefore, the enhanced recovery context (sample 2) comes from a population different from that of the baseline recovery context. This finding indicates that the value of Ic is sensitive to changes in individual RIFs.
desirable Level 2 (average) or Level 1 (most favourable) and an overall shift in the
recovery context indicator (Ic) towards more positive values (e.g. extreme positive
value). The cost should be defined through the inherent costs linked to the proposed
recommendation and therefore, should include actual rather than generic costs of the
proposed change within the specific ATC Centre. Thus the cost may include the
following:
costs of technical changes, followed by any other operational costs (delay in the
use of new system due to necessary maintenance, staff training);
costs of designing a new procedure, followed by the cost of training the staff (i.e.
time and resources);
organised exchange of past experience on non-nominal events,
8.8 Summary
This Chapter has presented a methodology for the quantitative assessment of recovery
context. It started by reviewing the past HRA research of relevance to the quantitative
analysis of contextual factors. This has resulted in the selection of the CREAM
technique and its application by Kim, Seong, and Hollnagel (2005) for further
development. Building on this, a novel methodology has been developed for the
research presented in this thesis. This method assessed controller recovery
Chapter 9
Experimental Investigation
Having reviewed the methodology for the quantitative assessment of the recovery context in the previous Chapter, this Chapter describes an experiment designed to further validate the proposed methodology and to capture controller recovery performance. The Chapter begins with a high-level overview of the process adopted for the experiment. This is followed by the rationale for the experiment, defined through several objectives. To achieve these objectives, the Chapter describes the overall design of the experiment and the selection of potential equipment failures initially tested in a pilot study. It continues with the key requirements for the experiment of relevance to this thesis, the measured variables, and the experimental procedure.
Both the pilot and the main experiment were conducted in close collaboration with one European Civil Aviation Authority (CAA)¹. This CAA provided all of the necessary infrastructure and staff from two ATC Centres during the period of the experiment in 2005 and 2006. One ATC Centre was used for the pilot study, which tested the feasibility of the experimental design and its overall methodology. The other ATC Centre was used on three separate occasions to simulate a selected unexpected equipment failure in order to capture data on the recovery performance of 30 licensed air traffic controllers. The Chapter concludes with a discussion of the measured variables used to capture the characteristics of controller recovery in ATC. The collected data are subjected to a rigorous analysis in Chapter 10.
1 This CAA performs the function of Air Navigation Service Provider (ANSP), and the term CAA is also used to denote the ANSP in the remainder of this thesis.
[Figure: flowchart of the experimental process: Assessment of the available resources → Design of the experiment → Selection of the equipment failure → Pilot study → Revision of the pilot study (in case of necessary changes) → Main experimental study → Data processing and analysis]
Although the requirements for an experimental plan were ready at the initial stage of
the research, it took two years to gain access to the required facilities. After
considerable negotiations with all potential locations, only one CAA responded
positively and agreed to provide both simulation facilities and staff for this experiment.
Both the pilot and the main study were conducted using their facilities, assistance, and
manpower.
Some research was done in the UK National Air Traffic Services (NATS), but was not released
for public use.
preceded by a review of relevant ATC topics in order to prepare efficiently for practical
work on the simulator. The relevant areas covered were ATC phraseology, operational
procedures, equipment, radar vectoring, speed control, level busts, and aircraft
performance.
Table 9-1 Training, pilot study, and experiment sessions
Date: 19-20 Feb 2005; 26-27 Feb 2005; 02 Nov 2005; 29 Nov - 01 Dec 2005; 27 Feb - 02 Mar 2006; 06 Jun - 09 Jun 2006
Phase: Phase I; Phase II
Objective: Pilot study; Main study I; Main study II
Comment: Total of eleven controllers participated
Most of the failures in the ATC environment are prevented or handled at the
technical/engineering level. Only a few failures manage to penetrate multiple redundancies and
fail-safe system design and affect controller performance.
The experiments were to be conducted during morning and afternoon sessions, ensuring that participants were tested in equal proportions in the two sessions. The simulation room conditions (lighting, temperature, noise) were to be consistent for all runs.
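One simple way to meet the equal-proportion requirement is sketched below in Python. It is an illustrative assumption rather than the procedure actually used in the experiment: participants are shuffled and then alternated between the two sessions, so the session sizes differ by at most one and the allocation does not follow recruitment order. The function name `assign_sessions` and the controller identifiers are hypothetical.

```python
import random
from collections import Counter
from itertools import cycle


def assign_sessions(participants, sessions=("morning", "afternoon"), seed=None):
    """Shuffle participants, then alternate them across sessions.

    Alternation keeps session sizes within one of each other; the shuffle
    avoids tying the session to the order in which participants were listed.
    """
    order = list(participants)
    random.Random(seed).shuffle(order)
    return dict(zip(order, cycle(sessions)))


controllers = [f"ATCO{i:02d}" for i in range(1, 31)]  # hypothetical IDs
plan = assign_sessions(controllers, seed=2005)
print(Counter(plan.values()))
```

Passing a fixed `seed` makes the allocation reproducible, which is useful when the schedule has to be documented in advance.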
Each simulation run was planned to last approximately 30 minutes, followed by a debriefing session of similar duration. The exact time of the equipment failure injection was to be determined during the pilot study, between the 5th and 15th minute of each run. The equipment failure would last 15 minutes. This was decided based on two factors. Firstly, operational data show that the majority of failures last up to 15 minutes (Chapter 4, section 4.4.6), which has been confirmed by the questionnaire survey results (presented in Appendix VI). Secondly, a failure duration of 15 minutes provides enough time to observe, capture, and assess the controller reactions, performance, and overall recovery strategy.
The selection of the equipment failure to be simulated in the pilot study was based on the results of the analysis of operational failure reports, the qualitative equipment failure impact assessment tool, and the results of the questionnaire survey. However, this selection was constrained by the technical capabilities of the available simulation platform. In other words, it was important to simulate the failure as well as the restoration of the relevant equipment, so the simulator platform had to provide this capability for the selected failure type. The final decision on the equipment failure to be simulated would be made after testing candidate failure types during the pilot study. The detailed rationale behind the selection of potential equipment failures for the pilot and main experiment is given in the following section.
Another important factor of the experiment was the involvement of a Subject Matter Expert (SME). The role of the SME would be to act as an observer and as the coordinator of the operations room. Upon a request from a controller, the SME would be responsible for issuing any relevant information about the failure and its effect on the ATC Centre (as would be required in the operational environment upon receiving an update from the system control and monitoring unit). Upon restoration of the equipment, controllers must perform several steps to ensure equipment reliability and hence its readiness for the restoration of normal service (i.e. post-restoration steps). Therefore, additional time would be given to controllers in the post-restoration part of the simulation run, from the 25th to the 30th minute of each run. This
the communication, surveillance, and data processing functionalities.
Furthermore, the availability of the duration variable in one of the datasets (Country D) enabled the identification of equipment failures lasting up to 15 minutes, which is the failure duration feasible within this experimental set-up. Failures with a major impact on ATC operations lasting up to 15 minutes include: the data exchange network, other surveillance systems (predominantly radar link), the flight data processing system, and the air situational display (see Table 9-2).
Table 9-2 Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Source | Potential equipment failure to simulate | Qualitative equipment failure impact assessment tool rating | Adequacy for the pilot study | Testing in the pilot study | Comment
Operational failure reports | Data exchange network | Secondary functionality | No | | It can range from moderate to minor and the selection tries to focus on major failures
Operational failure reports | Other surveillance systems (e.g. radar link) | Secondary functionality | No | | (selection focused on major failures of short duration)
Operational failure reports | Flight data processing system | Primary functionality | Yes | Reduced flight plan mode |
Operational failure reports | Air situational display | Primary functionality | Yes | | Not interesting enough from the controller recovery perspective
Questionnaire survey | Air-ground communication | Primary functionality | Yes | Aircraft radio communication failure |
Questionnaire survey | Primary surveillance radar | Primary functionality | Yes | | Not possible to simulate failure of one radar, but only the complete loss of radar coverage
Questionnaire survey | Flight data processing system | Primary functionality | Yes | Reduced flight plan mode |
Questionnaire survey | Communication panel | Primary functionality | No | | Not interesting enough from the controller recovery perspective as the controller would simply change the position
Questionnaire survey | Ground-ground communication | Primary functionality | No | | Not interesting enough from the controller recovery perspective as the controller would try to establish communication via other means
Having identified these nine possible failure types, it was necessary to select candidate failure types for a final assessment in the pilot study in order to determine the failure to be simulated in the main experiment. This selection was based on the severity of the failures as determined using the qualitative equipment failure impact assessment tool (Chapter 4, section 4.5). The tool was developed on the premise that not all equipment failures have the same severity of impact on ATC operations, and it identified the failures with the largest impact on ATC operations. These are failures of the primary ATC functionality which affect multiple systems/tools/equipment, either suddenly or gradually, with a duration of up to one hour (see Figure 4-9 and Table 9-2).
The process above, based on operational failure reports, the questionnaire survey, and the qualitative equipment failure impact assessment tool, identified four potential failure types: failure of the flight data processing system, the air situational display, air-ground communication, and the primary surveillance radar. These four candidates were further narrowed down by assessing both their significance from the controller recovery perspective and their technical feasibility. In other words, the focus was on failures that require controllers to recover using only the systems available at their positions. As a result, the pilot study simulated two different equipment failures: a reduced flight plan mode of the flight data processing system, and an air-ground radio communication failure.
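The two-stage selection described above can be restated as a filter over Table 9-2. The following Python sketch is illustrative only: the row encoding and the names `TABLE_9_2`, `candidate_failures`, and `pilot_failures` are hypothetical, and the rows assume one reading of the column layout of Table 9-2. The second stage (significance for controller recovery and technical feasibility) was a qualitative judgement, so the sketch simply reads its outcome from the "Testing in the pilot study" column rather than computing it.

```python
# Hypothetical encoding of Table 9-2:
# (source, failure, tool rating, adequate for pilot study, failure mode tested or None)
TABLE_9_2 = [
    ("Operational failure reports", "Data exchange network", "Secondary", False, None),
    ("Operational failure reports", "Other surveillance systems", "Secondary", False, None),
    ("Operational failure reports", "Flight data processing system", "Primary", True, "Reduced flight plan mode"),
    ("Operational failure reports", "Air situational display", "Primary", True, None),
    ("Questionnaire survey", "Air-ground communication", "Primary", True, "Aircraft radio communication failure"),
    ("Questionnaire survey", "Primary surveillance radar", "Primary", True, None),
    ("Questionnaire survey", "Flight data processing system", "Primary", True, "Reduced flight plan mode"),
    ("Questionnaire survey", "Communication panel", "Primary", False, None),
    ("Questionnaire survey", "Ground-ground communication", "Primary", False, None),
]


def candidate_failures(rows):
    """Stage 1: distinct failure types judged adequate for the pilot study."""
    return sorted({failure for _, failure, _, adequate, _ in rows if adequate})


def pilot_failures(rows):
    """Stage 2 outcome: distinct failure modes actually simulated in the pilot study."""
    return sorted({tested for *_, tested in rows if tested})


print(candidate_failures(TABLE_9_2))
print(pilot_failures(TABLE_9_2))
```

Deduplicating with a set matters here because the flight data processing system appears in both sources; it reduces the nine rows to the four candidate failure types and then to the two pilot-study failures.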
Both failure types also conform to the requirement described in Chapter 5 (section 5.7.3) that the simulated equipment failure should allow part of the diagnosis phase of controller recovery to be performed overtly, and thus be captured via observation. For example, a flight data processing system failure may initially be mistaken for an aircraft transponder or secondary surveillance radar failure. Similarly, an air-ground communication failure manifests itself in the same manner regardless of its cause (i.e. a ground-based vs. an airborne failure). In both cases, it is up to the controller to identify the true failure by ruling out alternatives (e.g. through communication with the pilot or an adjacent ATC Centre), and this diagnostic process can be captured via observation.
Effect | Existence of recovery procedure | HMI indication
Monitoring aid available only for flight plan tracks already displayed; flight data functions not available | No | General Information Window/Flight Data Processing (FDP) label changes from white to yellow
Inability of the controller to contact aircraft on the dedicated frequency as well as the emergency frequency | No (not in the ATC Centre) | None
Several important conclusions were drawn from this pilot study and the lessons learnt
were used to enhance the main experimental design. These are as follows:
Integration of a research experiment into any kind of ongoing ATC training requires significant collaboration with training instructors, the engineer in charge, and an ATM specialist (SME). In spite of thorough preparation, the failure in the first simulator run was not injected at the required instant because of unclear instructions given to the pseudo-pilots. This issue was corrected in the subsequent runs. Therefore, for the main experiment, a complete shared understanding of the experimental set up would have to be ensured between the training instructor, the engineer in charge, the pseudo-pilots, and the SME in order to avoid any misunderstanding. This should involve detailed discussions prior to the first simulation run of the day.
The initial intention was to inject an equipment failure in the 25th minute of the simulation run, in order to give the controller adequate time to adjust to the traffic scenario. However, the first run showed that this timing was inappropriate for two reasons. Firstly, the controllers were all very experienced and thus did not require this length of time to adjust to the traffic scenarios. Secondly, the traffic scenarios used had a low number of aircraft in the dedicated sector from the 25th minute onwards. This was contrary to the plan to inject an equipment failure during periods of average to high traffic density. Both problems were corrected by injecting the failure in the 10th minute of the simulation run and observing the controller recovery process while traffic increased progressively during the 30-minute runs. Since the main experiment was to use fully licensed and experienced controllers, the exact moment of failure injection would have to be based on the number of aircraft in the sector. The aim would be to inject the failure when traffic was at an average level and progressing towards high.
The need for access to the simulator log files was identified in order to capture all of the controller's inputs on the keyboard and HMI. The main purpose of these log files would be to extract the precise reaction time of the controller following detection of the equipment failure. However, difficulties were encountered in acquiring and decoding these log files. Log files from simulation platforms tend to have a platform-specific format and a level of detail that makes them cumbersome to decipher. In addition, initial detection may not necessarily be captured in these log files (as an actual action), because controllers may detect a failure but not take any action until they have evaluated its impact on the operation. Having considered all the advantages and disadvantages of using log files, it was decided to omit them. An alternative was
did not have advance knowledge of the nature of that unusual occurrence, i.e. ATC
equipment failure.
Because of the large amount of data and observations to be collected, it was realised that the main experiment would require an assistant. The primary task of the assistant would be to observe the controllers' overt behaviour and attitude and to take notes/recordings of it.
Finally, although the simulation runs in the pilot study were designed to reflect high traffic levels, failures were injected during a period of average to low traffic. In addition, no adverse weather, which would have added to the complexity of the exercise, was simulated. As a result, the traffic scenario in the main experiment would need high traffic levels from the moment of failure injection throughout the duration of the exercise. Adverse weather could also be simulated, resulting in unplanned rerouting of air traffic.
The following section discusses the process adopted to set up the actual experiment, including a description of the characteristics of the simulated airspace, traffic, and equipment failure type.
The SME participating in this study is an ATM Specialist with 20 years of experience in many
facets of ATC and has 15 years of experience as an ATC instructor.
The recovery process did not end with the restoration of the equipment (in the 25th minute), because of several steps that the controller had to perform to confirm equipment reliability and hence the readiness for the restoration of normal service. It usually took one minute to accomplish these post-restoration steps. Additional time was given to controllers in the post-restoration part of the simulation run (from the 25th to the 30th minute) to restore their normal working strategy and to calm down after a highly stressful equipment failure occurrence.
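The timing of each main-experiment run can be summarised as three phases. The sketch below assumes that the failure was injected exactly 15 minutes before restoration (25 - 15 = 10 minutes into the run), which follows from the 15-minute failure duration and the 25th-minute restoration given in this chapter. In practice the injection moment was tied to traffic levels, so the constant `INJECTION_MIN` and the function `run_phase` are hypothetical, illustrative names.

```python
# Minute marks taken from the text: restoration at minute 25, a 15-minute
# failure, and 30-minute runs. The injection minute is derived, not measured.
RUN_LENGTH_MIN = 30
RESTORATION_MIN = 25
FAILURE_DURATION_MIN = 15
INJECTION_MIN = RESTORATION_MIN - FAILURE_DURATION_MIN  # 25 - 15 = 10


def run_phase(minute: float) -> str:
    """Classify a minute of a simulation run into its experimental phase."""
    if not 0 <= minute <= RUN_LENGTH_MIN:
        raise ValueError(f"minute {minute} is outside the {RUN_LENGTH_MIN}-minute run")
    if minute < INJECTION_MIN:
        return "pre-failure"
    if minute < RESTORATION_MIN:
        return "failure"
    # Covers the post-restoration steps (about one minute) and the remaining
    # time given to controllers to restore their normal working strategy.
    return "post-restoration"


for m in (5, 10, 24.5, 25, 30):
    print(m, run_phase(m))
```

Treating the boundaries as half-open intervals means the restoration minute itself already counts as post-restoration, matching the description of post-restoration time running from the 25th to the 30th minute.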
The SME involved in the study as an observer also acted as a coordinator, issuing any relevant information about the failure and its effect on the entire ATC Centre. This notice was issued in response to queries from the participating controllers. However, if a controller did not make any attempt to contact the coordinator, the SME issued this information at the most suitable moment during the exercise (based on the controller's workload).
Each simulation run was observed by the researcher, the assistant, and the SME, and was recorded for further data analysis. The assistant was mainly responsible
for taking notes of the controllers' overt behaviour before and after the failure injection. A checklist based on the SHAPE⁵ list of attitudes was used to guide the assistant in performing this task (EUROCONTROL, 2004f). The assistant was positioned to be as unintrusive as possible, completely outside of the controller's field of view. On most occasions, the observation team was positioned as far from the controller's field of view as possible, whilst still having a clear view of the radar screen. The precise set up of the simulation room in which the experiment took place and the positions of all parties involved are depicted in Figure 9-3.
The simulation runs were followed by an immediate debriefing session guided by a questionnaire and other material designed specifically for this session. The controllers were asked to evaluate all the factors that potentially influenced their recovery performance. In addition, they were given an opportunity to judge their own performance and the realism of the exercise itself. The questionnaire and other material designed for the experiment and the debriefing session are presented in Appendix XIII.
Equipment failure in ATC, like any other unusual or emergency event, is highly stressful. In these instances, controllers are required to intervene with complex strategies and employ their knowledge under significant pressure and high psychological stress. For this reason, the debriefing session was used to help defuse stress by creating a relaxed interview environment in which the participating controllers could evaluate their actions and performance. The session was structured to enable comparisons across the participants, and a special debriefing sheet had therefore been designed prior to the simulation runs. The rationale behind this structured approach to debriefing was to ensure a consistent and reliable acquisition of data on controller recovery performance. The debrief segment of the experiment was used to confirm and detail observations made during the simulation run via an approach similar to a cognitive walkthrough. In other words, this part of the experiment was used to discuss the sequence of recovery steps required for a controller to accomplish a recovery, and to validate failure detection and the factors that influenced each stage of the recovery (i.e. detection, diagnosis, and correction; further discussed in Chapter 10).
⁵ The SHAPE project is briefly explained in Chapter 7, section 7.3.1.3. The list of attitudes used to guide the assistant in the experimental process was derived from SHAPE attitude items, such as attentive, active, confident, thoughtful, calm, careful, and enquiring.
The following paragraphs give a brief description of the key elements of the
experiments in terms of airspace, traffic, and failure characteristics.
Table 9-4 The mapping between exercise characteristics and the controllers' observations
The exercise characteristics
Adequate to tolerable
Unchanged
Average to high
In addition, the weather conditions in the exercise simulated a 15-25 knot south-westerly wind, rain showers, half of the sky covered with cumulonimbus (i.e. thunderstorm) cloud with a base at 1800 ft, a temperature of 2°C, and a pressure at mean sea level (MSL) of 1032 hPa. Generally, in these conditions, icing will occur inside cloud above 2000 ft (in the ICAO standard atmosphere, the temperature decreases on average by 2°C per 1000 ft). Since the weather conditions before and after failure injection remained unchanged (i.e. re-routings were requested by pilots in both cases), the overall weather was marked as unchanged. This was confirmed by the SME and participating controllers (Table 9-4).
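The icing remark rests on simple lapse-rate arithmetic. As a minimal sketch, assuming that the quoted 2°C applies at the surface and that temperature falls linearly at the quoted average of 2°C per 1000 ft, the estimate reaches about -2°C at 2000 ft. The sketch reproduces only the temperature arithmetic; it does not model icing itself, and the constant and function names are illustrative.

```python
# Linear lapse-rate estimate using the figures quoted in the text.
LAPSE_RATE_C_PER_1000_FT = 2.0  # average decrease in the ICAO standard atmosphere
SURFACE_TEMP_C = 2.0            # exercise temperature, assumed to apply at the surface
CLOUD_BASE_FT = 1800


def temperature_at(height_ft: float) -> float:
    """Estimated air temperature (deg C) at a given height above the surface."""
    return SURFACE_TEMP_C - LAPSE_RATE_C_PER_1000_FT * height_ft / 1000.0


for height in (0, 1000, CLOUD_BASE_FT, 2000):
    print(f"{height:>5} ft: {temperature_at(height):+.1f} C")
```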
qualitative equipment failure impact assessment tool, and the pilot study). The FDPS failure was chosen for this experimental set up for several reasons. Firstly, the available data showed that this failure is both severe and frequent. Secondly, this failure is an example of the major failures that affect multiple systems, as seen from the qualitative equipment failure impact assessment tool. Thirdly, the participating CAA does not have a written procedure for this particular failure, which makes controller recovery performance more dependent on the controllers' knowledge, experience, and personal abilities. Finally, the technical features of the Beginning to End Skills Trainer (BEST) platform allowed this failure type to be injected and restored fairly easily. To simulate equipment failure as realistically as possible, it was necessary to be able both to inject the failure and to restore system functionality rapidly. This was possible with the FDPS failure, whose degradation was simulated as a sudden failure affecting the entire ATC Centre for a period of 15 minutes.
A visual representation of this type of equipment failure on the BEST platform is presented in Figure 9-4. A correlated radar track with all relevant flight-related information is shown on the left-hand side of Figure 9-4, whilst the uncorrelated track (resulting from the FDPS failure), depicting only the aircraft position, is shown on the right-hand side. It can be seen that the FDPS failure affects multiple systems. The actual effects of the FDPS failure are presented in Table 9-5 and in more detail in Table 9-6.
[Figure 9-4 image: (a) correlated track label with fields CALLSIGN, TYPE, AFL, XPT, GS, CFL, XFL, ADES; (b) uncorrelated, position-only track]
Figure 9-4 The visual representation of equipment failure on CWP: a) before the failure, b) after the failure
Failure type | Effects | Existence of recovery procedure | HMI indication on BEST simulation platform
Reduced flight data processing mode | | No | None
Table 9-6 Availability of functions in the reduced flight data processing mode
Radar data source
Radar tracks
Flight plan track
Maps
Tools
Radar picture controls
Flight plan commands
Flight plan lists
ATC messages de-queue
management
Transmission of ATC messages
Coordination message
Alarm and warning facilities
General information area
Mail box management
Available
Only for flight plan tracks already displayed
Available
Available
Available
Flight plan facilities
Not available
Partially available (for display only, frozen lists)
Not available
Not available
Not available
Partially available (no MTCA warnings update)
Available
Not available
Partially available (runway in use and airspace
management are not available)
Partially available (only displayable)
Available
Not available
Not available
Not available
Not available
Partially available (percentage of use of SSR code
indication that a flight plan has received message is
incorrect and alerts are not available)
Partially available (only displayable)
Available
Partially available (only displayable)
Available
Available
Not available
Dependent variable
The recovery context (recovery context
indicator)
The recovery effectiveness
The recovery duration
Independent
variable
Extraneous
variable
Comment
Assessed in the debriefing session.
Constant
(multiple
systems
affected)
Constant
(sudden
failure)
Constant (all
workstations
affected)
Constant (no
procedure)
Refers to single vs. multiple failure occurrences. The experimental set up should assess the impact of one failure that affects multiple ATC systems. Therefore, this variable will be constant for all subjects.
This variable varies between sudden failure and gradual degradation of the system. It will be constant for all subjects.
The experiment is conducted on a single workstation with one controller at a time. However, the controller will be informed that the failure affects the entire ATC Centre.
This variable varies between adequate and inadequate time to recover. It can be influenced by several factors. Firstly, the characteristics of a given failure will drive the time necessary to recover through the criticality of the failed function and its detectability. Secondly, the controller characteristics will also have an effect: more experienced controllers may react and resolve an issue more quickly than less experienced ones. Finally, the characteristics of traffic at the moment of failure will drive the time necessary to recover; the more complex the traffic situation, the more recovery time the controller will need. This variable will be assessed in the debriefing session.
Theoretical reviews and various experiments in other safety-related industries have confirmed the relevance of procedures to recovery performance (Kaarstad and Ludvigsen, 2002; EUROCONTROL, 2004e; Kanse and van der Schaaf, 2000). Therefore, it was decided to choose a failure that does not have an appropriate recovery procedure.
Duration of failure
Adequacy of HMI and operational
support
Ambiguity of information
Constant
(short duration, 15 min)
In the experimental set up, the duration of failure should be long enough to capture all phases of the recovery (e.g. 15 min), taking into account the total duration of the experiment.
Adequacy of alarms/alerts
Adequacy of organisation
Traffic complexity
Airspace characteristics
Weather conditions during the
recovery process
Conflicting issues in the situation
Age
Overall experience as a controller
Required recovery steps
Constant
(average to
high)
Constant
Description
Detect the problem either through contact with pilots or visually on the radar display (detection of the uncorrelated track). In both cases, the first assumption may be a transponder failure. After confirmation that the aircraft transponder is operational, a further check on ATC system performance should be conducted.
Locate traffic
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15
S16
S17
It is important to note that some of the recovery steps above are more important than others for maintaining a safe ATC service. For example, maintaining identification of all traffic, conducting timely and efficient strip marking and board management, and maintaining separation are considered critical to overall safety in a degraded situation. Other recovery steps, such as grounding the trainer and preventing departures, are less important in that they are workload reduction measures. Nevertheless, their implementation contributes to a safer traffic environment in unusual situations.
tasks, and appropriateness of traffic management.
Secondly, the recovery
Post-restoration task
Confirm Mode C
Continue to monitor
Locate traffic
Maintain identification of all
traffic
Personal correspondence with human factors experts from the Netherlands National Aerospace Laboratory (NLR) and the EUROCONTROL Experimental Centre (Human Factors Lab).
the participation of more SMEs would increase the validity of the outcome of the
experiment. Future research should address how statistical representation could be
achieved given the logistical difficulties associated with these types of experiments.
choice of the SME (in terms of experience and expertise), still only one SME was
available for this experiment.
9.11 Summary
This Chapter has presented in detail the experiment designed to capture controller recovery in ATC. The Chapter started by justifying the need for the field experiment. This was followed by an assessment of the available resources and the key requirements that had to be met. The Chapter continued by discussing and justifying the overall experimental set up and data acquisition, including the rationale for the choice of the equipment failures tested in the pilot study. Drawing on the lessons learnt from the pilot study, it was possible to implement the final changes and fine-tune the set up of the main experiment. This segment focused on the characteristics of the simulated traffic, airspace, and equipment failure, as well as on the research variables, while highlighting potential limitations. The following Chapter analyses the data captured from this experiment.
Chapter 10
analysis of the recovery variables defined in Chapter 5, their interactions, and other relevant findings obtained from the experiment.
[Figure: structure of the experimental results. Nodes: Participants (Age, Operational experience, Ratings); Analyses of recovery variables; Analyses of dependent variables; Analysis of interactions; Recovery context; Recovery context indicator; The recovery phases; Required recovery steps; Recovery effectiveness; Observed behaviour and attitude; Recovery duration; Additional findings; Other findings; Outcome of the recovery process]
10.2 Participants
As discussed in section 9.8 (Chapter 9), it is important that statistical representation is achieved in research that involves sampling of a population. In this case, such representation is required for the ATC Centre where the experiment was carried out. The main distinguishing characteristics of the controllers are age, operational experience (i.e. years in service), and rating. This section analyses these characteristics and links them to statistical representation.
10.2.2 Ratings
Figure 10-3 presents the distribution of the ratings of the controllers who participated in the experiment. Considering that the training exercise was designed for the approach control (APP) course, it is important to highlight that 20 percent of the participants did not hold an APP rating. However, half of these participants held an ACC rating, which incorporates training in elements of approach control (as part of the low-level ACC course). Although the remaining participants held only a TWR rating, they had just completed an APP course and therefore possessed knowledge of all relevant elements of approach control.
[Figure 10-3: bar chart of participants' ratings; x-axis: ratings (All: ACC, APP, and TWR; ACC and APP; ACC and TWR; APP and TWR; TWR; ACC; APP); y-axis: percent (0 to 40)]
Since the experiment was conducted in three separate sessions (as discussed in section 10.1), it is important to investigate whether the sampling on all three occasions was appropriate. In other words, it is important to show that all three sessions come from the same population of controllers from the ATC Centre and that, when aggregated, they represent a proper sample (Table 10-1).
Characteristic | Experimental session 1 | Experimental session 2 | Experimental session 3
Age (years) | M=35.9, SD=8.95 | M=37.9, SD=10.3 | M=37.7, SD=9.73
Operational experience (years) | M=10.7, SD=6.70 | M=14.3, SD=11.08 | M=13.7, SD=8.22
5
4
1
0
5
2
2
1
4
5
0
1
sessions. Details of this statistical test are presented in Chapter 6, section 6.7.4. The statistical tests¹ at the 95 percent confidence level indicated that there is no difference between the three experimental sessions (p>0.05). On this basis, the data were pooled for further analyses.
¹ The statistical tests investigated the null hypothesis for experimental sessions 1 and 2, 1 and 3, and 2 and 3 separately.
this. Firstly, the experiment in this research is designed to capture controller recovery unaided by system tools, and emphasis is placed on controller readiness to detect and react to an unexpected failure. Secondly, past research has already shown that in most cases the existence of an alert does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001). As a result, 18 RIFs were determined to be relevant to this experiment.
10.3.1.2 Probabilities of each RIF and the corresponding levels
Based on data collected during the post-experiment debriefing session, it was possible to derive the probabilities of each RIF and its corresponding levels. The results for all 18 RIFs are presented in Appendix XIV. Furthermore, these probabilities are used to verify the RIF probabilities defined in Chapter 8 using the verification criteria (Table 10-2). In other words, a set of expectations was defined before comparing the RIF probabilities derived for a generic ATC Centre (Chapter 8) and for a particular ATC Centre (used in the experiment).
Table 10-2 Verification of RIFs probabilities from a generic approach (Chapter 8) and the
experiment
RIF groups | Verification criteria | Result and comment
Internal | No difference | No difference, except Communication for recovery
Equipment-related | No difference | No difference
External | Potential for difference | No difference, except Adequacy of organisation
Airspace-related | Potential for difference | Difference is observed with traffic complexity and overall task complexity
The expected differences in RIF probabilities are a result of the experimental design (e.g. traffic complexity and task complexity) and the overall difference in the populations sampled (i.e. the various ATC Centres sampled in Chapter 8 compared with the ATC Centre sampled in the experiment). In short, the comparison of RIF probabilities for a generic and a particular ATC Centre shows that they are similar.
10.3.1.3 Interactions between RIFs
This step consisted of an assessment and subsequent incorporation of interactions between the identified RIFs, as presented in Table 8-5 (Chapter 8). Based on the methodology for the quantification of RIF interactions developed in section 8.4.3 of Chapter 8, it is possible to determine the coefficient of interaction for the interactions between the 18 relevant RIFs. This coefficient is k = 1/(N-1) = 1/17 ≈ 0.059, where N represents the total number of relevant RIFs.
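The coefficient can be computed directly from the number of relevant RIFs. The helper below is a hypothetical illustration of the formula k = 1/(N-1); its `factor` parameter is an addition for illustration that generalises the numerator, so that the tenfold increase examined later in this chapter, k = 10/(N-1), can be reproduced with the same function.

```python
def interaction_coefficient(n_rifs: int, factor: float = 1.0) -> float:
    """Coefficient of interaction k = factor / (N - 1) for N relevant RIFs.

    factor=1 gives the uniform coefficient applied to all RIF interactions;
    a larger factor gives an increased coefficient for a single RIF.
    """
    if n_rifs < 2:
        raise ValueError("at least two RIFs are needed for an interaction")
    return factor / (n_rifs - 1)


N_RELEVANT_RIFS = 18
print(round(interaction_coefficient(N_RELEVANT_RIFS), 3))
print(round(interaction_coefficient(N_RELEVANT_RIFS, factor=10), 2))
```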
10.3.1.4 Recovery context indicator (Ic)
This particular study investigated 18 relevant RIFs, of which six are defined via three levels of impact and six via two levels of impact (according to the qualitative descriptors defined in Chapter 7, section 7.3). The remaining six RIFs are defined through only one level, either because the factors were controlled in the experiment or because the participants gave identical answers. For details, see Table 10-3 and Chapter 9. In total, this approach generates 3^6 × 2^6 = 729 × 64 = 46,656 possible contexts, each defined through the corresponding recovery context indicator.
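The count of possible contexts follows from multiplying the number of levels of each RIF. The sketch below, with hypothetical names, both computes that product and enumerates the contexts explicitly with `itertools.product`. It only counts and lists level combinations; evaluating Ic for each context requires the Chapter 8 formulation, which is not reproduced here.

```python
from itertools import product
from math import prod

# Levels per RIF in this study: six RIFs with three levels, six with two,
# and six fixed at a single level (controlled or answered identically).
LEVELS_PER_RIF = [3] * 6 + [2] * 6 + [1] * 6


def count_contexts(levels):
    """Number of distinct recovery contexts = product of the level counts."""
    return prod(levels)


def enumerate_contexts(levels):
    """Yield each recovery context as a tuple of level indices, one per RIF."""
    return product(*(range(n) for n in levels))


assert len(LEVELS_PER_RIF) == 18
print(count_contexts(LEVELS_PER_RIF))
print(sum(1 for _ in enumerate_contexts(LEVELS_PER_RIF)))
```

The single-level RIFs contribute a factor of 1, so they do not change the count even though they remain part of every context tuple.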
RIF | Level descriptor | Probability | Comment
Complexity of failure type | Multiple systems affected | | The simulated Flight Data Processing System (FDPS) failure affects multiple systems
Time course of failure development | Sudden failure | | The FDPS failure is simulated as a sudden failure
Number of workstations/sectors affected | All workstations | | The FDPS failure is simulated to affect the entire ATC Centre
Existence of recovery procedure | Inappropriate | | The objective of the experimental investigation was to simulate a failure without a recovery procedure
Duration of failure | Short period of time | | The FDPS failure is simulated to last long enough to capture all phases of the recovery
Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | | The controllers responded positively to the question on the match between the external environment and their internal mental model, although they could not say that this match was one hundred percent
After the calculation of all 46,656 possible contexts, it was determined that the mean value of Ic is 0.029, with values ranging from -0.088 to 0.121. The distribution of the recovery contexts is presented in Figure 10-4. Based on the shape of the Ic distribution, the data were fitted with two normal distributions. The result of this fitting is presented in Appendix XV.
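How the two normal distributions were fitted is detailed in Appendix XV rather than here. One common way to fit two normal distributions to a bimodal sample is a two-component Gaussian mixture estimated by expectation-maximisation (EM). The sketch below shows that technique on synthetic data, under the assumption that the two normals are mixture components; it is not necessarily the procedure used in Appendix XV, and `fit_two_normals` is a hypothetical name.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(data, n_iter=200, min_sigma=1e-6):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation.

    Returns (weights, means, standard deviations), each a two-element list.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    # Initialise one component on each half of the sorted data.
    lower, upper = xs[: n // 2], xs[n // 2:]
    w = [0.5, 0.5]
    mu = [statistics.fmean(lower), statistics.fmean(upper)]
    sigma = [max(statistics.pstdev(lower), min_sigma),
             max(statistics.pstdev(upper), min_sigma)]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each observation.
        resp0 = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and spread of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            nk = max(sum(resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / nk
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)
    return w, mu, sigma


# Usage on synthetic bimodal data (not the thesis data).
random.seed(1)
sample = ([random.gauss(-0.03, 0.02) for _ in range(500)]
          + [random.gauss(0.05, 0.02) for _ in range(500)])
weights, means, sds = fit_two_normals(sample)
print([round(v, 3) for v in weights], [round(v, 3) for v in means], [round(v, 3) for v in sds])
```

Initialising the two components on the lower and upper halves of the sorted data is a simple, deterministic choice; EM only finds a local optimum, so other starting points may be needed for less clearly separated modes.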
[Figure 10-4: distribution of the recovery context indicator (Ic); x-axis: Ic, from -0.088 to 0.112; y-axis: frequency, 0 to 800]
Table 10-4 Verification of the distribution of the recovery context indicator obtained from a generic approach (Chapter 8) and the experiment
Recovery context indicator (Ic) | Comment
Shape | The difference is observed in the left tail of the distribution
Mean | Similar²
Median | Similar³
Range | Similar⁴
The main difference observed is in the shape of the left tail of the distribution. This cannot be explained by differences in the RIF probabilities, as the previous section showed that they differed for only two RIFs, as a result of the characteristics of the experimental design. Therefore, it is assumed that the shape of the left tail results from the local characteristics of the ATC Centre used in the experiment (Figure 10-4). Although these characteristics may also have existed in the distribution of Ic obtained for a generic ATC Centre (Chapter 8), they may have been masked by the generic approach.
In particular, the deviation in the left tail may be caused by the use of a single coefficient of interaction between all RIFs, as discussed in section 8.4.3 of Chapter 8. Although it is known from operational experience that RIF interactions do not all have the same level of influence, this thesis had to define a more generic approach to account for the lack of operational data.
The assumption that the change in the shape of the Ic distribution (in the left tail) results from a single value of the coefficient of interaction, which is no longer capable of properly accounting for local characteristics, is further assessed using the example of the RIF Adequacy of HMI and operational support. This RIF is chosen because the interaction matrix (Table 8-26, Chapter 8) indicates that it impacts on several other RIFs. Thus, a change in its coefficient of interaction may have a significant impact on the Ic distribution. Accordingly, the coefficient of interaction relevant to this RIF is increased by a factor of 10, from the previous value of k = 1/(N-1) = 1/17 ≈ 0.059 (section 10.3.1.3) to the new value of k = 10/(N-1) = 10/17 ≈ 0.59. The resulting distribution of Ic, presented in Figure 10-5, shows a notable change in the shape of the left tail.
[Figure 10-5 plot: histogram of Ic; x-axis: Ic, from -0.088 to 0.092; y-axis: frequency, 0 to 800]
Figure 10-5 Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
² The mean value of Ic is 0.027 for a generic ATC Centre and 0.029 for the ATC Centre used in the experiment.
³ The median value of Ic is -0.023 for a generic ATC Centre and -0.026 for the ATC Centre used in the experiment.
⁴ Ic values range from -0.069 to 0.131 for a generic ATC Centre and from -0.088 to 0.121 for the ATC Centre used in the experiment.
In short, the comparison of the distributions of Ic obtained for a generic ATC Centre
and for the particular ATC Centre shows no substantial difference in the mean, median,
and range; they differ only in the shape of the left tail. This difference in shape is
attributed to the inadequate definition of the coefficient of interaction. As previously
discussed in Chapter 8, a more accurate definition of this coefficient will be possible
once a detailed database of human performance becomes available in the ATM industry.
While the controllers' responses provided a basis for defining the recovery context
indicator (Ic) across all possible recovery contexts, it was also possible to define
indicators for each controller. In several cases, the participants were not able to select
the corresponding level for some RIFs. For example, for the RIF weather conditions,
several controllers were so preoccupied with the recovery process that they paid no
attention to the weather and were therefore unable to select the appropriate level for
this RIF. The missing responses were inferred from the responses available for this
RIF. In other words, the missing
After the assessment of recovery contexts surrounding each controller, the next section
reviews the potential solutions to enhance the recovery context (and thus controller
recovery) using the methodology developed in Chapter 8. In other words, the next
section analyses the sensitivity of the Ic to changes in RIFs.
10.3.1.5 Optimal solutions
In searching for potential areas of enhancement to improve the controllers' recovery
process, it is necessary to focus on RIFs that can be influenced at the level of the ATC
Centre. Table 10-5 presents the nine RIFs that could be enhanced, based on the
responses of the controllers who participated in the experiment and the characteristics
of the ATC Centre investigated.
Table 10-5 A review of RIFs with the potential for recovery enhancement
RIFs
It is important to note that the remaining RIFs are not taken into account, for several
reasons. Firstly, in this experiment a number of RIFs attained their most favourable
levels. In such cases, the majority of controllers expressed satisfaction with the ATC
system and no desire for improvement of the particular RIFs. Secondly, several RIFs
were controlled in the experiment and as such cannot be changed. These are:
complexity of failure type, time course of failure development, number of workstations
affected, and duration of failure. Finally, certain RIFs cannot be changed at all, such as
weather conditions and experience with a particular type of equipment failure, whilst
traffic complexity cannot be influenced at the level of the ATC Centre. This leaves a
total of nine RIFs that have the potential to enhance the recovery context and thus
controller recovery performance (Table 10-5). The next section illustrates how the
improvement of one RIF (existence of the recovery procedure) could influence the
recovery context.
10.3.1.5.1 Impact of enhancing recovery procedure on recovery context
As the participating ATC Centre does not have a recovery procedure for FDPS failure
in place, this factor is chosen as the most practical and effective way of supporting
controllers and enhancing their recovery performance⁵. Assuming that the
management at the ATC Centre implements recovery procedures for FDPS failure, the
existence of recovery procedure RIF would be enhanced from Level 3 to Level 1, i.e.
defined as suitable to the situation in question (the probability of Level 1 equals 1.00;
Table 10-6). This approach also assumes that all other RIFs remain unchanged and
that any potential impact of this change on other RIFs is reflected through the identified
RIF interactions.
The resulting recovery context would have a mean Ic of 0.091 (SD=0.0398; Table
10-6). The difference between the distributions of Ic with and without the change in the
recovery procedures was tested using the non-parametric Mann-Whitney test
(presented in Chapter 6, section 6.7.4). The baseline recovery context differs
significantly from the recovery context incorporating the proposed enhancement
(p<0.001). This means that the design of an appropriate recovery procedure
significantly enhances the recovery context and thus creates a better environment for
controller recovery.
RIF | Initial level (probability of Level 1, 2, 3) | Ic (M, SD, SE) | Level after iteration (probability of Level 1, 2, 3) | Ic (M, SD, SE) | Statistical significance with 95% confidence interval
Existence of recovery procedure | 0, 0, 1 | M=0.029, SD=0.036 | 1, 0, 0 | M=0.091, SD=0.039 | p<0.001; Sig (U=3E08, z=-196.2)
It has to be noted that the proposed change in the recovery procedure represents only
one possible form of recovery context enhancement. In reality, an ATC Centre may
adopt several other measures to enhance controller recovery. Furthermore, the
proposed change assumes a recovery procedure defined for a particular equipment
failure. Therefore, the calculated recovery context indicator is valid for this failure type
only and would have to be recalculated for other failure types.
5 The only available procedures in this ATC Centre are those defined by ICAO. As previously
discussed in Chapter 5, ICAO does not define recovery practice for FDPS failure.
This approach may be used to rate the significance of each proposed change and
compare it with its related cost. However, evaluating the related costs, as opposed to
the benefits, is not so straightforward and would require input from the specific ATC
Centre. Therefore, another approach presented in Chapter 8 may be used to rate the
benefit of implemented changes by calculating the recovery context efficiency. The ratio
between the value of the current recovery context (mean value of 0.04; Figure 10-5)
and the value of the most positive recovery context feasible in the particular ATC
Centre (i.e. Ic=0.44) indicates that a roughly tenfold improvement is needed to achieve
the most positive value of Ic.
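The recovery context efficiency can be illustrated with the two values quoted above. In this minimal sketch the efficiency is taken, as the text describes it, to be the ratio of the current mean Ic to the most positive feasible Ic; the function name is hypothetical, and the Chapter 8 definition may include details not reproduced here. The ratio 0.44/0.04 = 11 is what the text describes as a (roughly) tenfold improvement.

```python
def recovery_context_efficiency(current_ic: float, best_ic: float) -> float:
    """Hypothetical helper: share of the best feasible Ic already achieved.

    Assumption: efficiency = current mean Ic / most positive feasible Ic.
    """
    if best_ic <= 0:
        raise ValueError("the best feasible Ic must be positive")
    return current_ic / best_ic


current_ic = 0.04  # mean Ic of the current recovery context (value from the text)
best_ic = 0.44     # most positive Ic feasible in the participating ATC Centre

efficiency = recovery_context_efficiency(current_ic, best_ic)
improvement_factor = best_ic / current_ic

print(f"efficiency = {efficiency:.3f}")
print(f"improvement factor = {improvement_factor:.1f}")
```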
The next section analyses the recovery steps taken by the controllers and their overall
recovery effectiveness.
[Figure: recovery steps performed and not performed per participant (x-axis: Participants 1 to 30; y-axis: 0 to 100; legend: Steps performed, Steps not performed)]
Note that if a controller did not seek failure-related information from the coordinator, the
coordinator was advised to inform the controller, but only after the controller had detected the
failure. As a result, the occurrence of this step was inevitable.
[Figure: number of participants per recovery step (x-axis: recovery steps S1 to S17; y-axis: No. of participants, 0 to 30)]
Further data analysis shows that, on average, each controller performed 74.2 percent
of the required recovery steps, ranging from as low as 29 percent to 100 percent. The
most neglected steps were the re-identification of all traffic (S14) and the confirmation
of Mode C (i.e. confirmation of the accuracy of the post-restoration FDPS data, S15).
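The per-controller percentages are simple proportions of the required steps. The sketch below assumes 17 required steps (S1 to S17, as in the step-by-step figure), each recorded as performed (1) or not performed (0); under that assumption, 5 of 17 steps corresponds to the reported minimum of about 29 percent. The controller records are hypothetical, not experimental data.

```python
from statistics import mean

# Assumption: 17 required recovery steps (S1 to S17), recorded as 1/0 flags.
N_STEPS = 17


def percent_steps_performed(steps: list[int]) -> float:
    """Return the share of the required steps that were performed, in percent."""
    if len(steps) != N_STEPS:
        raise ValueError(f"expected {N_STEPS} step flags, got {len(steps)}")
    return 100.0 * sum(steps) / N_STEPS


# Hypothetical records for three controllers (illustration only).
records = {
    "C1": [1] * 17,                  # all steps performed
    "C2": [1] * 5 + [0] * 12,        # 5 of 17 steps performed
    "C3": [1] * 13 + [0, 0, 1, 1],   # 15 of 17 steps performed
}

scores = {cid: percent_steps_performed(flags) for cid, flags in records.items()}
for cid, pct in scores.items():
    print(f"{cid}: {pct:.1f}%")
print(f"mean across controllers: {mean(scores.values()):.1f}%")
```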
The post-restoration recovery steps of re-identifying traffic and validating Mode C are
important, as they are considered best practice to ensure system safety in the
aftermath of an FDPS failure. Re-identification is necessary for two reasons. Firstly,
the identification of traffic is lost whilst aircraft occupy a holding pattern, where
separation is purely procedural and radar separation does not apply. Secondly, labels
may be swapped and radar signals garbled when aircraft are in close lateral proximity
(e.g. in a holding pattern).
Further investigation of the percentage of steps performed in the three sessions
reveals a significant difference between the first and the third session: the percentage
of steps carried out in the first session is significantly lower than in the third session.
On average, 64 percent of the recovery steps were performed in the first experimental
session, increasing to 77 percent in the second and reaching 82 percent in the third
(Table 10-7).
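Percentage of recovery steps performed | Session 1 | Session 2 | Session 3
Statistics | M=63.98, SD=21.69 | M=77.06, SD=17.64 | M=81.77, SD=12.84

Paired sessions | Statistical significance
1 and 2 | p>0.05
1 and 3 | p=0.044; Sig (U=23.5, z=-2.0)
2 and 3 | p>0.05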
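The pairwise session comparisons use the non-parametric Mann-Whitney test. The Python sketch below, offered only as an illustration, computes U from two samples (average ranks for ties) and converts it to z with the standard normal approximation, without a tie correction. Assuming ten controllers in each session (30 participants over three sessions), the U = 23.5 reported for sessions 1 and 3 gives z ≈ -2.0, consistent with Table 10-7; this approximation yields a two-sided p of about 0.045, and the small gap to the reported p = 0.044 may stem from a tie correction or a different computation in the original analysis. The function names are illustrative; `scipy.stats.mannwhitneyu` would be the usual library route.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """Return U for sample x, ranking the pooled data with average ranks for ties."""
    pooled = sorted((value, idx) for idx, value in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n1 = len(x)
    rank_sum_x = sum(ranks[:n1])  # the first n1 entries belong to x
    return rank_sum_x - n1 * (n1 + 1) / 2


def z_from_u(u: float, n1: int, n2: int) -> float:
    """Normal approximation to U, without tie correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Consistency check against Table 10-7 (sessions 1 and 3), assuming ten
# controllers per session.
z_13 = z_from_u(23.5, 10, 10)
print(f"z = {z_13:.2f}, two-sided p = {two_sided_p(z_13):.3f}")
```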
After the last experimental session, it was suspected that certain changes had been
implemented in the training of controllers in the participating ATC Centre. The
debriefing session with controllers participating in the third experimental session and
the input from management revealed the incorporation of a compulsory emergency
training module within every rating conversion and continuation training course. This
change was first incorporated in the SID/STAR training that started in May 2006. As a
result, several controllers participating in the third experimental session (which took
place in June 2006) benefited from this change. It seems that this change in the
training syllabus led to the increased number of recovery steps performed and to the
significant difference observed when compared with the first experimental session.
Statistical tests performed to determine the relationship between the percentage of
recovery steps performed and the 18 RIFs showed that only RIF2 (previous experience
with equipment failures) has a statistically significant correlation. More precisely, the
negative correlation identified (r=-0.31) indicates that controllers who have experienced
equipment failures tend to perform more of the required recovery steps than those
who have not (assuming, as elsewhere in the RIF scale, that lower levels denote more
favourable conditions). In other words, experience with equipment failures enhances
the controllers' ability to recover. This finding should be reflected in the training
syllabus of every ATC Centre.
steps were performed only to a basic extent, without any proper check of the accuracy
of the new data. In addition, such a high percentage of inadequate performance
indicates that there is room for improvement throughout the ATC Centre participating in
this experimental investigation. The management of the ATC Centre should implement
solutions to ensure more efficient handling of unusual/emergency situations. Such
solutions could include emergency training on equipment failures, design of recovery
procedures, and regular briefings.
Figure 10-9 Distribution of recovery effectiveness per category (presented via frequencies and
relative percentages)
Comparison of the recovery effectiveness for the three experimental sessions (using
the non-parametric Mann-Whitney test) does not reveal any significant differences. In
spite of the change implemented in the participating ATC Centre (i.e. the compulsory
emergency training module within the SID/STAR conversion training) and the increase
in the number of recovery steps performed, the effectiveness of the recovery
performance did not differ from one session to the other. This finding confirms that the
rating of recovery effectiveness does not depend on a simple count of the recovery
steps performed, and it further justifies the use of pooled data from all three
experimental sessions. Recovery effectiveness indicates the overall objective achieved
through the execution of those steps, but without accounting for the time frame
(recovery duration) within which the objective is achieved. The combined effect of
recovery effectiveness and recovery duration is assessed in section 10.3.5.
Comparison of the recovery duration for the three experimental sessions revealed
significant differences. More precisely, the recovery duration in the third experimental
session is significantly longer than in the first two sessions (Table 10-8). This reflects
the controllers in the third session reacting to the identified failure more promptly than
the controllers in the previous two sessions, which may in turn be the result of the
change in training implemented by the management of the participating ATC Centre
before the third session. However, it has to be noted that a more prompt reaction to the
identified failure (i.e. a longer recovery duration) does not necessarily entail an
effective recovery.
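Recovery duration (min:s) | Session 1 | Session 2 | Session 3
Statistics | M=14:15, SD=1:02 | M=14:25, SD=0:58 | M=15:14, SD=0:18

Paired sessions | Statistical significance
1 and 2 | p>0.05
1 and 3 |
2 and 3 |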
Non-parametric Kendall's tau tests performed between recovery duration and various
RIFs reveal four statistically significant correlations. These are presented in Table 10-9,
while the details of this test are discussed in Chapter 6. Firstly, the analysis shows that
the recovery duration tends to be longer⁷ if the last emergency training had a module
on equipment failures. This finding indicates the benefit of emergency training for
recovery duration, as it prepares controllers to react rapidly to an emergency situation.
Secondly, a similar effect on recovery duration is seen with enhanced communication
for recovery: if controllers initiate recovery sooner, they have more time to
communicate the problem adequately to team members or a supervisor. Thirdly, the
existence of adequate recovery procedures promotes prompt recovery action, which is
in line with the first finding. Finally, recovery duration increases with a decrease in
traffic complexity. This is expected, as a less demanding traffic situation allows a more
prompt initiation of the first recovery action.
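Variable 1 | Variable 2 | Test | Statistical significance at 95% confidence level
Recovery duration | Last emergency training (module on equipment failure) | Non-parametric correlation (Kendall's tau) | p=0.018 (r=-0.39)
Recovery duration | Communication for recovery | Non-parametric correlation (Kendall's tau) | p=0.10 (r=-0.39)
Recovery duration | Existence of the recovery procedure⁸ | Non-parametric correlation (Kendall's tau) | p=0.15 (r=-0.41)
Recovery duration | Traffic complexity | Non-parametric correlation (Kendall's tau) | p=0.004 (r=-0.46)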
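The correlations in Table 10-9 are Kendall's tau coefficients. The sketch below computes the tie-corrected tau-b variant by counting concordant and discordant pairs; the thesis does not state which variant was used, so tau-b is an assumption here (it is also the default of `scipy.stats.kendalltau`). The sign of tau depends on how each variable is coded, so a negative value has to be read against the direction of the RIF levels and of the duration measure. The example data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    P/Q count concordant/discordant pairs, T counts pairs tied only in x,
    and U counts pairs tied only in y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: ignored in both tie counts
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Hypothetical example: ordinal traffic-complexity levels versus recovery
# duration in seconds for six controllers (not experimental data).
complexity = [1, 1, 2, 2, 3, 3]
duration_s = [930, 915, 880, 900, 850, 860]
print(f"tau-b = {kendall_tau_b(complexity, duration_s):.2f}")
```

The pairwise loop is O(n²), which is adequate for a sample of about 30 controllers; a library implementation would be preferable for large samples.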
Having assessed both recovery effectiveness and recovery duration, it is apparent that,
taken independently, neither is an appropriate indicator of the recovery outcome, as
discussed in Chapter 5. Therefore, a safety assessment of the overall recovery
performance requires both variables to be combined into the outcome of the recovery
process, presented in the following section.
7 A more prompt first recovery action by a controller corresponds to a longer recovery
duration.
8
There is no recovery procedure for the simulated equipment failure in the participating ATC
Centre, but some controllers stated that they had experienced similar failures as part of their
initial simulator training. Discussion with the subject matter expert revealed that this particular
equipment failure is not simulated in any training syllabus.
recovery process focuses solely on the outcome of controller recovery. This is defined
as a combination of two recovery variables. Firstly, recovery effectiveness accounts for
the recovery steps carried out by a controller and the achievement of the three key
objectives (i.e. ATC system protection, maintenance of situational awareness, and
adequate post-restoration steps). Secondly, recovery duration accounts for the time
frame in which these steps were performed. In line with the discussion in Chapter 5,
the outcome of the recovery process distinguishes between successful and
unsuccessful recovery. An additional category for tolerable recovery outcome is also
defined in this thesis (Table 10-10).
Table 10-10 The outcome of the recovery process matrix applicable to the experimental set up
presented in this thesis (S stands for successful, T for tolerable, and U for unsuccessful
recovery)
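Recovery duration (min) | Recovery effectiveness (five categories, from most to least effective)
12-13 | T | T | U | U | U
13-14 | T | T | T | U | U
14-15 | S | T | T | T | U
15-16 | S | S | T | T | T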
The recovery outcome matrix highlights that successful recovery requires the initiation
of the recovery process within the first two minutes of the failure occurrence and the
performance of the majority of the recovery steps (ensuring that all three objectives are
achieved). An unsuccessful recovery results from a controller failing to achieve two or
more key objectives while initiating the recovery more than one minute after the failure
occurrence. The delayed first recovery action leaves the ATC system completely
unprotected. Therefore, the temporal requirements for unsuccessful recovery cover
three categories of the recovery duration variable (Table 10-10). Everything outside the
scope of successful and unsuccessful recovery is considered tolerable. These criteria
apply only to this experimental time frame and setting; they were derived from
operational experience and further validated by the subject matter expert (SME).
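The outcome matrix can be read as a simple lookup from a recovery-duration band and an effectiveness category to S, T, or U. In this sketch the five effectiveness columns are assumed to run from most effective (index 0) to least effective (index 4), and each duration band is treated as half-open [lo, hi) with 16 minutes included in the top band; both choices are assumptions, and the function name `recovery_outcome` is hypothetical.

```python
# Table 10-10 as a lookup. Assumptions: rows are recovery-duration bands in
# minutes; columns are the five effectiveness categories ordered from most
# effective (index 0) to least effective (index 4).
OUTCOME_MATRIX = {
    (12, 13): "TTUUU",
    (13, 14): "TTTUU",
    (14, 15): "STTTU",
    (15, 16): "SSTTT",
}
OUTCOME_NAMES = {"S": "successful", "T": "tolerable", "U": "unsuccessful"}


def recovery_outcome(duration_min: float, effectiveness_idx: int) -> str:
    """Classify a recovery as successful, tolerable or unsuccessful."""
    if not 0 <= effectiveness_idx < 5:
        raise ValueError("effectiveness index must be between 0 and 4")
    for (lo, hi), row in OUTCOME_MATRIX.items():
        # Assumed boundary rule: bands are [lo, hi), and 16 min joins the top band.
        if lo <= duration_min < hi or (hi == 16 and duration_min == 16):
            return OUTCOME_NAMES[row[effectiveness_idx]]
    raise ValueError("duration outside the 12 to 16 minute range of Table 10-10")


print(recovery_outcome(15.2, 1))  # successful
```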
Based on the presented categorisation, the outcome of the recovery process for
controllers who participated in the experiment is mostly tolerable (Figure 10-11). This
finding again confirms that there is room for improvement of the recovery performance
in the ATC Centre used in this experiment.
After assessing all recovery variables, the next section identifies any relevant
interactions between them.
10.3.6 Interactions
This section investigates the level of interactions between the recovery variables using
statistical testing (previously discussed in Chapter 6). Table 10-11 presents the results.
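Variable 1 | Variable 2 | Test | Statistical significance at 95 percent confidence interval
Recovery context indicator | Recovery effectiveness | Nonparametric test (Kendall's tau) | p=0.06, r=0.329
Recovery context indicator | Outcome of the recovery process | Nonparametric test (Kendall's tau) | p=0.017, r=-0.36
Recovery effectiveness | Outcome of the recovery process | Nonparametric test (Kendall's tau) | p=0.01, r=0.57
Recovery duration | Outcome of the recovery process | Nonparametric test (Kendall's tau) | p>0.05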
relationship between the recovery context indicator for the combined category of very
good and good recovery effectiveness on one side and partially adequate and totally
inadequate on the other (at the 90 percent confidence level, p=0.065). Secondly, a
statistical test indicates a significant relationship between the recovery context indicator
and the outcome of the recovery process at the 95 percent confidence level (p=0.017,
r=-0.36). In other words, higher values of the recovery context indicator are associated
with a better outcome of the recovery process, i.e. greater recovery success. Finally, a
statistical test indicates a significant relationship between recovery effectiveness and
the outcome of the recovery process: the greater the controller's recovery
effectiveness, the more successful the overall recovery. All findings are in line with
operational experience.
10
Note that the display range in this experiment was set to 30Nm for each controller.
asked to stand by for radar identification. In the case of late contact by the first
uncorrelated track (once the track is almost visible on the radar screen or at about
35Nm from the airport), controllers searched for the track and detection of the problem
was then immediate. The common factors that influenced the detection phase of the
recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
The first radio contact (RT) of the uncorrelated track;
Traffic complexity and related level of controller workload at the moment of contact;
Display range (set at 30Nm for this experiment);
Type of the equipment failure (uncorrelated tracks were immediately visible on the
screen once within radar range); and
Complexity of failure type (affecting single or multiple equipment simultaneously).
It should be noted that the same set of factors also affected the instant of the first
recovery action. The reason is that detection is a prerequisite for the first recovery
action.
10.3.7.1.2 Diagnosis
In this experiment, after the detection of one uncorrelated track, the controllers' first
assumption was usually an aircraft transponder failure. This prompted a request to the
pilot to squawk identification on the secondary transponder (i.e. to operate the
designated Mode A code on the primary/secondary transponder). When this check did
not produce a correlated track on the radar screen, further checks were necessary. At
this stage, the second aircraft was usually well inside the radar display range, also in an
uncorrelated state. At this point, it became obvious to the controllers that they were
experiencing some form of equipment failure, and they sought information from the
ATC Centre coordinator as to the nature of the failure. The possible options were failure
of the secondary surveillance radar (SSR) or FDPS failure. SSR failure was discounted
as soon as the mix of correlated and uncorrelated tracks was visible, leaving FDPS
failure as the remaining option. The coordinator was instructed to announce that it was
an FDPS failure affecting the entire ATC Centre. He also emphasised that flight plan
tracks would remain correlated only for tracks already displayed, while all other tracks
entering the system would appear uncorrelated. The common factors that influenced the diagnosis stage of
the recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
The number of uncorrelated tracks observed on the radar display;
Input by the coordinator;
11
The debriefing sessions investigated the overall quality of strip management and annotation
without going into a more detailed analysis. In future, the structure of the debriefing session may
place more emphasis on this segment of the recovery process.
Figure 10-12 Recovery phases, their corresponding influencing factors and required recovery
steps
others displayed the complete opposite. The deviations from the pre-failure behaviour
involved the following:
increased movement (i.e. overall posture, hands, feet, or head);
forceful displacement of the strip holders;
deviations from standard RT phraseology;
hesitation in RT communication; and
change in pitch or tone of voice.
The subject matter expert involved confirmed that most of these behavioural gestures
depict a typical reaction to a reduced mental picture of the traffic or to reduced overall
situational awareness. Even during the debriefing stage of the experiment, the change
in the controllers' behaviour was noticeable for the first two experimental sessions.
Examples include a shaky voice, overall unease, high alertness, and seriousness. The
controllers who performed the recovery process at either tolerable or good levels were
noticeably more relaxed and talkative. On the other hand, the controllers who
performed at either partially adequate or inadequate levels were without exception
more nervous and reluctant to answer questions in detail or to carry out an objective
review of their own performance. The overall conclusion is that the equipment failure
was an unexpected event and contributed to a significant increase in the controllers'
workload (as reported subjectively by the participating controllers).
10.3.7.3 Additional findings
It is important to present all acquired findings as they represent important issues for the
management of the participating ATC Centre as well as the wider aviation community.
These are presented in the following paragraphs.
Although 73 percent of the controllers reported that their training was suitable to the
equipment (i.e. FDPS) failure and traffic scenario in question, analysis of the data
collected in the experiment showed that 43 percent of these controllers had received
their last emergency training more than a year prior to the experiment¹². Of the
controllers who were able to recall, 50 percent stated that the emergency training
session they participated in had a module on equipment failures, predominantly on
radar failures. However, it was also noted that 40 percent of the controllers did not
have any type of equipment failure in their last emergency training. Accordingly, 93
percent of the controllers who participated in the experiment reported that they would
like more frequent training for unusual situations, the most desired frequency of
emergency training sessions being every six months. This is in line with the findings of
the questionnaire survey (Chapter 6), where 45 percent of controllers believed that
recurrent training once a year is not enough to develop and maintain the level of
proficiency required for recovery from equipment failures.
12 Note that 27 percent of controllers had their last emergency training in the month prior to this
experiment, as part of the approach rating course.
Interesting results were obtained for the question on the existence of a recovery
procedure for the simulated FDPS failure. Although a procedure for this kind of failure
does not exist in the Manual of Air Traffic Services (MATS), 20 percent of controllers
believed that it does. Some of the controllers who had participated in the approach
control course quoted their training manual as the reference for this procedure.
However, no evidence was found to support their statement. The most likely
explanation is that these controllers equated Secondary Surveillance Radar (SSR)
failure with FDPS failure and relied on their recent radar fallback training, without fully
understanding the implications of the loss of the FDPS. The outcome of an FDPS
failure is significantly different from that of a simple SSR failure, as it represents a
more serious failure that requires immediate attention from controllers with the
required skills.
On the issue of Human Machine Interface (HMI) and operational support (e.g. auxiliary
display, communication panel), 46.7 percent of controllers found the Beginning to End
Skills Trainer (BEST) simulator platform suitable to the equipment failure and traffic
scenario in question, 36.7 percent found it tolerable, and ten percent found it
counterproductive; the remaining 6.7 percent did not respond to this question.
However, most of the controllers stated that the BEST platform's HMI is not as good as
the HMI used in the operational centre, for two reasons. Firstly, meteorological data
needs better positioning (i.e. closer to the screen) to avoid head turns and changes of
visual field. Secondly, there is no alert or warning that a failure has occurred (e.g. a
colour change to yellow or red in the general information window).
Several organisational issues were raised during the debrief sessions. The most
frequent issues raised were that controllers:
felt that supervisors should receive more dedicated training in the handling of
unusual occurrences and system failures. Their role in coordinating recovery
actions should be more proactive. In addition, it was highlighted that coordination
with technical services and adjacent ATC Centres should be the primary
responsibility of the supervisor during a Centre crisis;
felt that more emphasis could be placed on developing an understanding of the
separate roles of both controllers and engineers. This perceived lack of
understanding of each peer groups function and tasks can create communication
difficulties in the operational environment;
identified a need for an update of the MATS with regard to the on-suite task
allocation between the executive and planning controller. Additionally, controllers
stated that the last three incidents involving a loss of standard separation involved
team related issues that contributed to the events. Therefore, it is necessary to
strengthen the relationship between executive and planning controllers and to
define their precise roles and responsibilities;
stated that their roles as currently defined in MATS are ideal but in reality are
difficult to adhere to, especially in a busy operational environment. They further
stated that in the event of an unusual occurrence, there are no guidelines available
for the handling of such situations;
stated that competency checking, conducted once per year for only one hour, is not
sufficient. They also stated that the availability of refresher training in unusual
occurrences is also limited to once per year. Once again, this finding is in line with
the questionnaire survey results presented in Chapter 6.
In general, the participating controllers rated their own performance between efficient
and tolerable (47 percent rated their own performance as efficient and 50 percent as
tolerable). This is not in accordance with the overall assessment of their performance
(recovery effectiveness), in which 43 percent of the controllers performed at the
partially adequate and inadequate levels. This is a cause for concern, especially
considering that 46.7 percent of controllers stated that their performance in this study
was no different from that on any other day. In addition, 45 percent of them marked
their performance as highly representative of their overall ability to recover from an
equipment failure in ATC. Finally, 70 percent of controllers stated that the task they
experienced in the experiment was highly realistic.
Furthermore, 33 percent of the controllers stated that they were not aware of the
complete impacts/implications of a particular failure or equipment failures in general. As
a result, 87 percent of the controllers stated that they would like to have some form of
aide memoire available at each CWP to assist them in recognising the effects of a
particular equipment failure and steps to be taken to recover. As a consequence this
Finding | Comment
Trust in ATC technology |
Recovery procedure |
HMI |
Overall recovery performance |
Awareness of the impact of a particular failure |
Availability of aide memoire |
10.4 Summary
The Chapter set out to achieve several objectives. Firstly, it set out to verify a
methodology for the quantitative assessment of the recovery context (defined in
Chapter 8) and its operational benefits. Secondly, it set out to verify a framework for an
in-depth analysis of controller recovery using the recovery variables previously
identified in Chapter 5. The final objective was to assess the outcome of the recovery
process. All these objectives have been achieved through the experiment, which
produced several interesting findings. These are as follows:
The majority of controllers tend to omit some critical recovery steps related to the
post-restoration phase. These are re-identification of traffic and confirmation of
the accuracy of information provided by the restored equipment. The sampled
controllers seemed to rely on the information provided without questioning its
accuracy following the occurrence of a failure.
Controllers with prior experience of equipment failures tend to carry out more
recovery steps compared to those without prior experience. In other words,
experience with any equipment failure tends to enhance the controllers' ability to
deal with equipment failures. Moreover, this type of stress-exposure training
enhances the stress-coping skills of controllers and as such should be
incorporated into the training syllabus of every ATC Centre.
A high percentage of inadequate recovery performance indicates that there is
room for improvement throughout the ATC Centre participating in the experiment.
Hence, the ATC Centre management should implement solutions to assure
efficient handling of unusual/emergency situations. Note, however, that the
management of the ATC Centre where the experiment took place implemented
an initial process to train controllers to deal with unusual/emergency situations.
This was in the form of a compulsory emergency training module within every
rating conversion and continuation training course.
The first recovery action tends to occur more promptly if a controller has had
training for unusual/emergency situations.
Controllers who initiate recovery sooner tend to communicate better with team
members and the supervisor.
The existence of adequate recovery procedures tends to promote prompt
recovery action.
Recovery duration tends to increase with a decrease in traffic complexity. This is
expected as the less demanding traffic situation allows the controllers to initiate
recovery action sooner rather than later.
The 'outcome of the recovery process' variable has been defined as an overall
safety indicator of the recovery process. It represents a combination of
recovery effectiveness and duration.
The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process.
Recovery duration itself is not a good indicator of the outcome of the recovery
process, whilst recovery effectiveness is.
The framework for the analysis of controller recovery proposed in this thesis,
and verified in the operational environment, shows potential for in-depth
analysis of controller recovery from equipment failures in ATC.
Chapter 11
Conclusions
11 Conclusions
This Chapter presents the main findings of the research on controller recovery from
equipment failures in Air Traffic Control (ATC) and suggests avenues for future work.
The approach taken for the former is to address each of the research objectives
formulated in Chapter 1 (repeated below for ease of reference) and to present the
corresponding findings. The Chapter concludes with the identification of research
questions and ideas to be explored in future research.
11.2 Conclusions
11.2.1 Literature review
The review of relevant literature aimed to connect ATC equipment failures with both
technical and air traffic controller recovery. With respect to the literature review, the
following conclusions are relevant:
1. The assessment of controller recovery from equipment failures in ATC has to
address technical and controller recovery together and not in isolation as has
been the case in the past. This holistic approach enables a complete
understanding of controller recovery and all of its influencing factors.
2. Because of the variety of equipment, components, and tools in both current and
future ATC system architectures, ATC equipment should be classified according to
the type of ATC functionality it supports. Such a functional classification
accommodates changes in ATM/ATC and can capture both current and future
equipment failure types.
3. Recovery procedures, recovery training, and past experience with equipment
failures are the main drivers of controller recovery performance. However, the
provision of both recovery procedures and training is inconsistent across ATC
Centres.
4. The context in which controller performance takes place plays an important role
in controller recovery.
longer the failure, the less severe it is. This finding is expected as more severe
failures are attended to immediately.
The conclusions listed above, resulting from the investigation of equipment failure
types and their characteristics in the operational ATC environment, have the potential
to impact policy formulation and the operational aspects of ATC/ATM. The thesis
findings have highlighted, for the first time, the ATC functionalities that are most
affected by equipment failures as well as those which have the most severe impact on
ATC operations. These findings can be used in two ways. Firstly, they identify the
equipment failure types that should be mandatory in the recovery training and procedures
designed for an ATC Centre. Secondly, the qualitative equipment failure impact assessment
tool can be used as part of the incident investigation process, as well as a design tool
supporting the design of recovery training scenarios.
13. The questionnaire survey showed that the vast majority of ATC Centres
surveyed have some form of recovery procedure. The most neglected
procedures are for ATC functionalities which are most challenging to controller
recovery (data processing, surveillance, and communication functionalities). In
addition, controllers highlighted the need for an abbreviated version of the
contingency manual (i.e. an aide-memoire) to be made available at each
controller working position.
14. Recovery procedures should be up-to-date, complete, and follow a logical
sequence of steps that the controllers should perform. In addition, recovery
procedures need to be compatible with other procedures within the ATC Centre.
In short, procedures should be seen as guidance to the controller: they should
be adaptable to any given situation and should take account of a variety of
contextual factors.
15. Half of the ATC Centres that responded to the questionnaire survey have
programmes for training in recovery from equipment failures. However, this
recurrent training is usually provided only once a year. The controllers believe that
the frequency of recurrent training is inadequate and are in favour of receiving
as much training as possible on emergency/unusual situations, including
equipment failures.
16. Recurrent training must be up-to-date and compatible with other training
programmes. Moreover, the recurrent training exercises should be varied and
realistic, covering both outages and less severe failures. The ATC Centre should
adopt a practice of periodically reverting to backup systems, perhaps during less
busy traffic periods, in order to maintain controllers' proficiency in their use.
17. Regular training on system functionalities, upgrades, and degradation modes
could be a useful method to ensure consistent knowledge and familiarity with
the ATC system architecture.
18. The majority of controllers surveyed confirmed the importance of the context
surrounding an equipment failure occurrence. This confirms the earlier finding
from the existing research literature.
19. The context surrounding controller recovery from equipment failure in ATC is
defined via 20 contextual factors, known as Recovery Influencing Factors
(RIFs). Each RIF can be further defined via its qualitative descriptor. This
establishes the relationship between each RIF and its influence on controller
performance.
20. An aggregated indicator of the entire recovery context has been proposed,
referred to as the recovery context indicator (Ic). This quantitative indicator of the
recovery context is sensitive to changes in the individual RIFs.
This thesis presents, for the first time, a comprehensive set of the factors that influence
controller recovery (RIFs). These factors can be used as part of an incident
investigation process, enabling a detailed investigation of the impact of context on
controller recovery performance. The identification and assessment of RIFs can also
be used to identify and refine recommendations on various aspects of ATC operation.
However, the final choice of the optimal recommendation should be based on the
degree of positive shift it produces in the value of the recovery context indicator
(as the quantitative indicator of the recovery context). Within the future ATM system,
this methodology could be readily modified to account for the shared responsibility for
the separation of aircraft and for collaborative decision-making between airborne and
ground-based ATM system components.
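The selection rule described above (prefer the recommendation that produces the largest positive shift in the recovery context indicator) can be illustrated with a minimal Python sketch. The scoring scheme is an assumption made purely for illustration and is not the thesis's definition of Ic: each RIF is given a hypothetical rating of +1 (improves recovery), 0 (no significant effect) or -1 (reduces recovery performance), the indicator is taken as the mean rating, and only five of the 20 RIFs are shown, with hypothetical names, ratings, and candidate recommendations.

```python
from typing import Dict

# HYPOTHETICAL rating scheme for illustration only (not the thesis's formula):
# +1 = RIF descriptor improves recovery, 0 = no significant effect,
# -1 = RIF descriptor reduces recovery performance.
ALLOWED = (-1, 0, 1)


def context_indicator(ratings: Dict[str, int]) -> float:
    """Hypothetical aggregate: the mean rating over the RIFs supplied."""
    if not ratings:
        raise ValueError("at least one RIF rating is required")
    for rif, rating in ratings.items():
        if rating not in ALLOWED:
            raise ValueError(f"rating for {rif!r} must be -1, 0 or +1")
    return sum(ratings.values()) / len(ratings)


def shift(baseline: Dict[str, int], changes: Dict[str, int]) -> float:
    """Change in the indicator if a recommendation re-rates some RIFs."""
    unknown = set(changes) - set(baseline)
    if unknown:
        raise KeyError(f"unknown RIFs: {sorted(unknown)}")
    return context_indicator({**baseline, **changes}) - context_indicator(baseline)


# Hypothetical baseline for five RIFs (the full set in the thesis has 20).
baseline = {
    "recovery procedures": -1,
    "recovery training": -1,
    "past experience": 0,
    "traffic complexity": 0,
    "team communication": 0,
}

# Hypothetical candidate recommendations and the RIF ratings they would change.
candidates = {
    "introduce aide-memoire": {"recovery procedures": 1},
    "add recurrent failure training": {"recovery training": 1, "team communication": 1},
}

for name, changes in candidates.items():
    print(f"{name}: shift = {shift(baseline, changes):+.2f}")

best = max(candidates, key=lambda name: shift(baseline, candidates[name]))
print("Preferred recommendation:", best)
```

With these made-up ratings the training recommendation yields the larger positive shift and is preferred; a real assessment would use the thesis's own RIF descriptors and aggregation.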
25. Furthermore, the experiment showed that the existence of recovery procedures
(or any type of reference material, such as training manuals) promotes prompt
recovery action.
26. The experiment also showed that recovery duration increases with a decrease
in traffic complexity.
27. The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process (represented as a
combination of the recovery effectiveness and duration).
28. The thesis has identified a statistically significant correlation between the recovery
context indicator and the outcome of the recovery process. Hence, the outcome
of the recovery process represents a good safety indicator of the overall
recovery process.
The relevance of recovery training (either as an alternative or an addition to past
experience) and recovery procedures has been confirmed by the experiment. Recovery
training and awareness of recovery procedures lead to more prompt recovery action,
better awareness of required recovery steps, and enhanced team communication.
These findings should directly inform the required policy on training and procedures for
handling unusual/emergency situations, highlighting required content, frequency, and
format. Furthermore, the recovery variables identified (recovery context, effectiveness,
and duration) have the potential to facilitate a rigorous analysis of controller recovery
from equipment failures in ATC and thus can be used in incident investigation
processes. Finally, the recovery context indicator represents a good indicator of the
outcome of the recovery process (represented as a combination of the recovery
effectiveness and duration). As such, the overall framework for the analysis of
controller recovery based on the identified recovery variables can be used to assess the
outcome of the recovery process in both current and future ATM environments.
Chapter 12
List of References
12 List of References
10News (2006). Power Outage Momentarily Interrupts Air Traffic Control. From
http://www.10news.com/news/8831526/detail.html
Air Transport Action Group (2005). The economic & social benefits of air transport.
From http://www.atag.org/files/Soceconomic-124721A.pdf
Air Transport Association (2006). Cost of ATC Delays. From
http://www.airlines.org/economics/specialtopics/ATC+Delay+Cost.htm
Airbus (2004). Global Market Forecast 2004-2023. From
http://www.airbus.com/en/myairbus/global_market_forcast.html
Airways New Zealand (2006a). Manual of Air Traffic Services (amendment 113).
Airways New Zealand.
Airways New Zealand (2006b). Domestic and International Aircraft Movements by
Calendar Year. From http://www.airways.co.nz/documents/avimove_stats.pdf
Aviation International News (2001). Europeans embracing MLS with a vengeance.
From http://www.ainonline.com/issues/04_01/Apr_2001_europeanmlspg75.html
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19, 775-779. From
http://www.bainbrdg.demon.co.uk/Papers/Ironies.html
Bainbridge, L. (1984). Diagnostic Skill in Process Operation. Department of
Psychology, University College London. From
http://www.bainbrdg.demon.co.uk/Papers/DiagnosticSkill.html
Baker, S., and Weston, I. (2001). Mayday, mayday, mayday. From
http://www.isasi.org/working_groups/ats/atsmayday.pdf
Berenson, M.L., Levine, D.M., and Krehbiel, T.C. (2006). Basic Business Statistics:
Concepts and Applications. Prentice Hall: Upper Saddle River, NJ.
Billings, C.E. (1996). Aviation Automation: The Search for a Human-Centred Approach.
Hillsdale, N.J.: Lawrence Erlbaum Associates.
Boehm-Davis, D., Curry, R.E., Wiener, E.L., and Harrison, R.L. (1983). Human factors
of flight-deck automation: Report on a NASA industry workshop. Ergonomics, 26,
953-961.
Boeing (2004). Statistical Summary of Commercial Jet Airplane Accidents: Worldwide
Operations 1959-2003. From
http://www.boeing.com/news/techissues/pdf/statsum.pdf.
Bove, T. (2002). Development and Validation of a Human Error Management
Taxonomy in Air Traffic Control. PhD dissertation. Risø National Laboratory,
Roskilde. From http://www.risoe.dk/rispubl/SYS/syspdf/ris-r-1378.pdf
British Airways (2006). Flight Training Safety and Emergency Procedures (SEP)
Training. From http://www.britishairwaysjobs.com/baweb1/?newms=info150
Brooker, P. (2004). Consistent and up-to-date aviation safety targets. Draft version.
Cranfield University.
Brooker, P. (2006). Air Traffic Control Safety Indicators: What is Achievable?
Eurocontrol: Safety R&D Seminar, 25-27 October 2006, Spain. From
https://dspace.lib.cranfield.ac.uk/bitstream/1826/1372/1/Eurocontrol+2006+ATCBrooker.pdf
Bureau of Transport and Regional Economics (2006). Aviation. Australian Government.
From http://www.btre.gov.au/statistics/aviation.aspx
Bureau of Transportation Statistics (2004). Airline On-Time Statistics and Delay
Causes. From http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
Bureau of Transportation Statistics (2006). Dictionary. From
http://www.bts.gov/dictionary/list.xml?letter=A
CASA (2006). ADS-B: Automatic Dependent Surveillance Broadcast. Civil Aviation
Safety Authority Australia. From http://casa.gov.au/pilots/download/ADS-B.pdf
Christensen, W.C., and Manuele, F.A. (1999). Safety through Design: Best Practices.
National Safety Council Press.
Cox, K. (2005). Teamwork and Trust: A Pilot's Perspective. From
http://safecopter.arc.nasa.gov/Pages/Columns/SBrief/SafeBrf1Articles/6Teamwork.html
Damidau, A., Kirwan, B., and Scrivani, P. (2006). Safety Getting Real: Safety Insights
from Real Time Simulations. Proceedings from the EUROCONTROL Safety R&D
Seminar, Barcelona 25-27 October 2006, Spain.
Daniels, J.J., Regli, S.H., and Franke, J.L. (2002). Support for Intelligent Interruption
and Augmented Context Recovery. Proceedings from 7th IEEE Human Factors
Meeting. Scottsdale, Arizona.
Dekker, S., Fields, B., and Wright, P. (2004). Human Error Recontextualised. From
http://www.cs.mdx.ac.uk/staffpages/bobf/papers/glasgow.pdf
Department of Defense (2001). Global Positioning System: Standard Positioning
Service Performance Standard. Command, Control, Communication, and
Intelligence. Washington DC.
Endsley, M. (1997). Situation Awareness, Automation & Free Flight. From http://atmseminar-97.eurocontrol.fr/endsley.htm
Endsley, M. R., and Kaber, D. B. (1999). Level of automation effects on performance,
situation awareness and workload in a dynamic control task. Ergonomics, 42(3),
pp. 462-492.
Endsley, M., and Kiris, E. (1995). The out-of-the-loop performance problem and level of
control in automation. Human Factors, 37(2), pp. 381-394.
EUROCONTROL (1997). EUROCONTROL Standard Document for Radar Surveillance
in En-Route Airspace and Major Terminal Areas. From
http://www.eurocontrol.int/surveillance/gallery/content/public/documents/SURVSTD.pdf
EUROCONTROL (1999). CD-ROM: An introduction to ATM. EUROCONTROL Institute
of Air Navigation Services.
Feng, S., Ochieng, W., Walsh, D., and Ioannides, R. (2005). A Measurement Domain
Receiver Autonomous Integrity Monitoring Algorithm. GPS Solutions. Springer
Berlin/Heidelberg.
Frese, M. (1991). Error Management or Error Prevention: Two Strategies to Deal with
Errors in Software Design. In H. J. Bullinger (Ed.) Human aspects in Computing:
Design and Use of Interactive Systems and Work with Terminals. Amsterdam:
Elsevier Science Publishers.
Frese, M., Brodbeck, F.C., Zapf, D., and Prumper, J. (1990). The Effects of Task
Structure and Social Support on Users' Errors and Error Handling. In D. Diaper et
al. (Eds.) Human Computer Interaction - INTERACT '90 (pp. 35-41). Amsterdam:
Elsevier Science Publishers.
Fujita, Y., and Hollnagel, E. (2004). Failures without errors: quantification of context in
HRA. Reliability Engineering and System Safety, 83, pp. 145-151.
Funk, K., Lyall, B., and Riley, V. (1996). Perceived Human Factors Problems of
Flightdeck Automation: Phase 1 Final Report. Federal Aviation Administration
Grant 93-G-039. From
http://www.flightdeckautomation.com/phase1/phase1report.aspx
General Accounting Office (1982). Computer Outages at Terminal Facilities and Their
Correlation to Near mid-air Collisions (AFMD-82-43). US GAO, Washington DC.
General Accounting Office (1991). Air Traffic Control: FAA Can Better Forecast and
Prevent Equipment Failures. US GAO, Washington DC.
General Accounting Office (1996). Air Traffic Control: Good Progress on Interim
Replacement for Outage-Plagued System, but Risks Can Be Further Reduced.
US GAO, Washington DC.
General Accounting Office (1998). Air Traffic Control: Information Concerning
Equipment Outages at Two Kansas City Area Facilities. US GAO, Washington
DC.
Gordon, R., and Makings, N. (2003). Gate 2 Gate: Stakeholder Safety Survey.
EUROCONTROL Experimental Centre, France.
Graham, G.M., Kinnersly, S., and Joyce, A. (2002). Safety Reporting and Aviation
Target Levels of Safety. In C.W. Johnson, Investigation and Reporting of
Incidents and Accidents (IRIA 2002). Department of Computing Science,
University of Glasgow, Scotland.
Hai, L. (2004). Civil Aviation Safety Outline (2001-2020). From
http://www.seaskyad.com/ad@cca_english/content/content_0206_special_articles/article16.htm.
Hallbert, B.P., and Meyer, P. (1995). Summary of lessons learned at the OECD Halden
reactor project for the evaluation of human-machine systems. Institutt for
Energiteknikk, Halden, Norway.
Heinrich, H.W. (1941). Industrial Accident Prevention: A Scientific Approach. McGraw-Hill:
New York and Wiley: London.
Hilburn, B. (2004). Cognitive Complexity in Air Traffic Control - A Literature Review.
EUROCONTROL Experimental Centre, EEC Note 04/04.
Hilburn, B., and Flynn, M. (2001). Air Traffic Controller and Management Attitudes
Toward Automation: An Empirical Investigation. 4th USA/EUROPE Air Traffic
Management R&D Seminar, Santa Fe, USA.
Ministry of Land, Infrastructure, and Transport (2006). Statistics. Air Traffic Activity at
Cab Facilities: Area Control Center. From
http://www.mlit.go.jp/koku/04_hoan/e/statistics/image/00_00.gif
Mohleji, S.C., Lacher, A.R., and Ostwald, P.A. (2003). CNS/ATM System Architecture
Concepts and Future Vision of NAS Operations in 2020 Timeframe. Center for
Advanced Aviation System Development (CAASD), The MITRE Corporation.
From http://www.mitre.org/work/tech_papers/tech_papers_03/mohleji_2020/mohleji_2020.pdf
National Aeronautics and Space Administration (2000). Required Communication
Performance (RCP). From http://as.nasa.gov/aatt/wspdfs/Oishi.pdf
National Aeronautics and Space Administration (2002). NASA Safety Manual
w/Changes through Change 1 (NPR 8715.3). NASA QS / Safety & Risk
Management Division.
National Air Traffic Services (1999). Testing Operational Scenarios for Concepts in
ATM (Phase II). WP2: Airspace Sectorisation Optimisation. European
Commission.
National Air Traffic Services (2002). Manual of Air Traffic Services Part II. London Area
Control Centre, edition 2/02.
National Air Traffic Services (2004). NATS apologises for delays experienced today.
From http://www.nats.co.uk/news/news_stories/2004_06_03_2.html
National Transportation Library (1997). Potential Cost Savings Ideas for FAA and
Users. From http://ntl.bts.gov/lib/000/500/511/costsav.pdf
National Transportation Safety Board (1973). Aircraft Accident Report (AAR-73-14).
From http://amelia.db.erau.edu/reports/ntsb/aar/AAR73-14.pdf
National Transportation Safety Board (1983). Aircraft Accident Report (AAR-83-02).
From http://amelia.db.erau.edu/reports/ntsb/aar/AAR83-02.pdf
National Transportation Safety Board (1996). Special Investigation Report: Air Traffic
Control Equipment Outages. Washington, D.C.
Nolan, M. S. (1998). Fundamentals of Air Traffic Control. Belmont, USA: Wadsworth
Publishing Company.
Nuclear Regulatory Commission (1998). Technical Basis and Implementation
Guidelines for a Technique for Human Event Analysis (ATHEANA). NUREG-1624. U.S. Nuclear Regulatory Commission, Washington, DC.
Ochieng, W.Y. (2006). Future Air Traffic Management. Course presentation for Air
Traffic Management Module (T23). Imperial College London.
Orasanu, J., and Fischer, P. (1997). Finding decisions in natural environments: the
view from the cockpit. In C.E. Zsambok and G. Klein (Eds.), Naturalistic
decision-making. Mahwah, New Jersey: Lawrence Erlbaum Associates Publishers.
Oren, T., and Ghasem-Aghaee, N. (2003). Personality Representation Processable in
Fuzzy Logic for Human Behavior Simulation. Summer Computer Simulation
Conference, July 20-24, 2003. Montreal, Canada. From
http://www.site.uottawa.ca/~oren/pres/pres-of-2003-01-SCSC-personality.pdf
Parasuraman, R., and Riley, V. (1997). Humans and automation: use, misuse, disuse,
abuse. Human Factors Vol 39, 230-253.
Parasuraman, R., Bahri, T., Deaton, J., Morrison, J., and Barnes, M. (1990). Theory
and Design of Adaptive Automation in Aviation Systems. Technical Report No.
CSL-N90-1, Cognitive Science Laboratory. Catholic University of America,
Washington, DC.
Parasuraman, R., Mouloua, M., and Molloy, R. (1996). Effects of adaptive task
allocation on monitoring of automated systems. Human Factors, 38, pp. 665-679.
Parasuraman, R., Wickens, C. D., and Sheridan, T. (2000). A model for types and
levels of human interaction with automation. IEEE Transactions on Systems,
Man, and Cybernetics, 30(3), 286-297.
Park, J., Jung, W., Ha, J., and Shin, Y. (2004). Analysis of operators' performance
under emergencies using a training simulator of the nuclear power plant.
Reliability Engineering and System Safety, 83, pp. 179-186.
Perrow, C. (1999). Normal Accidents. Princeton University Press.
Piantek, T.W. (1999). Influence in contracting and purchasing. In Safety Through
Design: Best Practices (Eds. Christensen, W.C., Manuele, F.A.). National Safety
Council Press.
PPrune Forums (2006). ATC Issues. From
http://www.pprune.org/forums/forumdisplay.php?s=ac64e2a0afd13472a93e7df2bba4b826&f=18
Rail Safety and Standards Board (2004). Rail-Specific HRA Tool for Driving Tasks
Phase 1 Report. From http://www.rssb.co.uk/pdf/reports/research/T270 Railspecific HRA tool for driving tasks Phase 1 report.pdf
Rasmussen, J. (1982). Human errors: A taxonomy for describing human malfunction in
industrial installations. Journal of Occupational Accidents, 4, 311-335.
Reason, J.T. (1997). Managing the risks of organizational accidents. Aldershot,
England: Ashgate Publishing.
Reid, J.W. (1996). Safety by Design. Lecture 4: Cost and acceptability of risk.
Hazardous forum: London.
Rigas, G. and Elg, F. (1997). Mental models, confidence, and performance in a
complex dynamic decision making environment. Department of Psychology,
Uppsala University, Sweden. From
http://www.ie.boun.edu.tr/labs/sesdyn/isdc97/TURKIA.doc
RISKS (2000). U.K. ATC System Failure. The RISKS Digest, Vol 20, issue 94. From
http://catless.ncl.ac.uk/Risks/20.94.html
Rizzo, A., Ferante, D., and Bagnara, S. (1995). Handling human error. In J.M. Hoc,
P.C. Cacciabue, & E. Hollnagel (Eds.), Expertise and Technology: Cognition &
Human-Computer Cooperation (pp. 195-212). Hillsdale, NJ: Lawrence Erlbaum.
Saldana, M. A. M., Herrero, S. G., del Campo, M. A. M. and Ritzel, D. O. (2002).
Assessing Definitions and Concepts within the Safety Profession. From
http://www.aahperd.org/iejhe/2003_first/ritzel.pdf.
Sampaio, J. J. M., and Guerra, A. A. (2004). The day god failed or overtrust in
automation: The Portuguese case study. In Proceedings from the 2nd
Conference on Human Performance Situation Awareness and Automation
(HPSAA 2). Daytona Beach, FL.
Williams, J.C. (1986). HEART: A Proposed Method for Assessing and Reducing
Human Error. In 9th Advances in Reliability Technology Symposium. University of
Bradford, 1986.
Wood, A. (1996). Software Reliability Growth Models. From
http://www.hpl.hp.com/techreports/tandem/TR-96.1.pdf
Zapf, D., and Reason, J.T. (1994). Introduction: Human Error and Error Handling.
Applied Psychology: An International Review, 43(4), pp. 427-432.
Appendices
Appendix I
Appendix II
Appendix III
Appendix IV
Appendix V
Appendix VI
Appendix X
Appendix XI
Table 1 ATC equipment as a cause of airport and enroute delays (personal correspondence)

Year | Enroute delay (min) | Airport delay (min) | Total delay (min)
1999 | 609265 | 461290 | 1070555
2000 | 598660 | 265055 | 863715
2001 | 614534 | 406760 | 1021294
2002 | 425627 | 138045 | 563672
2003 | 149476 | 147528 | 297004
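For each year in Table 1, the total delay is the sum of the enroute and airport delay, and the overall fall between 1999 and 2003 can be quantified directly from the table. The short Python check below uses only the figures transcribed from Table 1; the names TABLE_1, inconsistent_years and relative_change are illustrative.

```python
# Figures transcribed from Table 1: (enroute, airport, total) delay in minutes.
TABLE_1 = {
    1999: (609265, 461290, 1070555),
    2000: (598660, 265055, 863715),
    2001: (614534, 406760, 1021294),
    2002: (425627, 138045, 563672),
    2003: (149476, 147528, 297004),
}


def inconsistent_years(table):
    """Return the years whose Total column differs from Enroute + Airport."""
    return [year for year, (enroute, airport, total) in table.items()
            if enroute + airport != total]


def relative_change(table, start, end):
    """Fractional change in total delay between two years (negative = fall)."""
    before, after = table[start][2], table[end][2]
    return (after - before) / before


print("Inconsistent rows:", inconsistent_years(TABLE_1))
print(f"Change in total delay 1999-2003: {relative_change(TABLE_1, 1999, 2003):.1%}")
```

Every row is internally consistent, and the total delay attributed to ATC equipment fell by roughly 72% between 1999 and 2003.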
There are a number of reasons for the differences in the delay reported by the CFMU
(Table 1) across the period covered. Global factors explaining the delay reductions in the
decade beginning in 2000 include the general reduction of air traffic (as a result of the
post-September 11th 2001 crisis in the aviation industry), the presence of severe factors
(e.g. closure of Yugoslav airspace in 1999), the introduction of new route structures in
1999, the influence of European ATM network programmes (e.g. Reduced Vertical
Table 2 ATC equipment as a cause of the US National Aviation System delays. From Bureau of
Transportation Statistics (2004); summaries available only for the whole of 2004 and 2005

Year | Delay (min) | Average cost (millions $)
2004 | 402644 | 25.10
2005 | 274126 | 17.09
In general, these high-level analyses illustrate that equipment failures can significantly
affect the operational, safety, and financial aspects of both ATC and ATM systems. The
methods employed for Europe and the US to calculate the cost of delay per minute are
largely similar; the only difference is the financial value assigned to each minute of
delay in Europe and in the US. In addition, the true cost of delay induced by equipment
failure should also incorporate technical repair, unscheduled maintenance, training, and
additional staffing. However, these costs are assumed to represent only a small fraction
of the cost of delay. Therefore, these estimates can be regarded as a reasonable
representation of the total cost induced by ATC equipment failure in both the European
and the US aviation markets.
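The value per minute implied by Table 2 can be backed out by dividing the average cost by the delay. The following is a minimal Python sketch, assuming the delay column is expressed in minutes (as in Table 1); the per-minute figure it prints is derived here and is not stated in the source, and no equivalent European value can be computed from this appendix.

```python
# Figures transcribed from Table 2; the delay column is assumed to be in minutes.
DELAY_MIN = {2004: 402644, 2005: 274126}
AVERAGE_COST_MUSD = {2004: 25.10, 2005: 17.09}


def implied_cost_per_minute(year):
    """Average cost (converted from millions of dollars) divided by delay minutes."""
    return AVERAGE_COST_MUSD[year] * 1_000_000 / DELAY_MIN[year]


for year in sorted(DELAY_MIN):
    print(f"{year}: about ${implied_cost_per_minute(year):.2f} per minute of delay")
```

Both years yield almost exactly the same figure (about $62 per minute), which is consistent with a single fixed value per minute of delay having been applied to the US data.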
National Air Traffic Services (NATS), Corporate and Technical Centre (CTC)
and Swanwick Centre, UK;
EUROCONTROL Maastricht Upper Area Control Centre (MUAC), Netherlands;
Number of participants interviewed: one experienced engineer; two experienced engineers
Research question: Ambiguous operational failure reports
Finding: Proper classification of all operational failure reports
Agreement between study participants
EUROCONTROL IANS; IAA
Number of participants interviewed: one ATM specialist; one ATM specialist
Research question: Usefulness of announcing the training for unusual/emergency situations
Findings: Although controllers may anticipate an unusual occurrence within their
emergency training, this does not facilitate better performance as long as they do
not know the nature of that unusual occurrence
Agreement between study participants: Yes, both agreed
Agreement
between study
participants
Yes, experienced latent software failures
Yes
Yes
Table A-4 Findings related to the contextual factors relevant to controller recovery from
equipment failures in ATC

Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
IAA | two ATM specialists | Validation of the candidate contextual factors | Contextual factors relevant to controller recovery from equipment failures in ATC | Agreed on selected contextual factors and aided the definition of each factor
IAA | three ATM specialists | Validation of interactions between contextual factors | Interactions between contextual factors identified using operational experience and the past research | Their feedback was similar. Identified inconsistencies were further clarified during the interview and were the result of the misperception of some factors. All inconsistencies were clarified.
Failure of COMPAD
Loss of FDPS
Loss of MRP
Potential colour coding in Aide-Memoire: RED
Loss of ARTAS
Loss of SRP
Potential colour coding in Aide-Memoire: YELLOW
Minimal impact: not immediately critical, but may have greater operational
impact over time. Relevant failures are:
Overload of SRP
Overload of MRP
Loss of STCA
Loss of APW
Loss of MSAW
Loss of OLDI
Potential colour coding in Aide-Memoire: GREEN
Note that the categorisation above lists some but not all possible failures. Those
marked in italics are designed in the Aide-Memoire format and are presented below.
Further input from system control and monitoring staff and ATM specialists may yield
more accurate and precise types of failures and recovery steps to be taken.
Design
At the top of each procedure, it would be useful to show the pictorial Human
Machine Interface (HMI) warning, if applicable (e.g. the highlighted labels on
the General Information Window). This would be followed by two types of
information. Firstly, the required recovery steps, i.e. those that a controller
must perform to recover effectively and ensure a safe air traffic control service.
Secondly, the key effects of the equipment failure on the ATC system (i.e. the ATC
system feedback). The rationale for this design is that the top part of the checklist
should be reserved for the items that controllers need to be aware of first, i.e. the
recovery steps.
In addition, it is necessary to define procedures for different personnel working in the
operational environment, namely controllers (i.e. different roles for executive, planner,
and assistant controller), supervisors, and managers, to ensure a seamless recovery
process. If, for example, radar services fail on all workstations, personnel should have
a readily available guide to help them recover from the failure. These guidelines may
vary according to the type of user, because different roles may require different
information on equipment failures and recovery procedures.
Note that the colour-coded categorisation could also be used in a slightly different
manner. If this Aide-Memoire becomes part of the generic procedures for handling
emergency/unusual situations, then the use of colour should be restricted to categories
such as Aircraft Emergencies, Equipment Failures, Fire and Building Evacuation.
The Aide-Memoire, as a hard, laminated flip chart, should be readily available at
each Controller Working Position (CWP). A more detailed version, providing local or
ATC Centre-specific data, should be kept at the supervisor's position. For simplicity and
efficiency, it is better to present each relevant failure on a single page highlighting the
two main areas: what recovery steps to perform and what feedback to expect from the
ATC system. This approach ensures the most efficient use of the tool.
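The single-page layout described above (the HMI warning, if any, then the recovery steps, then the expected system feedback) can be captured as a simple data structure. The following Python sketch is illustrative only: the class and field names are hypothetical, and the example page reuses the actions and expected effects of the COMPAD page of the Aide-Memoire in this appendix, with the RED colour category from the categorisation of failures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Colour(Enum):
    """Potential colour coding of a failure in the Aide-Memoire."""
    RED = "RED"
    YELLOW = "YELLOW"
    GREEN = "GREEN"


@dataclass
class AideMemoirePage:
    """One failure per page (hypothetical structure for illustration)."""
    failure: str
    colour: Colour
    atco_actions: List[str]            # recovery steps: shown first
    expect: List[str]                  # ATC system feedback: shown second
    hmi_warning: Optional[str] = None  # pictorial HMI warning, if applicable

    def render(self) -> str:
        """Lay the page out top to bottom: warning, actions, expected feedback."""
        lines = [f"[{self.colour.value}] {self.failure}"]
        if self.hmi_warning:
            lines.append(f"HMI warning: {self.hmi_warning}")
        lines.append("ATCO actions:")
        lines.extend(f"  - {step}" for step in self.atco_actions)
        lines.append("Expect:")
        lines.extend(f"  - {effect}" for effect in self.expect)
        return "\n".join(lines)


compad = AideMemoirePage(
    failure="Failure of COMPAD",
    colour=Colour.RED,
    atco_actions=[
        "Inform Coordinator",
        "Transmit on second sector COMPAD",
        "Access RBS and inform traffic of failure",
        "Reset COMPAD",
        "Seek assistance and relocate to spare CWP",
        "Inform traffic of restoration of normal service when service is restored",
    ],
    expect=[
        "Complete or Partial failure",
        "Inability to transmit on RTF",
        "Inability to access alternate RTF",
        "Inability to use intercoms",
        "Inability to access telephone network",
    ],
)

print(compad.render())
```

Keeping the recovery steps and the expected feedback as separate fields makes it straightforward to enforce the ordering on every page and to update individual pages during the annual revision.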
The final version of the Aide-Memoire should not be considered an exhaustive list
but rather a living document. In other words, it will be necessary to update this tool on
an annual basis to reflect local expertise and to incorporate all changes (i.e. changes in
the ATC system, both software and hardware).
ATCO actions:
Inform Coordinator
Inform all traffic
Check spare ODS
Maintain timely & accurate strip marking
Restrict traffic
Utilise holding patterns
Use only verbal coordination channels
Reaffirm traffic identification using the code on the FPS
Identify any new tracks using the Confirm Squawk method
Seek SAS assistance and print screen if possible
Ground all sport/non-commercial traffic ASAP
Utilise strategic ATC techniques when possible
Conduct regular checks of aircraft identification
Monitor Mode C closely
Be aware of the absence of Safety Nets and Monitoring Aids
Cross check that exit conditions are achieved
Expedite reduction in traffic load
Inform Coordinator
Inform all traffic
Employ procedural control techniques (if necessary utilise emergency vertical separation of 500 feet)
Utilise holding patterns
Deny departures
Maintain timely & accurate strip marking
Instruct aircraft to maintain VMC, if in VMC
Reduce traffic load ASAP
Seek assistance
Relocate to contingency site if required
Expect
All ODS frozen or blanked throughout the Centre
Failure of COMPAD
ATCO actions:
Inform Coordinator
Transmit on second sector COMPAD
Access RBS and inform traffic of failure
Reset COMPAD
Seek assistance and relocate to spare CWP
Inform traffic of restoration of normal service when service is restored
Expect:
Complete or Partial failure
Inability to transmit on RTF
Inability to access alternate RTF
Inability to use intercoms
Inability to access telephone network
ATCO actions:
Inform Coordinator
Report failure
Operate as normal
Expect:
All functions are available
The switch to RFS (MRTS) from ARTAS is automatic
Any position in by-pass before ARTAS failure will remain in by-pass
ATCO actions:
Inform Coordinator
Be aware of restricted, danger and prohibited airspace, including TSAs
Expect:
Any alert displayed prior to the reduced alert mode will remain displayed regardless of whether or not the alert is still valid.
The following functions are NOT AVAILABLE:
ATCO actions:
Inform Coordinator
Check availability of FDP function on spare ODS
Inform traffic of failure
Maintain timely & accurate strip marking
Use verbal inter-sector/inter-centre coordination channels
Identify all new tracks using the Confirm Squawk technique
Maintain identification by regular checks
Restrict traffic flow where necessary
Utilise holding patterns
Be aware of unreliable Safety Nets and Monitoring Aids
Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
ATCO actions:
Inform Coordinator
Use only verbal inter-centre coordination channels
Inform all traffic on RTF
Seek FDA assistance for AFTN or AIS information
Maintain timely & accurate strip marking
Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
ATCO actions:
Inform Coordinator
Select radar by-pass services
Expect:
No radar data function (neither ARTAS nor MRTS nor RFS)
Dear Sir/Madam,
This questionnaire has been designed to obtain information on equipment failures and recovery in Air Traffic Control (ATC) systems from various standpoints.
The information you provide will be used in a research project jointly supported by
EUROCONTROL Experimental Centre and Imperial College London.
We would greatly appreciate your completing the attached questionnaire. Answering the questions will take only a few minutes of your time and will contribute to our joint effort to bring more real-world experience into ATC safety analysis. The data collected are intended to support the recovery strategies of future ATM and to analyse the current status of this issue. The information that you provide will also be used as an additional data source for a PhD dissertation being developed in this area.
The questionnaire was created in Microsoft Word 2000 so that you can fill it out electronically and send it directly to the following e-mail address (branka.subotic@ic.ac.uk). However, if it is more convenient, you can use the fax number provided below.
There are two question formats, each requiring a different way of answering. For some questions you will need to choose the most appropriate answer by highlighting or marking it (e.g. yes/no answers), while for others you will need to type in your full answer.
Please fill out the questionnaire and try to answer the questions in as much detail as possible. Your answers will be strictly confidential and de-identified; your personal details will not appear in any document connected with this research.
Thank you in advance for your time and effort.
Sincerely,
Branka Subotic
Research PhD student
Imperial College London
Centre for Transport Studies
London SW7 2AZ
Phone +44 (0)2075946 022
Fax +44 (0) 2075946 102
branka.subotic@ic.ac.uk
Location | Number of years worked in particular Unit | Country | Type (Civilian/Military) | Position/Rating (ACC/RDR, ACC/PROC, APP/RDR, APP/PROC, TWR or ARTCC, TRACON, ATCT (USA))
3. Have you ever experienced ATC equipment failure during your work? Mark the corresponding letter.
(If No, go to question 10.)
4. What is the average number of ATC equipment failures that you experience in one year? _________________________
5. Please fill in any previous experience with equipment failures which seriously impacted your work, using the following columns:
Type of equipment failure | System affected? (see Note below) | Frequency of the failure per year (in your own experience)? | Did you detect it and how? If not, who detected it? | Duration of the failure in min, h, or days (if you can recall)? | Did a recovery/contingency procedure exist? | Did recovery/contingency training exist? | Who initiated the recovery? | How was the recovery initiated? | Any additional comment
* Context is defined as any aspect of the operating context that influenced the failure or recovery (e.g. workload, HMI, personal factors, team factors).
Note:
The typical CWP (controller working position) contains one or more of the following systems (systems will vary from one centre and country to another):
Radar (SSR, PRS, Mode S, radar data processing (RDP), multi-radar processing (MRP), single radar processing (SRP))
Ancillary screens (meteorological information, strip bay, traffic flow information, etc.)
o Flight Plan Processing (FPP)
o Flight Progress Strips (FPS)
Pointing devices (mouse & trackball)
Secondary input devices (keyboard or touch input device (TID))
Communication panel (R/T, telephone, headset, intercom)
Strip printer
Ground-based Safety Nets (SNET): STCA, MSAW, APW, or any other SNET available
Other (e.g. power supply)
6. How much do you generally rely on written procedures in the case of equipment failure, and how much on situation-specific problem solving (i.e. improvisation)? Fill in the corresponding number for Procedures, Problem solving, AND Other.
Scale: 1 (very much), 3 (moderately), 5 (not at all)
Written procedures
Situation-specific problem solving
Other (e.g. past experience)
7. Is there any organized exchange of past experience in solving equipment failures with your fellow colleagues?
8.
9. According to your experience, what are the three most unreliable ATC systems/subsystems? Please use the device listing from the Note above to state those systems, starting with the most unreliable one:
(Note: Reliability is defined in this questionnaire as the probability that a piece of equipment or component will perform its intended function without failure over a given time period and under specified or assumed conditions.)
The following questions should be answered in relation to your current job, position, and level of experience (the first one cited in question 2).
Procedures
10.
11. Which types of equipment failures (outages) are covered by procedures in your Center?
12.
13.
14.
15.
16.
17.
18.
19.
20. Describe a situation in which you had a problem applying the recovery/contingency procedure, and explain why.
Training
21. Is training provided in recovery from equipment failures?
24. Is it enough?
Conclusion
29. Please write down any other comments or suggestions based on your past experience or professional opinion that you might
have on the issue of equipment failures, recovery/contingency procedures, or training.
Thank you for taking the time to answer these questions. Your time and participation are greatly appreciated.
--End--
system, or radar). These findings are expected, as NATS (2002) reports that most failures do not affect the controllers because they are prevented or recovered by the system control and monitoring unit. Moreover, the results of this questionnaire survey emphasise that the prompt detection of any ATC system deficiency depends mostly on the controller, as a direct result of the controller's situational awareness. Furthermore, the results show that failure detection may be aided by system-generated failure alerts. This is an example of the synergy that exists between technical and controller recovery, achieved through the built-in technical defences for transmitting information on failures (discussed in Chapter 4, section 4.3.2). These technical systems are expected to demonstrate more potential in the future, highly integrated ATC environment.
3. Duration of the equipment failure
As with the frequency variable, it was not possible to extract the duration of failures in 27.20 percent of examples. This was expected, given the difficulty of recalling the duration of past failures. Additional problems were encountered with vague qualitative responses (e.g. several days, a couple of hours, a few minutes). The available pre-processed data show that the average duration of the reported failures was close to one day, ranging from five minutes to one month. The large dispersion indicates that different types of failures have different durations.
The same categorisation of the duration variable is applied as previously for the operational failure reports (see Chapter 4, section 4.4.6). More precisely, failures are grouped into those lasting up to 15 minutes, between 15 minutes and one hour, between one hour and one day, and more than one day. It is interesting to note that the distributions of duration from the operational failure reports and from the past experience captured in this survey are similar (Figure 1). The difference is observed in the third category (duration from one hour to one day): it seems that, in the operational environment, equipment failures of this duration occur more frequently than in the experience of controllers worldwide.
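The four-way categorisation of failure durations can be sketched in Python as follows. This is a minimal illustration, assuming every usable response has already been converted to a duration in hours and that responses from which no duration could be extracted are recorded as None; the function names and sample values are hypothetical, not taken from the survey.

```python
from collections import Counter
from typing import Iterable, Optional

# Category upper bounds in hours, following the text: up to 15 minutes,
# 15 minutes to one hour, one hour to one day; anything longer is last.
BOUNDS = [(0.25, "up to 15 min"), (1.0, "15 min to 1 h"), (24.0, "1 h to 1 day")]
LONGEST = "more than 1 day"
LABELS = [label for _, label in BOUNDS] + [LONGEST]


def categorise(hours: float) -> str:
    """Return the duration category for a failure lasting `hours` hours."""
    for upper, label in BOUNDS:
        if hours <= upper:
            return label
    return LONGEST


def duration_distribution(durations: Iterable[Optional[float]]) -> dict:
    """Percentage of failures per category.

    Entries recorded as None (durations that could not be extracted from
    the responses) are excluded before the percentages are computed.
    """
    valid = [h for h in durations if h is not None]
    if not valid:
        raise ValueError("no usable durations")
    counts = Counter(categorise(h) for h in valid)
    return {label: 100.0 * counts[label] / len(valid) for label in LABELS}


# Hypothetical responses: 5 min, 30 min, 3 h, 2 days and one unusable answer.
print(duration_distribution([5 / 60, 0.5, 3.0, 48.0, None]))
# {'up to 15 min': 25.0, '15 min to 1 h': 25.0, '1 h to 1 day': 25.0, 'more than 1 day': 25.0}
```

Excluding the unusable entries before dividing mirrors the way the percentages are reported over the extractable durations only.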
[Figure 1: bar charts of frequency against failure duration in hours, binned as [0.00-0.25], [0.26-1.00], [1.01-24.00] and >24.00]
Figure 1 Distribution of the duration variable: a) from the questionnaire survey; b) from the Country D operational failure reports (see Chapter 4)
there were several instances in which controllers rated the context as positive, mostly owing to efficient teamwork, the availability of an efficient assistant, low traffic levels at the time of occurrence (i.e. no significant increase in workload), and the ability to work with fallback systems. As a result, the importance of context identified in past research is confirmed by this questionnaire survey. The following chapters are dedicated to further assessment of the recovery context.
5. Existence of training for recovery for a particular failure
Question 5 allowed a mapping between ATC functionalities and the recovery training available for the sampled equipment failures¹. The analysis showed that in 48 percent of the examples provided, the controllers had received some type of recovery training. This training was mostly provided for the communication, navigation, surveillance, and data processing functions. A lack of training was identified for power outages and loss of safety nets.
6. Individual who initiated the recovery and method applied
The individuals who initiated and applied recovery processes were predominantly controllers rather than watch managers or engineers. This is understandable, as section 2 pointed out that most equipment failures are detected by controllers. Having detected a problem with equipment, the controllers have to inform engineers, indirectly through the watch manager, which constitutes the initiation of the recovery. In some simple cases (e.g. loss of microphone or loss of screen), the controller tries to replace the failed equipment, either by using a spare or by changing to another working position (if one is available). In more complex situations, when a change of position is not possible, the controller has to continue working with the remaining tools and equipment and may need to revert to procedural control, assure vertical separation, use fallback systems, and/or transfer all flights to an adjacent sector or flight information region. Engineers initiate the recovery process in the case of failures of aeronautical data exchange with adjacent ATC Centres, runway/taxiway lighting systems, and the data processing system. However, the controller still remains responsible for the safe separation of all traffic in the affected airspace.
¹ Question 26, although intended to capture the type of recovery training missing in each sampled ATC Centre, yielded mostly high-level comments on the impossibility of training for every potential equipment failure.
7. Concluding remarks
In general, the controllers perceive equipment failures as stressful and distracting
events that pose a major safety problem due to increased workload and difficulties with
maintaining identification of aircraft (e.g. in case of radar failure and data processing
failure). In one instance, a controller commented that an equipment failure had led to a near miss. Another response pointed out the problems with equipment failures occurring during the night shift, as technical staff are not always available during that period.
Factor
HERA
Eurocontrol
HERA [12]
Pilot-controller
comm.
Pilot actions
Traffic and
airspace
Weather
TRACEr
Shorrock and
Kirwan [19]
RAFT
Eurocontrol
[20]
External PSF
Pilot-controller
comm.
Pilot-controller
comm.
Complexity;
Requirements for
perception;
requirements for
motor speed
Documentation
and procedures
Procedures
Procedures and
documentation
Required
procedures; Work methods; Plant
policy
Training and
experience
Training and
experience
Training and
experience
Workplace design
and HMI
Workplace design,
HMI, and equipment
factors
Environment
Prior training,
experience
Ambient
environment
Quality of
environment; T; Air
quality; Situational
factors
Detractors; Extreme
T; radiation;
Pressure;
Inadequate oxygen
supply; Vibration;
Restricted
movements
Perception; Motor
system; Memory;
Decision-making;
Short-term and long-term memory
Duration of stress;
Pain; Thirst; Fatigue;
Threats; Monotony;
Work performance;
Circadian rhythm
Personal factors
Personal factors
10
Team factors
Organisational
factors
Other
organisational
factors, Logistical
factors
Internal
PSF
Organisational
structure; Working
hours; Actions by
shift leader,
manager;
Remuneration
structure
12
Suddenness of
occurrence
13
14
COCOM
Hollnagel
[27]
CREAM
Hollnagel [11]
Inconsistent
labelling
Human machine
interaction
Personal factors
Organisational
factors
Stressors
Design features;
Factors in task and
work resources;
Warnings and
danger signs; Man-machine factors;
Interface
11
THERP
Swain and Guttmann [24]
Plans
Availability of
procedures/ plans
Normal/familiar
process state
Adequacy of training
and experience
MMI and
support
Adequacy of MMI
and operational
support
Working conditions
State of
momentarily
abilities
personality and
intelligence;
motivation and
attitudes;
emotional state;
stress; gender
Attitudes
deriving from
family or
groups; group
dynamic
processes
Crew collaboration
quality
Adequate
organisation
Adequacy of
organisation
Few
simultaneous
goals
Number of
simultaneous goals
Available time
Available time
Factor
HRMS
Kirwan [28]
Recovery from
Failures
Kanse and van der
Schaaf [21]
CORE-DATA
Eurocontrol
[13]
ATHEANA
U.S. NRC
[29]
CAHR
Straeter [16]
NARA
Kirwan et al.
[30]
HPDB
Park et al.
[32]
Communication
Procedures
Clarity/Precision of
procedures; Design of
procedures; Content;
Completeness; Presence
Dependencies
of the different
tasks/steps/actions
Procedures
Training/expertise/experience/competence
Quality of information/
interface
10
Task organisation
11
Inexperience
Shortfalls in the
quality of
information
conveyed by
procedures; use of
more dangerous
procedures
Operator
inexperience;
Unfamiliarity
(situation occurs
infrequently)
Unfamiliar plant
conditions
Usability of control;
Usability of equipment;
Positioning; Equivocation
of equipment ;
arrangement of
equipment; display
range; accuracy of
display; Labelling;
Marking; Reliability;
Technical layout;
Construction;
Redundancy; Coupled
equipment
Technical/workplace/situational factors
Environmental
factors and
ergonomics
External event
Poor environment
Human
performance
capabilities at low
point; Excessive
workload
Technical/workplace/situational factors
Stress; Workload
Processing; Information;
Goal reduction
Social factors
Organisational factors
Lack of
supervision/checks
Non-optimal use
of human
resources
Operator under
load/boredom; A
conflict between
intermediate and
long-term
objectives; Stress
and ill-health;
Information overload
Poor handovers and
team coordination
problems
Low workforce
morale or adverse
organisational
environment
Available
procedure &
description of
all steps and
tasks
Level of
experience
Person issues;
Demand of
perception,
cognition, etc.
Team issues
12
13
14
Time
Time pressure
Time constraints
Occurrence-related factors
Time pressure
Time pressure
The time
needed to
correctly
perform tasks,
steps, and
actions
The relevant Recovery Influencing Factors (RIFs) are discussed in four main groups: internal factors (i.e. those related to the controller), equipment failure related factors, external factors (i.e. those related to working conditions), and airspace related factors. The following paragraphs present the underlying considerations in developing the probability values for each predefined RIF.
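For the RIFs estimated from the questionnaire survey, each qualitative descriptor's share of the responses is reported both as a percentage and as a probability. A minimal Python sketch of that conversion, assuming the number of responses per descriptor is known; the counts below are hypothetical and are not survey data.

```python
def rif_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Convert the number of responses per qualitative descriptor into RIF
    probabilities: each descriptor's share of all responses, rounded to two
    decimals as in the tables."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to convert")
    return {descriptor: round(n / total, 2) for descriptor, n in counts.items()}


# Hypothetical counts for a three-level RIF (illustrative, not survey data).
example = {"suitable": 30, "tolerable": 10, "counter productive": 10}
print(rif_probabilities(example))
# {'suitable': 0.6, 'tolerable': 0.2, 'counter productive': 0.2}
```

Because each share is rounded independently, the probabilities of one RIF may sum to 0.99 or 1.01; for example, the time course of failure development is assigned 0.55 + 0.39 + 0.07.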
RIF: Training for recovery from ATC equipment failure
Data source for probabilistic assessment: the questionnaire survey (number of responses: 134)
Qualitative descriptor | Percentage of responses | RIF probability
suitable | 52 | 0.52
tolerable | 17 | 0.17
counter productive | 31 | 0.31
RIF: Previous experience with equipment failures
Data source for probabilistic assessment: the questionnaire survey (number of responses: 134)
Nature of the validation: ATM specialists surveyed
Qualitative descriptor | Percentage of responses | RIF probability
experienced any type of equipment failure | 95 | 0.95
no experience with equipment failures | 5 | 0.05
RIF
Qualitative
descriptor
Data source
for
probabilistic
assessment
Experience
with system
performance
(reliance or
trust in the
system)
objective
attitude
toward the
ATC
system
excessive
trust and
mistrust
Past
research
and ATM
specialists
Number of
responses
Percentage
of
responses
RIF
probability
72
0.72
79/8
Nature of
the
validation
28
0.28
RIF: Personal factors
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
suitable | 65 | 0.65
tolerable | 26 | 0.26
counter productive | 9 | 0.09
RIF: Communication for recovery within team/ATC Centre
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
efficient | 73 | 0.73
tolerable | 24 | 0.24
inefficient | 4 | 0.04
TRM (Team Resource Management) represents the effective use of all available resources by ATC personnel to assure safe and efficient operations, reduce error, avoid stress, and increase efficiency.
who stated that the majority of ATC equipment failures are single rather than multiple failure occurrences (for evidence see Appendix II).
Table 6 Summary of the RIF Complexity of failure type
RIF: Complexity of failure type
Data source for probabilistic assessment: operational failure reports (22,808 reports)
Nature of the validation: ATM specialists' responses and system control and monitoring engineers
Qualitative descriptor | Percentage of responses | RIF probability
a single failure | 92 | 0.92
multiple failure | 8 | 0.08
RIF: Time course of failure development
Data source for probabilistic assessment: ATM specialists' responses
Nature of the validation: system control and monitoring engineers
Qualitative descriptor | Percentage of responses | RIF probability
sudden | 55 | 0.55
gradual | 39 | 0.39
latent | 7 | 0.07
Controller Working Positions (CWPs) and sectors. Due to the lack of operational data, a conservative approach is taken and the probability is divided equally between the two levels. Note that this RIF has no Level 1, i.e. the most favourable level, simply because the number of workstations/sectors affected cannot have any positive or favourable effect on controller performance (Table 8).
Table 8 Summary of the RIF Number of workstations/sectors affected
RIF: Number of workstations/sectors affected
Nature of the validation: N/A
Qualitative descriptor | Percentage of responses | RIF probability
one CWP or several CWPs in a sector | 50 | 0.5
several CWPs in several sectors/all CWPs in all sectors | 50 | 0.5
RIF: Time necessary to recover
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
less than time available³ | 94 | 0.94
in excess of time available | 6 | 0.06
³ Time available to the controller to react before the development of less than adequate separation.
RIF: Existence of recovery procedure
Data source for probabilistic assessment: the questionnaire survey (number of responses: 134)
Qualitative descriptor | Percentage of responses | RIF probability
suitable | 47 | 0.47
tolerable | 39 | 0.39
inappropriate | 14 | 0.14
RIF
Qualitative descriptor
Data
source for
probabilistic
assessment
Operational
failure
reports
moderate to
substantial period of
time (failures longer
than 15 minutes)
Number
of
responses
Percentage
of
responses
RIF
probability
56
0.56
22,808
(reports)
44
0.44
Nature of
the
validation
ATM
specialists
surveyed
RIF: Adequacy of HMI and operational support
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
suitable | 53 | 0.53
tolerable | 45 | 0.45
counter productive | 3 | 0.03
than one way. In general, it is observed that a lack of transparency in an ATC system leads people to form hypotheses on the causes of failures based on incomplete information or best guesses (see Straeter, 2005). ATC subsystems are highly dependent on each other: information from one tool can be distributed to several different subsystems at the same time. For example, information on aircraft position is sent directly to the radar data processing system, air traffic flow management, ATC tools (including the monitoring aid and the medium term conflict detection tool), safety nets (e.g. the short term conflict alert tool), and the flight data processing system. In other words, ATC systems are closely coupled and dependent upon dynamic information exchange. For this reason, the architecture of any ATC Centre takes existing interactions into account by building a network of redundancies. In addition, any symptoms that can be interpreted in more than one way will be interpreted wrongly in some instances.
Based on the above discussion, the qualitative descriptor is set at two levels, whilst the corresponding probabilities are determined from the average of the responses from the ATM specialists surveyed (Table 13).
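Where the probabilities come from the ATM specialists surveyed, each specialist splits 100% across the descriptor levels and the RIF probability is the average of those splits. A Python sketch of that averaging, assuming every split adds up to 100%; the level names and the three sets of estimates are hypothetical.

```python
from statistics import mean


def average_specialist_estimates(estimates: list[dict[str, float]]) -> dict[str, float]:
    """Average each level's percentage across specialists and express the
    mean as a probability rounded to two decimals."""
    if not estimates:
        raise ValueError("no estimates supplied")
    for split in estimates:
        if abs(sum(split.values()) - 100) > 1e-9:
            raise ValueError(f"estimate does not add up to 100%: {split}")
    levels = estimates[0].keys()
    return {level: round(mean(split[level] for split in estimates) / 100, 2)
            for level in levels}


# Hypothetical splits from three specialists for a two-level RIF.
estimates = [
    {"matches": 90, "mismatches": 10},
    {"matches": 70, "mismatches": 30},
    {"matches": 80, "mismatches": 20},
]
print(average_specialist_estimates(estimates))
# {'matches': 0.8, 'mismatches': 0.2}
```

Rejecting splits that do not total 100% keeps the averaged probabilities of one RIF summing to 1 apart from rounding.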
Table 13 Summary of the RIF Ambiguity of information in the working environment
RIF: Ambiguity of information in the working environment
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
external working environment matches the controller's internal mental model | 86 | 0.86
external working environment mismatches the controller's internal mental model | 14 | 0.14
RIF: Adequacy of alarms/alerts
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
suitable | 75 | 0.75
tolerable | 20 | 0.2
counter productive | 5 | 0.05
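RIF: Adequacy of alarm/alert onset
Number of responses: N/A
Nature of the validation: N/A
Qualitative descriptor | Percentage of responses | RIF probability
information from the external world enters the processing loop at the right time | 50 | 0.50
information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) | 50 | 0.50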
RIF: Adequacy of organisation
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
efficient | 67 | 0.67
tolerable | 31 | 0.31
inefficient | 3 | 0.03
RIF: Traffic complexity during the recovery process
Data source for probabilistic assessment: ATM specialists
Percentage of responses | RIF probability
19 | 0.19
81 | 0.81
frequent changes in heading compared to en-route airspace, especially its higher levels. Due to differences in controller tasks, en-route airspace in general provides more time to recover than terminal airspace. In addition, interviews with ATM specialists revealed that terminal airspace has radar coverage from a single radar source, whereas en-route airspace coverage is usually based on multi-radar tracking (i.e. integration of data from several radar sites). The qualitative descriptor is set at three levels, whilst the corresponding probabilities are determined from the average of the responses from the ATM specialists surveyed (Table 18).
Table 18 Summary of the RIF Airspace characteristics during the recovery process
RIF: Airspace characteristics during the recovery process
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
Adequate | 64 | 0.64
Tolerable | 33 | 0.33
Inappropriate | 3 | 0.03
RIF: Weather conditions during the recovery process
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
Improved | 89 | 0.89
Deteriorated | 11 | 0.11
A.4.4 Conflicting issues during the recovery process (task complexity)
This dynamic factor describes the level of overall task complexity at the moment of equipment failure. In the case of multiple conflicting tasks, the operator has to prioritise between them (Straeter, 2005). In the case of any type of conflict alert (i.e. two or more aircraft having conflicting intent), the controller has to give full attention to resolving the conflict using the equipment that is still operational, while assuming that some other subsystem might fail. In ATC, overall safety is the first priority. Due to the dynamic nature of ATC, this qualitative descriptor is proposed at two levels: average complexity of the situation, and either high or low complexity of the situation (both of which have a negative effect on controller performance, through increased workload and boredom or monotony, respectively). The corresponding probabilities are determined from the responses of the ATM specialists surveyed (Table 20).
Table 20 Summary of the RIF Conflicting issues during the recovery process (overall task complexity)
RIF: Conflicting issues during the recovery process
Data source for probabilistic assessment: ATM specialists
Qualitative descriptor | Percentage of responses | RIF probability
the average complexity | 72 | 0.72
multiple tasks and low complexity | 28 | 0.28
Note: The set of questions presented below investigates controller recovery from equipment failures in ATC. All questions should be answered based upon your operational experience and knowledge. Whilst some of them are very specific, and therefore challenging to answer, please try to respond to all the questions, giving the appropriate percentages.
What is the percentage of ATCOs that have never experienced equipment failure in their career? Please think of novice ATCOs as well and try to make the best estimate.
Efficient
Tolerable
Inefficient
100%
100%
How often has the time
necessary to recover (time
before the development of any
inadequate separation) been:
Adequate
Inadequate
100%
Up to 15min
More than 15min
100%
According to your opinion, what is the percentage of match between the controller's
situational awareness and the dynamic airspace and traffic configuration (traffic mix,
speed differentials, FL utilized, airways configuration) during the recovery process?
Efficient
Tolerable
Inefficient regarding the support for better recovery
from equipment failures.
100%
Too high
Tolerable
Too low
100%
Adequate
Tolerable
Inappropriate
100%
Improved
Deteriorated or worsened
Unchanged
100%
High
Average
Low
100%
ID
Internal factors
(2)
RIF name
Training for
recovery from ATC
equipment failure
Previous experience
with equipment
failures
Personal factors
Communication for
recovery within
team/ATC Centre
Complexity of failure
type
(3)
Descriptor
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
Experienced with a
particular type of
failure or
Experienced with
any other type of
ATC equipment
failure
No experience with
ATC equipment
failures
Objective attitude
toward the system
Positive experience
with the system
(excessive trust) or
Negative
experience with the
system (undertrust)
Suitable for the
recovery process
Tolerable for the
recovery process
Counter productive
for the recovery
process
Time course of
failure development
Number of
workstations/sectors
affected
(5)
Probability
(p)
Expected
effect of
controller
recovery
performance
(7)
(8)
Level
Designator
(R)
Probability
of overall
situation
occurring
(p*R)
0.52
Most
favourable
0.52
0.17
Non
significant
0.00
0.31
Least
favourable
-1
-0.31
0.95
Most
favourable
0.95
0.05
Non
significant
0.00
0.72
Non
significant
0.00
0.28
Least
favourable
-1
-0.28
0.65
0.00
-1
-0.09
0.73
0.65
0.26
0.09
0.00
-1
-0.04
0.00
-1
-0.08
0.55
Improve
0.55
0.07
Non
significant
0.00
0.39
Least
favourable
-1
-0.39
0.50
Non
significant
0.00
0.50
Least
favourable
-1
-0.50
Tolerable
0.24
Inefficient
0.04
Persistent or latent
failure
Gradual
degradation of
system
One
workstation/one
sector or All
workstations in one
sector
Several
workstations/couple
of sectors or All
Least
favourable
0.73
Single system
affected
Multiple systems
affected
Most
favourable
Non
significant
(6)
Most
favourable
Non
significant
Least
favourable
Non
significant
Least
favourable
Efficient
Sudden failure
7
(4)
0.92
0.08
10
11
12
13
14
15
Airspace
related
factors
16
17
Time necessary to
recover
Existence of
recovery procedure
Duration of failure
Adequacy of HMI
and operational
support
Ambiguity of
information in the
working
environment
Adequacy of
alarms/alerts
Adequacy of
alarm/alert onset
Adequacy of
organisation
Traffic complexity
workstations/all
sectors
Adequate - less
than available time
Inadequate - in
excess of available
time
Suitable to the
situation in
question
Tolerable to the
situation in
question
0.94
Most
favourable
0.94
0.06
Least
favourable
-1
-0.06
0.47
Most
favourable
0.47
0.39
Non
significant
0.00
-1
-0.14
0.00
Inappropriate
0.14
0.56
Moderate period of
time or Substantial
period of time
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
External working
environment
matches the
controller's internal
mental model
External working
environment
mismatches the
controller's internal
mental model
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
Information from
the external world
enters the
processing loop at
the right time
Information from
the external world
enters the
processing loop at
the wrong time
(misleading
sequence of
alarms)
0.44
Least
favourable
-1
-0.44
0.53
Most
favourable
0.53
0.45
Non
significant
0.00
0.03
Least
favourable
-1
-0.03
0.86
Most
favourable
0.86
0.14
Least
favourable
-1
-0.14
0.75
Most
favourable
0.75
0.20
Non
significant
0.00
0.05
Least
favourable
-1
-0.05
0.50
Most
favourable
0.50
0.50
Least
favourable
-1
-0.50
Efficient
0.67
Tolerable
0.31
Inefficient
0.03
Average traffic
complexity
Extremely high or
extremely low
traffic complexity
Least
favourable
Non
significant
0.81
0.19
Most
favourable
Non
significant
Least
favourable
Non
significant
Least
favourable
0.67
0.00
-1
-0.03
0.00
-1
-0.19
18
19
20
Airspace
characteristics
Weather conditions
during the recovery
process
Conflicting issues in
the situation (task
complexity)
Adequate (e.g.
en-route higher
levels)
0.64
Most
favourable
0.64
Tolerable
0.33
Non
significant
0.00
Inappropriate (e.g.
enroute lower
levels or terminal)
0.03
Least
favourable
-1
-0.03
Improved
0.89
0.00
Deteriorated
0.11
-1
-0.11
0.00
-1
-0.28
Average complexity
of the situation
Conflicting, multiple
tasks or Extremely
low complexity of
the situation (may
lead to monotony)
0.72
0.28
380
Non
significant
Least
favourable
Non
significant
Least
favourable
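Column (8) of the RIF table is the product of the probability p and the designator R. Here R = 1 for a most favourable expected effect, 0 for a non significant one, and -1 for a least favourable one. The Python below is a minimal sketch of that arithmetic, assuming exactly this coding. It recomputes p*R for RIFs 18 to 20 from the values listed above. The names DESIGNATOR, RIFS and overall are illustrative and are not taken from the thesis.

```python
# Designator R for each expected effect (coding assumed from the table)
DESIGNATOR = {"Most favourable": 1, "Non significant": 0, "Least favourable": -1}

# (descriptor, p, expected effect) for RIFs 18-20, copied from the table
RIFS = {
    "Airspace characteristics": [
        ("Adequate (e.g. enroute higher levels)", 0.64, "Most favourable"),
        ("Tolerable", 0.33, "Non significant"),
        ("Inappropriate (e.g. enroute lower levels or terminal)", 0.03, "Least favourable"),
    ],
    "Weather conditions during the recovery process": [
        ("Improved", 0.89, "Non significant"),
        ("Deteriorated", 0.11, "Least favourable"),
    ],
    "Conflicting issues in the situation (task complexity)": [
        ("Average complexity of the situation", 0.72, "Non significant"),
        ("Conflicting, multiple tasks or extremely low complexity", 0.28, "Least favourable"),
    ],
}


def overall(p: float, effect: str) -> float:
    """Probability of the overall situation occurring, p * R, to two decimals."""
    return round(p * DESIGNATOR[effect], 2)


for rif, rows in RIFS.items():
    total_p = sum(p for _, p, _ in rows)
    print(f"{rif} (sum of p = {total_p:.2f})")
    for descriptor, p, effect in rows:
        print(f"  {descriptor}: p = {p:.2f}, R = {DESIGNATOR[effect]}, "
              f"p*R = {overall(p, effect):.2f}")
```

Checking that each RIF's probabilities sum to 1 (within rounding) is also a quick way to catch transcription errors when values are copied between tables.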
Weather
Task complexity
Duration of failure
Airspace characteristics
Adequacy of organization
Adequacy of alarms/alerts
Ambiguity of information
Complexity of failure
Personal factors
x
x
Duration of failure
Adequacy of HMI and operational support
Ambiguity of information in the working environment
Adequacy of alarms/alerts
Adequacy of alarms/alerts onset
Adequacy of organization
Traffic/traffic complexity
Training for recovery from ATC equipment failures
Previous experience with equipment failures
Experience with system performance (reliance)
Personal factors
Communication for recovery within a team of controllers
Complexity of failure type
Time course of failure development
Number of workstations/sectors affected
Time necessary to recover
Existence of recovery procedure
DIRECT
INFLUENCE
x
x
x
x
x
x
x
x
Traffic/traffic complexity at the moment of failure
Airspace characteristics
Weather conditions during the recovery process
Task complexity
NOTE:
Please mark the interactions between each factor in the upper row and each factor
in the left column. For example, does 'Training for recovery' influence any of the
factors in the left column ('previous experience', 'experience with the system', 'personal
factors', and so on)? Please add new interactions or delete existing ones as you find
appropriate.
Level | RIF1 | RIF2 | RIF3 | RIF4 | RIF5 | RIF6 | RIF7 | RIF8 | RIF9 | RIF10
0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.5 | 0 | 0 | 0 | 168 | 24 | 0 | 0 | 0 | 96 | 0
0.6 | 0 | 0 | 0 | 5964 | 2244 | 0 | 0 | 0 | 4272 | 0
0.7 | 0 | 0 | 0 | 67956 | 37908 | 0 | 0 | 0 | 58656 | 0
0.8 | 0 | 0 | 0 | 379116 | 266508 | 0 | 0 | 0 | 383184 | 0
0.9 | 2239488 | 0 | 0 | 1227984 | 1008576 | 0 | 0 | 0 | 1422000 | 0
1.0 | 8957952 | 13436928 | 0 | 2513604 | 2310156 | 0 | 8957952 | 0 | 3279840 | 8957952
1.1 | 2239488 | 6718464 | 0 | 1653636 | 1621692 | 0 | 4478976 | 0 | 2337228 | 4478976
1.2 | 0 | 0 | 0 | 3393708 | 3512088 | 0 | 0 | 0 | 5184840 | 0
1.3 | 0 | 0 | 0 | 2513604 | 2750052 | 0 | 0 | 0 | 4234404 | 0
1.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 2283432 | 0
1.5 | 0 | 0 | 0 | 379284 | 442464 | 0 | 0 | 0 | 786768 | 0
1.6 | 0 | 0 | 0 | 73920 | 82008 | 0 | 0 | 0 | 162216 | 0
1.7 | 0 | 0 | 0 | 73920 | 44760 | 0 | 0 | 0 | 17670 | 0
1.8 | 0 | 0 | 248832 | 379284 | 266688 | 0 | 0 | 0 | 780 | 0
1.9 | 2239488 | 0 | 3483648 | 1227984 | 1008576 | 0 | 0 | 0 | 6 | 0
2.0 | 8957952 | 13436928 | 8709120 | 2513604 | 2310156 | 13436928 | 8957952 | 10077696 | 0 | 8957952
2.1 | 2239488 | 6718464 | 3981312 | 1653636 | 1621692 | 6718464 | 4478976 | 6718464 | 0 | 4478976
2.2 | 0 | 0 | 3483648 | 3393708 | 3512088 | 0 | 0 | 3359232 | 0 | 0
2.3 | 0 | 0 | 248832 | 2513604 | 2750052 | 0 | 0 | 0 | 0 | 0
2.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 0 | 0
2.5 | 0 | 0 | 0 | 379284 | 442464 | 0 | 0 | 0 | 96 | 0
2.6 | 0 | 0 | 0 | 73920 | 82008 | 0 | 0 | 0 | 4272 | 0
2.7 | 0 | 0 | 0 | 73920 | 44760 | 0 | 0 | 0 | 58656 | 0
2.8 | 0 | 0 | 248832 | 379284 | 266688 | 0 | 0 | 0 | 383184 | 0
2.9 | 2239488 | 0 | 3483648 | 1227984 | 1008576 | 0 | 0 | 0 | 1422000 | 0
3.0 | 8957952 | 0 | 8709120 | 2513604 | 2310156 | 13436928 | 8957952 | 10077696 | 3279840 | 8957952
3.1 | 2239488 | 0 | 3981312 | 1653636 | 1621692 | 6718464 | 4478976 | 6718464 | 2337228 | 4478976
3.2 | 0 | 0 | 3483648 | 3393708 | 3512088 | 0 | 0 | 3359232 | 5184840 | 0
3.3 | 0 | 0 | 248832 | 2513604 | 2750052 | 0 | 0 | 0 | 4234404 | 0
3.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 2283432 | 0
3.5 | 0 | 0 | 0 | 379116 | 442440 | 0 | 0 | 0 | 786768 | 0
3.6 | 0 | 0 | 0 | 67956 | 79764 | 0 | 0 | 0 | 162216 | 0
3.7 | 0 | 0 | 0 | 5964 | 6852 | 0 | 0 | 0 | 17670 | 0
3.8 | 0 | 0 | 0 | 168 | 180 | 0 | 0 | 0 | 780 | 0
3.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0
4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
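Every column of the level distribution has the same total, 40,310,784, which factorises as 3^9 × 2^11. This appears consistent with counting every combination of RIF descriptors when nine RIFs have three descriptors and eleven have two, as in the RIF table. That interpretation is an assumption, however, and is not stated in this section. The sketch below only checks the arithmetic. It uses a sparse transcription of the table, and all of its names are hypothetical.

```python
# Sparse transcription of the RIF1-RIF10 columns: each start level maps to a
# run of consecutive non-zero counts at 0.1 steps; unlisted levels are 0.
RUN_RIF1 = [2239488, 8957952, 2239488]
RUN_RIF3 = [248832, 3483648, 8709120, 3981312, 3483648, 248832]
RUN_RIF9 = [96, 4272, 58656, 383184, 1422000, 3279840, 2337228, 5184840,
            4234404, 2283432, 786768, 162216, 17670, 780, 6]

COLUMNS = {
    "RIF1": {0.9: RUN_RIF1, 1.9: RUN_RIF1, 2.9: RUN_RIF1},
    "RIF2": {1.0: [13436928, 6718464], 2.0: [13436928, 6718464]},
    "RIF3": {1.8: RUN_RIF3, 2.8: RUN_RIF3},
    "RIF4": {0.5: [168, 5964, 67956, 379116, 1227984, 2513604, 1653636, 3393708,
                   2513604, 1227984, 379284, 73920, 73920, 379284, 1227984, 2513604,
                   1653636, 3393708, 2513604, 1227984, 379284, 73920, 73920, 379284,
                   1227984, 2513604, 1653636, 3393708, 2513604, 1227984, 379116,
                   67956, 5964, 168]},
    "RIF5": {0.5: [24, 2244, 37908, 266508, 1008576, 2310156, 1621692, 3512088,
                   2750052, 1398444, 442464, 82008, 44760, 266688, 1008576, 2310156,
                   1621692, 3512088, 2750052, 1398444, 442464, 82008, 44760, 266688,
                   1008576, 2310156, 1621692, 3512088, 2750052, 1398444, 442440,
                   79764, 6852, 180]},
    "RIF6": {2.0: [13436928, 6718464], 3.0: [13436928, 6718464]},
    "RIF7": {1.0: [8957952, 4478976], 2.0: [8957952, 4478976], 3.0: [8957952, 4478976]},
    "RIF8": {2.0: [10077696, 6718464, 3359232], 3.0: [10077696, 6718464, 3359232]},
    "RIF9": {0.5: RUN_RIF9, 2.5: RUN_RIF9},
    "RIF10": {1.0: [8957952, 4478976], 2.0: [8957952, 4478976], 3.0: [8957952, 4478976]},
}

# Assumed reading: 9 three-descriptor RIFs and 11 two-descriptor RIFs
EXPECTED_TOTAL = 3**9 * 2**11  # 40,310,784


def column_total(runs):
    """Sum every count stored for one column."""
    return sum(sum(counts) for counts in runs.values())


for name, runs in COLUMNS.items():
    total = column_total(runs)
    status = "matches" if total == EXPECTED_TOTAL else "differs from"
    print(f"{name}: {total:,} ({status} 3^9 * 2^11)")
```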
Level | RIF11 | RIF12 | RIF13 | RIF14 | RIF15 | RIF16 | RIF17 | RIF18 | RIF19 | RIF20
0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.7 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.8 | 0 | 248832 | 684288 | 0 | 124416 | 0 | 0 | 0 | 0 | 0
0.9 | 0 | 2488320 | 3836160 | 746496 | 2363904 | 1492992 | 0 | 0 | 0 | 0
1.0 | 0 | 5474304 | 7527168 | 5971968 | 7589376 | 5225472 | 0 | 6718464 | 0 | 0
1.1 | 0 | 2488320 | 3545856 | 3732480 | 4354560 | 2985984 | 0 | 4478976 | 0 | 0
1.2 | 0 | 2488320 | 3836160 | 2985984 | 4976640 | 3359232 | 0 | 2239488 | 0 | 0
1.3 | 0 | 248832 | 684288 | 0 | 746496 | 373248 | 0 | 0 | 0 | 0
1.4 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 6
1.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 696
1.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14778
1.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131736
1.8 | 0 | 248832 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 638880
1.9 | 0 | 2488320 | 0 | 746496 | 0 | 1492992 | 1119744 | 0 | 0 | 1903896
2.0 | 10077696 | 5474304 | 0 | 5971968 | 0 | 5225472 | 8957952 | 6718464 | 20155392 | 3719892
2.1 | 6718464 | 2488320 | 0 | 3732480 | 0 | 2985984 | 5598720 | 4478976 | 0 | 2405976
2.2 | 3359232 | 2488320 | 0 | 2985984 | 0 | 3359232 | 4478976 | 2239488 | 0 | 4929648
2.3 | 0 | 248832 | 0 | 0 | 0 | 373248 | 0 | 0 | 0 | 3719892
2.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1903902
2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 639576
2.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 146514
2.7 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 146514
2.8 | 0 | 248832 | 684288 | 0 | 124416 | 0 | 0 | 0 | 0 | 639576
2.9 | 0 | 2488320 | 3836160 | 746496 | 2363904 | 1492992 | 1119744 | 0 | 0 | 1903902
3.0 | 10077696 | 5474304 | 7527168 | 5971968 | 7589376 | 5225472 | 8957952 | 6718464 | 20155392 | 3719892
3.1 | 6718464 | 2488320 | 3545856 | 3732480 | 4354560 | 2985984 | 5598720 | 4478976 | 0 | 2405976
3.2 | 3359232 | 2488320 | 3836160 | 2985984 | 4976640 | 3359232 | 4478976 | 2239488 | 0 | 4929648
3.3 | 0 | 248832 | 684288 | 0 | 746496 | 373248 | 0 | 0 | 0 | 3719892
3.4 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 1903896
3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 638880
3.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131736
3.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14778
3.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 696
3.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6
4.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
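Each column can be read as a frequency distribution over the level axis. Under that reading, a column can be normalised by its total and summarised by its count-weighted mean level. The sketch below does this for two columns copied from the table. It uses exact fractions so that level labels such as 1.1 do not pick up floating-point error. The helper names are illustrative, and the reading itself is an assumption.

```python
from fractions import Fraction

# Runs of non-zero counts (start level -> consecutive counts at 0.1 steps)
RIF18 = {1.0: [6718464, 4478976, 2239488],
         2.0: [6718464, 4478976, 2239488],
         3.0: [6718464, 4478976, 2239488]}
RIF19 = {2.0: [20155392], 3.0: [20155392]}


def expand(runs):
    """Turn {start_level: [counts]} into {level: count} with exact levels."""
    dist = {}
    for start, counts in runs.items():
        for i, count in enumerate(counts):
            # Work in tenths so 1.0 + 0.1 * i is represented exactly
            dist[Fraction(round(start * 10) + i, 10)] = count
    return dist


def relative_frequencies(dist):
    """Divide each count by the column total."""
    total = sum(dist.values())
    return {level: Fraction(count, total) for level, count in dist.items()}


def mean_level(dist):
    """Count-weighted mean of the level labels."""
    return sum(level * share for level, share in relative_frequencies(dist).items())


for name, runs in {"RIF18": RIF18, "RIF19": RIF19}.items():
    m = mean_level(expand(runs))
    print(name, m, round(float(m), 3))
```

For RIF19 the two equal counts at levels 2.0 and 3.0 give a mean of exactly 2.5. RIF18 gives 31/15, or about 2.07.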
Researcher: Branka Subotic
Supervisor: Dr Washington Y. Ochieng
University:
Location of experiment: XXX
June 2006
SUBJECT INSTRUCTIONS
Your understanding and help are crucial at every step of this study!
This study is designed as an integral part of regular emergency training in the Dublin ATC Centre,
with minimal impact on the controller. Therefore, please consider and treat this training
session as you would any other training session in your professional career.
From time to time, additional information may be given to you by the training instructor or the
researcher. On these occasions, please act as you would in the operational environment. Also,
when information or instructions are given to you by the researcher, please regard them as if they
come from a training instructor.
Now, we would like you to read the Consent form, which aims to inform you of what the
experiment involves and to make you fully aware of your rights while you are taking part in it. So
please proceed to the next page, read the form, and sign it if you agree with all terms and
conditions. If you have any questions, please do not hesitate to contact the researcher.
In addition, we will ask you to fill out a questionnaire and to participate in a de-briefing after the
training session. The de-briefing part of this experiment is of high importance, as we will
compare the recorded data with your own experience and decision-making process. Therefore,
we would like to encourage you to give the researcher detailed input and explanations.
The purpose of this research is to investigate the controller's decision-making process. You will
be asked to complete one emergency training session and therefore to provide an air traffic control
service through one traffic scenario. The entire experiment is expected to take approximately
1.5 hours to complete.
The results of this experiment are for research purposes only, and may be presented at
professional meetings or published in research literature. Your name will not be used in the
reporting of results. Only recorded data will be used; all personal information will be kept
completely confidential. A videotape of part of the experiment may be taken for purposes of
data collection only. Neither your face nor your identity will ever be associated with any reporting of
these results.
In addition, because of the confidentiality of this experiment, you will be asked not to disclose
any information about what you have experienced today to anyone (including family, colleagues,
and friends) for the next 30 days. Only in this way can we be assured that the
experiment will remain as realistic as possible. With your signature below, you accept
these conditions. If for any reason you are unable to comply with any of the listed conditions,
please inform the researcher right away and you will be released from any further obligations.
Additionally, if you wish to withdraw from the experiment, you may do so at any time.
With Sincerest Thanks
Age ____
How suitable was your previous training for the situation (equipment failure) that you
have just experienced? Please answer this question taking into account the quality of the
training syllabus as well as the frequency of training. (Circle the appropriate number)
Please mark the statement that is closest to your previous experience with equipment
failures:
1. I have experienced a very similar or the same type of equipment failure in the past.
2. I have not experienced this particular type of failure, but have experienced other
types of equipment failures previously.
3. I have never experienced equipment failure in my professional career.
Please mark the statement that is closest to your experience with the ATC system:
1. I trust ATC technology more than I trust my own judgments.
2. I trust new ATC technology but I am aware of possible failures.
3. I do not trust new ATC technology, even though it is designed to make my job
easier.
How would you rate your personal ability in today's training session? Personal ability
comprises different factors, including but not limited to: your level of fatigue, stress, confidence,
complacency, your ability to cope with an emergency situation, any family or other social
group issues, etc. Based on this explanation, rate your personal ability:
1. Suitable for the recovery process
2. Tolerable for the recovery process
3. Counter productive for the recovery process
How would you rate your communication for recovery today:
1. Efficient
2. Tolerable
3. Inefficient
Would you say that you had enough time to recover from the effect(s) of the equipment
failure (taking into account possible development of less than adequate separation)?
1. Yes, time was adequate. Time necessary to recover was less than available
time in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of
available time in the simulation.
Is there a relevant recovery procedure for this particular failure?
1. Very familiar
2. Semi familiar
3. Not familiar at all
Would you say that HMI and operational support have been:
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counter productive to the situation in question
Would you say that:
1. External working environment matched your internal mental model during
recovery process
2. External working environment mismatched your internal mental model at any
point of recovery
How would you rate the adequacy of organisation in your ATC Centre?
1. Efficient
2. Tolerable
3. Inefficient
How would you rate traffic complexity during the recovery process (please note: only
during the recovery process and not during the entire training session):
1. High
2. Average
3. Low
How would you rate the complexity of the airspace in the used scenario? The airspace
complexity was:
1. Adequate
2. Tolerable
3. Inappropriate
How would you rate weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
Considering the entire training session, how would you rate the overall task complexity:
1. Efficient
2. Tolerable
3. Inefficient
How different is your performance today from that on any other day?
1. Highly representative
2. Average
3. Not representative at all
How realistic was today's task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Are you completely aware of the impact/implications of a particular failure that you have
just experienced? Do you fully understand what will happen when particular equipment
fails?
Y
N
Any comment?
Would you like to see some form of Aide-Memoire (flip chart, small laminated booklet,
HMI drop down menu) available at each CWP to assist you in recognising the effects of
a particular equipment failure and steps to be taken toward its recovery?
Y
N
Is there any aspect of training, procedures, HMI, or teamwork that could have enhanced your
recovery performance today?
Thank you!!!!
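Several answer options in this questionnaire mirror the descriptors of the RIF table. For example, the adequacy of organisation is rated Efficient, Tolerable or Inefficient. The sketch below tallies answers on the assumption that the probability p of a descriptor is estimated as the share of controllers who chose it, which this section does not state explicitly. The ten answers are invented for illustration and are not data from the study, and the function name is hypothetical.

```python
from collections import Counter

# Answer codes for "How would you rate the adequacy of organisation in your ATC Centre?"
OPTIONS = {1: "Efficient", 2: "Tolerable", 3: "Inefficient"}


def descriptor_probabilities(answers, options):
    """Share of respondents choosing each option (assumed estimator of p)."""
    counts = Counter(answers)
    n = len(answers)
    return {label: round(counts[code] / n, 2) for code, label in options.items()}


# Hypothetical circled numbers from ten controllers (illustration only)
answers = [1, 1, 2, 1, 2, 1, 3, 1, 2, 1]
print(descriptor_probabilities(answers, OPTIONS))
```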
The researcher should replay the video recording from the moment of failure
injection and start further discussion with the subject.
1. How did you notice/detect that there was an equipment failure? What info
triggered the detection?
2. When exactly did detection occur?
3. What could have been the worst consequence if the situation had not been detected?
4. Did you find the diagnosis phase possible/necessary?
If yes, go to question 5. If no, go to question 7.
5. What was your diagnosis?
6. What did you do with it (i.e. tried to confirm it, or ruled out alternatives)?
7. Was the recovery strategy influenced by the diagnosis?
8. How did you choose the recovery strategy to apply (i.e. based on training, own
experience, colleagues' experience, or any other source of info)?
9. What could have made the situation worse?
10. Can you think of any fall-back actions which could mitigate this situation? Can
you suggest any changes to the procedures, phraseology, HMI design, or fall-back
procedures that could improve the situation?
c) Feedback form
FEEDBACK FORM
Concerning the study conducted by representatives of
Imperial College London at XXX ATC Centre, 06/06/06 to 09/06/06
Dear Controller,
As you have participated in this study, we would like to ask you to provide your feedback on the
importance and value of this study. Please answer all questions as accurately as possible, since
these answers will guide us in our future endeavours. Your answers will be used only for the
assessment of the usefulness of this study.
Once again thank you very much for participating in this study!
Do you think that this experience is beneficial for your future work?
Do you feel that this experiment helped you to identify any gaps in your:
Knowledge
Training
Skills
After completing, please return this feedback form to the office of XXX.
Thank you for your time! Your cooperation is highly appreciated.
Researcher
Assistant
Recovery effectiveness
Based on the controller performance that you observed in this experiment (either live or on
the video recording of the experimental trial), please use your professional experience
to assess the effectiveness of the controller's recovery.
Recovery is considered successful if the system returns to the normal or intermediate (but still
stable) state. In the short term (as simulated in this experiment), the situation should be stable
and control of airspace should be considered safe, but not necessarily efficient.
Please note that the anchor points of each scale range from Firmly Disagree to Firmly
Agree. Place a mark in one of the five boxes along each line, as shown in the following example.
Example
In general, I am professionally more efficient in the mornings than evenings.
x
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
2. In this traffic scenario, it was possible to implement more than one recovery strategy.
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered partly agree or firmly agree, your answer indicates that you thought of one or more
alternative recovery strategies. Please briefly describe this/these alternative(s).
3. If you were in the place of the subject controller, would you have implemented a different recovery
strategy from the one he did?
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered partly agree or partly disagree, please specify your reasons for implementing a different
recovery strategy and which recovery strategy that would be. In addition, please specify any
particular or difficult issues regarding the traffic situation during the recovery process:
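The five-box scale is ordinal, so a median summarises several observers' answers more safely than a mean. The sketch below assumes a coding from -2 (Firmly Disagree) to +2 (Firmly Agree), which the form itself does not prescribe. The marks come from five hypothetical observers and are invented for illustration.

```python
from statistics import median

SCALE = ["Firmly Disagree", "Partly Disagree", "Neutral", "Partly Agree", "Firmly Agree"]
# Assumed coding: -2 .. +2 in the order the boxes appear on the form
CODE = {label: i - 2 for i, label in enumerate(SCALE)}


def summarise(marks):
    """Return (median code, share of observers who partly or firmly agreed)."""
    codes = [CODE[m] for m in marks]
    agree = sum(c > 0 for c in codes)
    return median(codes), agree / len(codes)


# Hypothetical marks for "it was possible to implement more than one recovery strategy"
marks = ["Partly Agree", "Firmly Agree", "Neutral", "Partly Agree", "Partly Disagree"]
print(summarise(marks))
```

The second value is the share of observers who partly or firmly agreed. Under question 2, these are the observers asked to describe an alternative strategy.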
How would you rate traffic complexity during the recovery process (please note: only during the
recovery process and not during the entire training session)?
1. High
2. Average
3. Low
How would you rate the complexity of the airspace in the used scenario? The airspace
complexity was:
1. Adequate
2. Tolerable
3. Inappropriate
How would you rate weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
How realistic was today's task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Thank you!!!!
______________
______________
______________
Chapter 13
Appendices
(1)
(2)
(3)
(4)
ID
RIF name
Descriptor
Probability
(p)
Internal factors
Training for
recovery from ATC
equipment failure
Previous experience
with equipment
failures
Personal factors
Communication for
recovery within
team/ATC Centre
Complexity of failure
type
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
Experienced with a
particular type of
failure or
Experienced with
any other type of
ATC equipment
failure
No experience with
ATC equipment
failures
Objective attitude
toward the system
Positive experience
with the system
(excessive trust) or
Negative
experience with the
system (undertrust)
Suitable for the
recovery process
Tolerable for the
recovery process
Counter productive
for the recovery
process
Time course of
failure development
Number of
workstations/sectors
affected
Level
Designator
(R)
(8)
Probability
of overall
situation
occurring
(p*R)
0.73
0.23
Non
significant
0.03
Least
favourable
-1
-0.03
0.83
Most
favourable
0.83
0.17
Non
significant
0.93
Non
significant
0.07
Least
favourable
-1
-0.07
0.83
-1
-0.03
0.27
-1
-0.07
-1
-1
0.83
0.13
0.03
Tolerable
0.67
Inefficient
0.07
Sudden failure
(7)
Most
favourable
0.27
Persistent or latent
failure
Gradual
degradation of
system
One
workstation/one
sector or All
workstations in one
sector
(6)
0.73
Efficient
Single system
affected
Multiple systems
affected
(5)
Expected
effect of
controller
recovery
performance
0
1
Most
favourable
Non
significant
Least
favourable
Most
favourable
Non
significant
Least
favourable
Non
significant
Least
favourable
Improve
Non
significant
Least
favourable
-1
Non
significant
10
11
12
13
16
17
18
19
20
Time necessary to
recover
Existence of
recovery procedure
Duration of failure
Adequacy of HMI
and operational
support
Ambiguity of
information in the
working
environment
Adequacy of
organisation
Traffic complexity
Airspace
characteristics
Weather conditions
during the recovery
process
Conflicting issues in
the situation (task
complexity)
Several
workstations/couple
of sectors or All
workstations/all
sectors
Adequate - less
than available time
Inadequate - in
excess of available
time
Suitable to the
situation in
question
Tolerable to the
situation in
question
Least
favourable
-1
-1
0.86
Most
favourable
0.86
0.14
Least
favourable
-1
-0.14
Most
favourable
Non
significant
-1
-1
Inappropriate
Moderate period of
time or Substantial
period of time
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
External working
environment
matches the
controller's internal
mental model
External working
environment
mismatches the
controller's internal
mental model
Least
favourable
-1
0.5
Most
favourable
0.5
0.39
Non
significant
0.11
Least
favourable
-1
-0.11
Most
favourable
Least
favourable
-1
0.4
-1
-0.1
Efficient
0.4
Tolerable
0.5
Inefficient
0.1
Average traffic
complexity
Extremely high or
extremely low
traffic complexity
Adequate (e.g.
enroute higher
levels)
Least
favourable
Non
significant
0.35
Most
favourable
Non
significant
Least
favourable
Non
significant
0.65
Least
favourable
-1
-0.65
0.8
Most
favourable
0.8
Tolerable
0.1
Non
significant
Inappropriate (e.g.
enroute lower
levels or terminal)
0.1
Least
favourable
-1
-0.1
Improved
0.83
Deteriorated
0.17
-1
-0.17
-1
-0.7
Average complexity
of the situation
Conflicting, multiple
tasks or Extremely
low complexity of
the situation (may
lead to monotony)
0.3
0.7
Non
significant
Least
favourable
Non
significant
Least
favourable
[Figure: histogram of Frequency (vertical axis, 0 to 800) over bins labelled -0.088 to 0.112 in steps of 0.01 (horizontal axis), with a fitted two-term function f(x) = A1 e^(...) + A2 e^(...) = 141.4 e^(...) + 632.8 e^(...); the exponents, which appear to involve the values 0.022 and 0.042, are not recoverable from the extraction.]