PHD Thesis - HF in ATC

FRAMEWORK FOR THE ANALYSIS OF CONTROLLER RECOVERY
FROM EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
Branka Subotic (MSc BSc)
April 2007
A thesis submitted in fulfilment of the requirements for the degree of Doctor of
Philosophy of the University of London and for the
Diploma of Membership of Imperial College London
Centre for Transport Studies
Department of Civil and Environmental Engineering
Imperial College London, United Kingdom
Declaration
At various stages during this PhD, I was involved in collaborative efforts with both
academic and industrial colleagues. In certain cases, the outputs of these collaborations
are included in this thesis to better explain and support the research presented. In
particular, during the period 2004 to 2005, colleagues from the Air Traffic Management
(ATM) Group at the Centre for Transport Studies, Imperial College London, assisted in the
questionnaire-based survey of air traffic controllers. This mainly involved the distribution of
questionnaires and collection of the responses.
Furthermore, a key element of the research presented in this thesis is the experiment
conducted at a facility owned and operated by a Civil Aviation Authority (CAA). The
experiment was facilitated by the assistance of various Air Traffic Control (ATC) Centre
staff including ATM specialists, ATC controllers, pseudo-pilots, engineers, and technicians.
Finally, EUROCONTROL staff provided a valuable contribution at various stages of this
research in terms of access to relevant publications, professional networks, and simulation
trials.
I hereby declare that besides the collaborations referred to above, I have personally
carried out the work described in this thesis:
..
Branka Subotic
..
Dr. Washington Yotto Ochieng
Abstract
An Air Traffic Control (ATC) system represents a set of components that act together to
achieve a safe and efficient flow of traffic in any given airspace. The elements of this
system are human operators, equipment, and procedures, along with all the interactions
between them. Failure of equipment, as one component of an ATC system, and its
interaction with human operators (i.e. air traffic controllers) is the main focus of the
research presented in this thesis. Thus, the thesis focuses on the human recovery process
triggered by failure of equipment that supports air traffic controllers in the provision of air
traffic services in a dedicated airspace. A detailed understanding of the controller recovery
process has the potential to significantly contribute to safety and operational efficiency in
the current and future ATC environment. Currently, there is a very limited understanding of
the factors that influence the recovery process, particularly with respect to equipment
failures in ATC. This thesis builds on existing relevant research in other industries and
uses targeted experiments and mathematical modelling to develop a functional
relationship between recovery and its influencing factors.
The research presented in this thesis addresses two areas, namely equipment failures
in ATC and controller recovery. The first investigates the characteristics of the ATC
equipment failures from past research and derives the associated target level of safety.
Linking the target level of safety with available operational failure reports establishes a
means to validate the realism and operational significance of the equipment failure
characteristics. A subset of these characteristics relevant to the ATC operations is further
used to develop a novel qualitative equipment failure impact assessment tool. This tool
enables the identification of equipment failures that are most severe to ATC operations
and thus may be most challenging to controller performance.
Having identified the relevant equipment failure types and their characteristics, the thesis
carries out a critical review of the associated issues regarding the process of controller
recovery. A critical element of this is the review of past human reliability research and its
relationship to controller recovery from equipment failures in ATC. The findings from this
are augmented by questionnaire survey results based on responses of 134 air traffic
controllers from 34 countries. Both the past research and the questionnaire survey results
are used to highlight the importance of the context in which controller recovery
performance takes place and to define the recovery context through a set of 20 candidate
contextual factors or Recovery Influencing Factors (RIFs).
The thesis then uses the candidate RIFs to develop a novel approach for the quantitative
assessment of the recovery context through the concept of recovery context indicator. This
approach and its operational benefits are further validated by an experiment conducted in
a training facility of an ATC Centre with the participation of 30 operational air traffic
controllers. In addition to the verification of the generic methodology for the assessment of
the recovery context, the experimental data are used to analyse controller recovery
performance and investigate the outcome of the recovery process. The findings obtained
from the experimental investigation are in line with those obtained from past research and
the ATC operational environment.
Acknowledgements
Having started my research initially at the EUROCONTROL Experimental Centre (EEC) in
Bretigny sur Orge and then at Imperial College London, it is understandable that naming
all those people who have contributed to this work is quite a hard task. However, I will try
anyway, and if some names are not listed, my gratitude to them is no less than to those listed
below.
For help with the funding of my studies, I would like to thank the following organisations:
EUROCONTROL Experimental Centre (EEC) in Bretigny sur Orge, France for the
award of a graduate internship and a further three-year research studentship;
Universities UK for the Overseas Research Scheme (ORS) award for three
consecutive years; and
the Centre for Transport Studies, Department of Civil and Environmental
Engineering, Imperial College London for the contribution to my tuition fee and a
three-year research bursary.
This PhD research would not have been possible without Christian Push and Dirk
Schaefer who invited me initially to join the EUROCONTROL Human Factors group and to
start developing a research project satisfying both the needs of the EEC as well as my
own interests. Once started, this collaboration proved to be highly supportive in both
technical and financial terms. As a EUROCONTROL PhD student I had the privilege of
unlimited access to many aviation experts working in house: at the EEC, Headquarters
(Belgium), and the Maastricht Upper Area Control (UAC) Centre (Netherlands). Among
these were Nigel Makings, Catherine Gandolfi, Eric Perrin, Deirdere Bonini, Rachael
Gordon, Andrew Harvey, and the entire Gate-to-Gate (G2G) team and controllers involved
in simulation A and B, especially Diarmuid Houlihan Motto. I thank them all for the fruitful
collaboration. My special gratitude goes to Barry Kirwan and Oliver Straeter whose
technical assistance and unlimited support were crucial to embarking upon the field of
human reliability, completely unknown to me at the beginning of this research. Their
assistance and interest in my research opened many doors and assured the highest
quality of information and professional contacts.
At Imperial College there are many colleagues and research students who offered their
help at various stages and aspects of my work. Among them are Jackie Sime, William
Knottenbelt, Dimitri Panagiotakopoulos, Marie-Dominique Dupuy, Umar Bhatti, Victoria
Williams, and Wolfgang Shuster. However, my biggest gratitude goes to Arnab Majumdar
and to my supervisor, Washington Y. Ochieng. They had a critical role in the support,
supervision, and achievement of excellence in my research. Thanks to their
understanding, I attended various technical meetings, seminars, conferences, courses,
and simulation trials. These proved to be a significant direct and indirect contribution to the
quality of the research presented in this thesis.
One of the critical parts of the research presented in this thesis would not have been feasible
without the technical support of the Irish Aviation Authority staff, especially Nick Lowth,
Bernard Mackessy, and Garrett MacNamara. However, my special gratitude goes to Alan
Byrne for making the impossible truly possible and allowing me to successfully complete a
key part of this research.
There are many other people that have helped in various ways. I would like to thank Yvette
Dalle-Mule, Veronique Begault, and Sonja Straussberger from EUROCONTROL EEC.
Furthermore, I would like to thank Rajkumar Pant from the Indian Institute of Technology,
Isa Alkalaj and Marek Bekier from Skyguide, Martin Richards and Vic Burgess from UK
NATS, Christopher Adams from Maastricht UAC, Bob Phillips from CASA Australia, Peter
Nalder from New Zealand Civil Aviation Authority (CAA), Jos Kuijper and Randal de Garis
from EUROCONTROL, Sarah Doherty and Joji Waites from the UK CAA, and Keshava
Sharma from the Airports Authority of India.
I want to thank my friend Tamara Pejovic for all the support that she gave me during the
years I have been working on this thesis. Last but not least, I want to express my deepest
gratitude to my brother and my mother who were always the core support in all the
journeys that I have embarked upon. Hence, I am dedicating this thesis to them.
Table of Contents
DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Background to the problem
1.2 Research objectives
1.3 Outline of the thesis
2 FUNDAMENTALS OF AIR TRAFFIC MANAGEMENT AND CONTROL

2.1 Air Traffic Management
2.2 Air Traffic Control
2.2.1 Area Control service
2.2.2 Approach Control service
2.2.3 Aerodrome control service
2.3 Overall Air Traffic Control system architecture
2.3.1 Air Traffic Control functionalities
2.3.1.1 Communication function
2.3.1.2 Navigation function
2.3.1.2.1 Approach and landing navigation
2.3.1.2.2 Area navigation
2.3.1.2.3 Systems for control and monitoring of ground-based airport
facilities
2.3.1.3 Surveillance function
2.3.1.3.1 Radar systems
2.3.1.3.2 Radar and auxiliary display
2.3.1.3.3 Terminal and ground surveillance
2.3.1.4 Data processing and distribution function
2.3.1.5 Supporting function
2.3.1.6 Safety Nets
2.3.1.7 Power supply
2.3.1.8 Pointing and input devices
2.3.1.9 System control and monitoring function
2.4 Characteristics of the generic Air Traffic Control Centre
2.5 The future of Air Traffic Control
2.5.1 Challenges of automation

2.5.2 Human-centred vs. technology-centred automation
2.5.3 The future of air navigation service
2.5.4 Impact of future ATM/ATC on controller recovery from equipment failures
2.6 Summary
3 PRELIMINARY ASSESSMENT OF EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
3.1 Definition of equipment failure
3.2 Definition of a hazard
3.3 Supporting data: operational failure reports
3.3.1 Reporting and data collection
3.3.2 Data pre-processing problems
3.3.3 Available operational failure reports
3.4 Methodology to assess the relevance of supporting data
3.4.1 The accident to incident ratio
3.4.2 Units of measurement
3.4.3 The acceptable risk or target level of safety (TLS)
3.4.3.1 Existing standards
3.4.3.1.1 Joint Aviation Authority
3.4.3.1.2 UK Civil Aviation Authority
3.4.3.1.3 International Civil Aviation Organisation
3.4.3.1.4 Summary of the various TLS analyses
3.4.4 Target level of safety and Air Traffic Control risk budgeting
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
3.5 Preliminary analysis and validation of operational failure reports
3.6 Summary
4 EQUIPMENT FAILURES AND TECHNICAL DEFENCES IN AIR TRAFFIC CONTROL

4.1 Equipment failure characteristics
4.1.1 ATC functionality affected
4.1.2 Complexity of failure type
4.1.3 Time course of failure development
4.1.4 Duration of failure
4.1.5 Potential causes of equipment failures
4.2 Consequences of equipment failure
4.2.1 Impact on air traffic controller
4.2.2 Impact on operations room
4.2.3 Impact on ATC operations
4.2.4 Impact on ATM operations
4.3 Definition of technical defences (technical recovery)
4.3.1 Defences for recovering from failure (safety devices)
4.3.2 Defences for transmitting information regarding the failure (warning devices)
4.4 Analysis of operational failure reports
4.4.1 Data analysis methodology
4.4.2 Rate of equipment failures
4.4.3 Type of ATC functionality and equipment affected
4.4.5 Severity of equipment failures
4.4.6 Duration of equipment failures
4.4.7 Additional statistical tests
4.5 Qualitative equipment failure impact assessment tool

4.6 Summary
5 AIR TRAFFIC CONTROLLER RECOVERY

5.1 Human recovery in air traffic control
5.1.1 Recovery by air traffic controllers
5.1.2 Recovery by system control and monitoring engineers
5.2 Phases of the controller recovery process
5.2.1 Detection
5.2.2 Diagnosis
5.2.3 Correction
5.3 Outcome of the recovery process
5.4 Models of human recovery
5.4.1 Model by Kanse
5.4.2 RAFT Tool
5.4.3 Model by Wickens et al.
5.5 Procedures for handling ATC equipment failures
5.5.1 Existing regulations
5.5.1.1 International regulation
5.5.1.2 European and national regulation
5.5.1.3 Air navigational service provider regulation
5.5.2 Main principles on recovery procedures in ATC
5.6 Training for handling ATC equipment failures
5.6.1.2.1 UK Civil Aviation Authority regulation
5.6.2 Areas of concern related to recovery training
5.7 Definition of controller recovery performance in this thesis
5.7.1 Recovery context
5.7.2 Recovery effectiveness
5.7.3 Recovery duration
5.8 Summary
6 QUESTIONNAIRE SURVEY
6.1 Objectives of the questionnaire survey
6.2 Sampling
6.3 Survey methodology
6.4 Design of the questionnaire
6.5 Pilot survey
6.6 Full survey
6.6.1 Face-to-face interviews
6.6.2 Self-completion survey
6.6.3 Potential sources of errors
6.7 Methodology for the questionnaire survey data analysis
6.7.1 Data pre-processing for analysis
6.7.2 Characteristics of the sample
6.7.2.1 Sampling per ATC Centre
6.7.2.2 Sampling of air traffic controllers
6.7.3 High-level analyses
6.7.3.1 Experience with equipment failures (Q1)

6.7.3.2 Factors that influence the controller recovery performance (Q2)
6.7.3.3 The most unreliable ATC systems/tools (Q3)
6.7.3.4 Organised exchange of information on equipment failures (Q4)
6.7.3.5 Status and quality of recovery procedures (Q5)
6.7.3.5.1 Other findings regarding the recovery procedures
6.7.3.6 Status and quality of training for recovery (Q6)
6.7.3.6.1 Other findings on training for recovery
6.7.3.7 Other findings on recovery performance
6.7.4 Interaction analyses
6.8 Summary
7 METHODOLOGY FOR A SELECTION OF RELEVANT AIR TRAFFIC CONTROLLER RECOVERY INFLUENCING FACTORS
7.1 Relevance of the recovery context
7.1.1 Example of the recovery context
7.2 Methodology to extract the candidate set of contextual factors
7.2.1 Human Reliability Assessment techniques
7.2.1.1 Human Error in ATM (HERA)
7.2.1.2 Technique for the Retrospective and Predictive Analysis of Cognitive
Errors in ATC (TRACEr)
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
7.2.1.4 Recovery from failures: understanding the positive role of human
operators during incidents
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
7.2.1.8 The Contextual Control Model (COCOM)
7.2.1.9 Cognitive Reliability and Error Analysis Method (CREAM)
7.2.1.10 Human Reliability Management System (HRMS)
7.2.1.11 A Technique for Human Event Analysis (ATHEANA)
7.2.1.12 Connectionism Assessment of Human Reliability (CAHR)
7.2.1.13 Nuclear Action Reliability Assessment (NARA)
7.2.1.14 Human Performance DataBase (HPDB)
7.2.1.15 Summary of the findings
7.2.2 Augmentation with equipment-failure related factors
7.2.3 Augmentation with dynamic situational factors
7.2.4 Further subdivision of the identified RIFs
7.3 Definition of qualitative descriptors
7.4 Summary
8 QUANTITATIVE ASSESSMENT OF THE RECOVERY CONTEXT

8.1 Lessons learnt from past research
8.1.1 Application of the CREAM technique
8.1.2 Connectionism Assessment of Human Reliability (CAHR)
8.2 Framework for the methodology for a quantitative assessment of recovery context
8.3 Probabilistic assessment of RIFs (Step 2)
8.3.1 Sources of information
8.3.1.1 Operational failure reports
8.3.1.2 Questionnaire survey
8.3.1.3 Input by ATM Specialists
8.3.1.4 Past literature

8.3.1.5 Aggregation of data
8.3.2 Summary
8.4 Interactions between Recovery Influencing Factors (Step 3)
8.4.1 Identification of RIF interactions
8.4.2 Validation of RIF interactions
8.4.2.1 CREAM
8.4.2.2 CAHR
8.4.2.3 Validation by ATM specialists
8.4.2.4 Validation summary
8.4.3 Quantification of RIFs interactions
8.5 Methodology for the determination of the cut-off points (Step 4)
8.6 Specific effects of RIFs on controller recovery performance (Step 5)
8.7 Calculation of the recovery context indicator (Step 6)
8.7.1 Re-calculation of RIF probabilities
8.7.2 Distribution of the recovery context indicator
8.7.3 Sensitivity analysis
8.7.4 Optimal solutions
8.8 Summary
9 EXPERIMENTAL INVESTIGATION OF THE AIR TRAFFIC CONTROLLER RECOVERY PERFORMANCE
9.1 High-level design of the experimental process
9.2 Rationale for the experiment
9.3 Assessment of the available resources
9.4 Planning for the experiment
9.5 Design of the experiment
9.6 Selection of the equipment failure to be simulated
9.7 Pilot study: lessons learnt
9.7.1 Summary of the findings from the pilot study
9.8 Experimental set up
9.8.1 Airspace characteristics
9.8.2 Traffic characteristics
9.8.3 Equipment failure characteristics
9.9 Experimental variables
9.9.1 Independent Variables
9.9.1.1 Recovery Influencing Factors (RIFs)
9.9.1.2 Required recovery steps
9.9.2 Dependent Variables
9.9.2.1 Recovery effectiveness
9.9.2.2 Recovery duration
9.9.3 Extraneous Variables
9.10 Potential limitations
9.11 Summary
10 ANALYSIS OF EXPERIMENTAL RESULTS

10.1 Overall framework
10.2 Participants
10.2.1 Age and operational experience
10.2.2 Ratings
10.3 Assessment of controller recovery performance
10.3.1.1 Assessment of relevant RIFs
10.3.1.2 Probabilities of each RIF and its corresponding level
10.3.1.3 Interactions between RIFs
10.3.1.4 Recovery context indicator (Ic)
10.3.1.5 Optimal solutions
10.3.1.5.1 Impact of enhancing recovery procedure on recovery context
10.3.2 Required recovery steps
10.3.5 Outcome of the recovery process
10.3.6 Interactions
10.3.7 Other findings
10.3.7.1 The recovery phases
10.3.7.1.1 Detection
10.3.7.1.2 Diagnosis
10.3.7.1.3 Correction
10.3.7.2 Observed behaviour and attitude
10.3.7.3 Additional findings
10.4 Summary
11 CONCLUSIONS
11.1 Revisiting the research objectives
11.2 Conclusions
11.2.1 Literature review
11.2.2 Equipment failure types and their characteristics
11.2.3 Controller recovery performance, recovery context, and influencing factors
11.2.4 Framework for the analysis of controller recovery
11.3 Future work
11.4 Publications relating to this work
11.4.1 Publication format: journal accepted subject to revision
11.4.2 Publication format: journal published
11.4.3 Publication format: conference proceedings - published
12 LIST OF REFERENCES
APPENDICES
Appendix I     The cost of delays induced by equipment failures
Appendix II    Interviews with ATM staff
Appendix III   Checklist for the Equipment Failure Scenarios in a specific European ATC Centre - An Aide-Memoire framework
Appendix IV    The questionnaire design
Appendix V     Example of one questionnaire response
Appendix VI    Results extracted from question 5 of the questionnaire survey
Appendix VII   Overview of contextual factors
Appendix VIII  Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX    Questions for the ATM Specialist
Appendix X     Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI    Validation of the RIFs interaction matrix
Appendix XII   Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII  Experimental material
Appendix XIV   Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
Appendix XV    Distribution of the recovery context indicator captured in the experiment
List of Figures
Figure 1-1   Overview of the thesis
Figure 2-1   Air transport system (from Subotic et al., 2005)
Figure 2-2   Flight profile (adapted from ICAO, 2001b)
Figure 2-3   ATM and ATC system components (adapted from ICAO, 2001a)
Figure 2-4   Communication function
Figure 2-5   Navigational function
Figure 2-6   Surveillance function
Figure 2-7   Data processing and distribution function
Figure 2-8   Supporting function
Figure 2-9   System monitoring and control function
Figure 3-1   Phases of an equipment failure occurrence
Figure 3-2   Different definitions
Figure 3-3   Reporting system
Figure 3-4   Bathtub model of reliability for electronic components (Leveson, 1995)
Figure 3-5   Aviation TLS and risk budgeting
Figure 4-1   Safety through design (adapted from Christensen and Manuele, 1999)
Figure 4-2   Technical and human recovery
Figure 4-3   Operational failure reports analyses
Figure 4-4   Total number of equipment failures per flight hours flown in each year for countries A, B, and C
Figure 4-5   Total number of equipment failures per flight hours flown in each year for country D (year 2000 incomplete)
Figure 4-6   Most affected ATC functionality (Country A)
Figure 4-7   Most affected ATC functionality (Country B)
Figure 4-8   Most affected ATC functionality (Country C)
Figure 4-9   Most affected ATC functionality (Country D)
Figure 4-10  Distribution of equipment failures according to their severity
Figure 4-11  Distribution of major equipment failures according to ATC functionality
Figure 4-12  Distribution of the failure duration according to four distinct categories
Figure 4-13  Qualitative equipment failure impact assessment tool
Figure 5-1   Analysis of outcome phase (adapted from EUROCONTROL, 2004e)
Figure 5-2   Recovery process phase model (Kanse, 2004)
Figure 5-3   The Recovery from Automation Failure Tool (RAFT) Framework (EUROCONTROL, 2004e)
Figure 5-4   Model of failure recovery in air traffic control. Where two nodes are connected by an arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node, caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
Figure 6-1   The flow diagram of organising a survey
Figure 6-2   Distribution of world air traffic per region for the year 2003 and 2023 (adapted from Airbus, 2004)
Figure 6-3   One-page example of the questionnaire
Figure 6-4   The flow chart of questionnaire survey analyses
Figure 6-5   Distribution of questionnaire responses per region
Figure 6-6   Distribution of operational experience
Figure 6-7   Distribution of air traffic controllers' ratings
Figure 6-8   Controllers' reliance on written procedures throughout the recovery process
Figure 6-9   Controllers' reliance on situation-specific problem solving throughout the recovery process
Figure 6-10  Controllers' reliance on past experience throughout the recovery process
Figure 6-11  Distribution of affected ATC functionalities as reported in the questionnaire survey
Figure 7-1   Methodology to extract a candidate set of RIFs
Figure 8-1   Framework for the quantitative assessment of the recovery context
Figure 8-2   Distribution of RIF5 levels amongst identified recovery contexts without interactions
Figure 8-3   Distribution of RIF5 levels amongst identified recovery contexts with interactions
Figure 8-4   [...] interactions
Figure 8-5   [...] interactions
Figure 8-6   Distribution fitting for the three cut-off points on the example of RIF5 Level 1
Figure 8-7   Cubic polynomial function f(x) fitted for the RIF5 to determine its minimum
Figure 8-8   Distribution of the recovery context indicator
Figure 9-1   The flow diagram of experimental investigation
Figure 9-2   Timeline of the experiment
Figure 9-3   Room setup
Figure 9-4   The visual representation of equipment failure on CWP: a) before the failure, b) after the failure
Figure 10-1  Framework for the analysis of experimental results
Figure 10-2  Distribution of operational experience
Figure 10-3  Distribution of controllers' ratings
Figure 10-4  Distribution of the recovery context indicator in the experiment
Figure 10-5  Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
Figure 10-6  Distribution of the recovery context indicator of 30 controllers
Figure 10-7  Recovery steps performed by each participant
Figure 10-8  Distribution of required recovery steps (S1 to S17)
Figure 10-9  Distribution of recovery effectiveness per category
Figure 10-10 Distribution of recovery duration
Figure 10-11 Distribution of the recovery outcome
Figure 10-12 Recovery phases, their corresponding influencing factors, and required recovery steps
List of Tables
Table 3-1    Summary of available data, number of reports, and equipment failure incidents per country
Table 3-2    Summary of various analyses on aviation TLS
Table 3-3    Analysis of operational failure reports and results
Table 4-1    Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
Table 4-2    UK NATS severity rating (from NATS, 2002)
Table 4-3    Country C's severity rating as defined by its CAA
Table 4-4    Country D severity rating as defined by the particular ATC Centre
Table 4-5    Severity rating defined in this research and mapped with available sources
Table 4-6    Most affected ATC equipment (Country A)
Table 4-7    Most affected ATC equipment (Country B)
Table 4-8    Most affected ATC equipment (Country C)
Table 4-9    Most affected ATC equipment (Country D)
Table 4-10   Summary of five ATC equipment types most affected by failures
Table 4-11   Percentage of the multiple failure occurrences reported in the available datasets
Table 4-12   Summary of five most affected equipment types from four datasets
Table 4-13   Distribution of major failures lasting up to 15 minutes per ATC equipment affected
Table 4-14   Statistical tests and results obtained
Table 4-15   Main findings regarding interaction between ATC functionality and severity
Table 4-16   Review of equipment failure characteristics with regard to their impact on ATC operations
Table 4-17   Detailed overview of the primary and the secondary group of ATC functionalities
Table 5-1    Phases of the recovery process identified in past research
Table 5-2    Summary of relevant models of the human recovery process
Table 6-1    Summary of the questionnaire survey sample
Table 6-2    Mapping between most unreliable ATC functionalities and existing recovery procedures for sampled worldwide countries
Table 6-3    Existence of recovery procedures, recovery training, and recurrent training as reported in the questionnaire survey
Table 6-4    Interaction matrix
Table 6-5    Statistical tests and results obtained
Table 7-1    Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Table 7-2    Factors influencing human actions in THERP (cited in Straeter, 2000)
Table 7-3    Review of Human Reliability Assessment (HRA) techniques and relevant findings
Table 7-4    Recovery Influencing Factors
Table 7-5    Relevant recovery influencing factors and their corresponding qualitative descriptors
Table 8-1    Overview of CREAM and CAHR differences
Table 8-2    Distribution of probabilistic RIF ratings per source
Table 8-3    ATM specialists involved in the assessment of RIFs
Table 8-4    Overview of the sources of information used to determine RIF probabilities
Table 8-5    Example of a potential recovery context represented as a 20-digit array
Table 8-6    Interaction matrix: (1) validation by CREAM, (2) validation by CAHR, (3) validation by ATM specialists; and (x) not validated interactions
Table 8-7    Mapping between RIFs and CAHR contextual factors
Table 8-8    Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
Table 8-9    Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
Table 8-10   Local minimums of polynomial functions
Table 8-11   Cut-off points between the levels for all RIFs
Table 8-12   Probabilities for the RIF5 and each of its levels (see Appendix VII)
Table 8-13   Sensitivity analysis
Table 9-1    Training, pilot study, and experiment sessions
Table 9-2    Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Table 9-3    Equipment failures used in the pilot study
Table 9-4    The mapping between exercise characteristics and the controllers' observations
Table 9-5    Equipment failure in the experimental study
Table 9-6    Availability of functions in the reduced flight data processing mode
Table 9-7    Overview of independent and dependent variables
Table 9-8    Overview of independent and extraneous variables
Table 9-9    Overview and description of required recovery steps
Table 9-10   Recovery process and its three main tasks
Table 10-1   Characteristics of a sample of controllers participating in experiment
Table 10-2   Verification of RIFs probabilities from a generic approach (Chapter 8) and the experiment
Table 10-3   Summary of RIFs defined through a single corresponding level
Table 10-4   Verification of the distribution of the recovery context indicator obtained from a generic approach (Chapter 8) and the experiment
Table 10-5   A review of RIFs with the potential for recovery enhancement
Table 10-6   A review of the proposed recovery solutions
Table 10-7   Percentage of performed recovery steps in three experimental sessions
Table 10-8   Comparison of recovery durations between three experimental sessions
Table 10-9   Statistical tests and results
Table 10-10  The outcome of the recovery process matrix (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Table 10-11  Statistical tests and results
Table 10-12  Summary of additional findings
List of Abbreviations
ACAS       Airborne Collision Avoidance System
ACC        Area Control Centre
ADREP      Accident/Incident Reporting
ADS        Automatic Dependent Surveillance
ADS-B      Automatic Dependent Surveillance Broadcast
ADS-C      Automatic Dependent Surveillance Contract
AFTN       Aeronautical Fixed Telecommunication Network
A/G        Air-Ground communication
AGDP       Air Ground Data Processor
AGL        Aeronautical Ground Lighting
AIAA       American Institute of Aeronautics and Astronautics
AIS        Aeronautical Information Service
AMAN       Arrival Manager
ANSP       Air Navigation Service Provider
APP        Approach Control Office
APR        Automatic Position Reporting
APW        Area Proximity Warning
ARO        Air traffic services Reporting Office
ARTCC      Air Route Traffic Control Centre
ASAS       Airborne Surveillance and Separation Assurance
ASM        Airspace Management
ASMT       ATM Safety Monitoring Tool
ASMT       Automatic Safety Monitoring Tool
ASTERIX    All Purpose STructured Eurocontrol Radar Information Exchange
ATC        Air Traffic Control
ATCT       Air Traffic Control Tower
ATFM       Air Traffic Flow Management
ATHEANA    A Technique for Human Event Analysis
ATIS       Aeronautical Terminal Information Service
ATM        Air Traffic Management
ATS        Air Traffic Service
AWOP       All-Weather Operations Panel
BBN        Bayesian Belief Network
BEST       Beginning to End Skills Trainer
BEVOR      German special occurrences database
CAA        Civil Aviation Authority
CAHR       Connectionism Assessment of Human Reliability
xix
CATIS        Computerised Automatic Terminal Information Service
CC           Contextual Condition
CLAM         Cleared Level Adherence Monitoring
CEATS        Central European Air Traffic Services
CFMU         Central Flow Management Unit
CMS          Control and Monitoring System
CNS          Communication Navigation Surveillance
COCOM        Contextual Control Model
CORE-DATA    Computerised Operator Reliability and Error Database
CPC          Common Performance Condition
CPDLC        Controller Pilot Data Link Communication
CPM          Common Performance Modes
CRDS         CEATS Research, Development and Simulation
CREAM        Cognitive Reliability and Error Analysis Method
CS           Commercial Service
CWP          Controller Working Position
DARC         Direct Access Radar Channel
DMAN         Departure Manager
DME          Distance Measuring Equipment
EASA         European Aviation Safety Agency
ECAC         European Civil Aviation Conference
ECSS         European Cooperation for Space Standardisation
EGNOS        European Geostationary Navigation Overlay Service
EOC          Errors Of Commission
EOO          Errors Of Omission
EPC          Error Producing Condition
ESA          European Space Agency
ESSAR        EUROCONTROL SAfety Regulatory Requirements
ET           Event Tree
EU           European Union
EUROCONTROL  European Organisation for the Safety of Air Navigation
FAA          Federal Aviation Administration
FANS         Future Navigation System
FDPD         Flight Data Processing and Distribution
FDPS         Flight Data Processing System
FIR          Flight Information Region
FIS          Flight Information Service
FL           Flight Level
FMEA         Failure Mode and Effect Analysis
FMECA        Failure Modes, Effects, and Criticality Analysis
FMS          Flight Management System
FPP          Flight Plan Processing
FPS          Flight Progress Strips
FT           Fault Tree
G2G          Gate to Gate
G/G          Ground-Ground communication
GLONAS       Global Orbiting Navigation Satellite System
GNSS         Global Navigation Satellite Systems
GPS          Global Positioning System
HEART        Human Error Assessment and Reduction Technique
HEIDI        Harmonisation of European Incident Definition Initiative
HEP          Human Error Probability
HFACS        Human Factors Analysis and Classification System
HERA         Human Error in ATM Project
HF           High Frequency
HF DL        High Frequency Data Link
HMI          Human Machine Interface
HPDB         Human Performance DataBase
HRA          Human Reliability Assessment
HRMS         Human Reliability Management System
IANS         Institute of Air Navigation Services
IC           Intercom
Ic           recovery Context Indicator
ICAO         International Civil Aviation Organization
IEC          International Electrotechnical Commission
IEEE         Institute of Electrical and Electronics Engineers
IFR          Instrument Flight Rules
ILS          Instrument Landing System
IMC          Instrument Meteorological Conditions
IMC          Industry Management Committee
INS          Inertial Navigation Systems
IP           Interphone
IRS          Incident Reporting System
ISO          International Organisation for Standardisation
JAA          Joint Aviation Authority
JAR          Joint Aviation Regulations
JHEDI        Justification of Human Error Data Information
M            Mean
MAESTRO      Means to Aid Expedition and Sequencing of Traffic with Research and Optimisation
MANTAS       Maastricht ATC New Tools And Systems
MATS         Manual of Air Traffic Services
MDT          Mean Down Time
MET          Meteorological
METAR        Meteorological Aerodrome Report
Mil          Military
MLS          Microwave Landing System
MMI          Man Machine Interface
MMS          Man Machine System
MONA         MONitoring Aids
MORS         Mandatory Occurrence Reporting Scheme
MRP          Multi Radar Processing
MSAW         Minimum Safe Altitude Warning
MSL          Mean Sea Level
MTBF         Mean Time Between Failure
MTBM         Mean Time Between Maintenance
MTCD         Medium Term Conflict Detection
MTTR         Mean Time To Repair
MUAC         Maastricht Upper Area Control Centre
NATSPG       North Atlantic Systems Planning Group
MTOW         Maximum Take Off Weight
NARA         Nuclear Action Reliability Assessment
NAIPS        National Aeronautical Information Processing System
NAS          National Aviation System
NASA         National Aeronautics and Space Administration
NATS         National Air Traffic Service
NUCLARR      Nuclear Computerized Library for Assessing Reactor Reliability
NDB          Non-Directional Beacon
NLR          National Aerospace Laboratory
NOTAM        Notice to Airmen
NTL          National Transportation Library
NTSB         National Transportation Safety Board
OJT          On-the-Job-Training
OLDI         On-line Data Interchange
OS           Open Service
PABX         Private Automatic Branch Exchange
PAR          Precision Approach Radar
PARM         Parallel Approach Runway Monitor
PPS          Precise Positioning Service
PRA          Probabilistic Risk Assessment
PRNAV        Precision aRea NAVigation
PRS          Public Regulated Service
Proc         Procedural control
PRS          Primary Radar Service
PSA          Probabilistic Safety Assessment
PSF          Performance Shaping Factor
PSR          Primary Surveillance Radar
PTT          Press To Talk
QRA          Quantitative Risk Assessment
RAFT         Recovery from Automation Failure Tool
RAM          Route Adherence Monitoring
RCP          Required Communication Performance
RDP          Radar Data Processing
RDPS         Radar Data Processing System
RDR          Radar
RGCSP        Review of the General Concept of Separation Panel
RIF          Recovery Influencing Factor
RIMCAS       Runway Incursion Monitoring and Conflict Alert System
RNP          Required Navigational Performance
RSP          Required Surveillance Performance
RT           Radio Telephony
RTCA         Radio Technical Commission for Aeronautics
RVSM         Reduced Vertical Separation Minima
RVR          Runway Visual Range
RWY          Runway
SAR          Special Administrative Region
SAR          Search And Rescue
SAS          Situational Awareness for Safety
SATCOM       SATellite COMmunication
SHAPE        Solutions for Human Automation Partnership in European ATM
SBAS         Satellite-Based Augmentation Systems
SBJ          Supersonic Business Jet
SD           Standard Deviation
SE           Standard Error
SEP          Safety and Emergency Procedures
SES          Single European Sky
SID          Standard Instrument Departure
SME          Subject Matter Expert
SMC          Surface Movement Control
SMR          Surface Movement Radar
SNET         Safety Nets
SoL          Safety-of-Life
SOR          Stimulus-Organism-Response
SPS          Standard Positioning Service
SRG          Safety Regulatory Group
SRK          Skill Rule Knowledge
SRP          Single Radar Processing
SRU          Safety Regulatory Unit
SSR          Secondary Surveillance Radar
STAR         Standard Terminal Arrival Route
STCA         Short Term Conflict Alert
SUA          Special Use Airspace
SYSCO        System Supported COordination
TACAN        TACtical Air Navigation
THERP        Technique for Human Error Rate Prediction
TAR          Terminal Approach Radar
TCAS         Traffic Alert and Collision Avoidance System
TID          Touch Input Device
TRACON       Terminal Radar Approach CONtrol
TIP          Touch Input Panels
TLS          Target Level of Safety
TRACEr       Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC
TRUCE        TRaining for Unusual Circumstances and Emergencies
TRM          Team Resource Management
TTA          Time To Alert
TWR          Aerodrome Control Tower
TWY          Taxiway
UAV          Unmanned Aerial Vehicles
UHF          Ultra High Frequency
UPS          Uninterruptible Power Supply
US           United States
UTC          Coordinated Universal Time
VDL          Very high frequency Data Link
VFR          Visual Flight Rules
VHF          Very High Frequency
VMC          Visual Meteorological Conditions
VOR          VHF Omnidirectional Range navigation system
VORTAC       VHF Omnidirectional Range /TACtical Air Navigation
VSCS         Voice Switching Communication System
WAAS         World Aircraft Accident Summary
Chapter 1
Introduction
The aim of this Chapter is to present the background to the problem of controller
recovery from equipment failures in Air Traffic Control (ATC) and to set the scene for
the research presented in this thesis. This Chapter defines the rationale behind the
need to better understand the impact that equipment failures have on controller
performance in the current as well as in the future ATC environment. Based on this
background, the principal research objectives are defined to assure an in-depth
analysis of ATC equipment failures and controller recovery. This is followed by the
specification of the structure of the thesis and a summary of each Chapter.
1.1 Background to the problem

The aim of the research presented in this thesis is to provide a holistic assessment of
controller recovery from equipment failures in ATC. In order to achieve this, it is
essential to define the environment in which equipment failures are investigated, i.e.
the Air Traffic Management (ATM) system and its ATC component. While ATC is
responsible for the separation of air traffic, other components of the ATM system
manage air traffic flow and airspace design to assure minimal delays and optimal use
of airspace. The ATC system comprises the people, equipment, and procedures
required to act together to achieve the same objective, i.e. safe and efficient flow of air
traffic in a dedicated airspace. In order to achieve this, all three components must be
operational and fully integrated to enable the most effective and efficient air traffic
service. Consequently, in the case of failure of any component of the ATC system, the
remaining nominally operational components may still provide air traffic services, either
partially or fully, depending on the characteristics of the failure. The research presented
in this thesis focuses solely on failures of one component of the ATC system, namely
equipment.
In order to provide continuous air traffic services, various defences or barriers are
designed to prevent or mitigate the occurrence of equipment failures. For example, the
existence of technical built-in defences offers protection against the majority of
equipment failures that can occur (NATS, 2002). In most cases, this protection is
triggered automatically and seamlessly. Hence, an equipment failure should not result
in a problem that affects the controller's ability to carry out tasks safely, as it
should be resolved automatically with no interruption of the service (EUROCONTROL,
2004e). However, there are occasions when these technical defences are not sufficient
to maintain the normal ATC system state and protect against negative outcomes. On
such occasions, the intervention of the human, as a component of the ATC system, is
necessary. In other words, the intervention of the air traffic controller becomes crucial
for the provision of a safe but not necessarily efficient air traffic service. Note that
safety represents the key driver here as opposed to efficiency.
In the past, major failures or total outages (i.e. failure of the entire system) were the
subject of detailed investigations. These investigations were aimed at resolving and
preventing similar failure occurrences by focusing mostly on the technology (National
Transportation Safety Board, 1996; General Accounting Office, 1982; General
Accounting Office, 1991; General Accounting Office, 1996; and General Accounting
Office, 1998). For a long time, the basic focus of reliability, system safety, and quality
management was purely on the prevention of equipment failures or the reduction of
their reoccurrence. Various techniques have been developed to assess equipment
failures, their causes, consequences, and appropriate defences. For example, the US
Federal Aviation Administration (FAA) requires that the availability of the Voice
Switching Communication System (VSCS) at the level of the ATC Centre
(facility-level; see footnote 1) should not be less than 0.9999999, including the
backup VSCS (FAA, 1997). In spite of these significant efforts, equipment failures
still occur and every ATC system eventually fails to perform its intended function or
part thereof. On these unexpected occasions, the recovery of the ATC system is left
to the human operator, who must implement an appropriate recovery strategy in a
timely and effective manner. While past
research focused on the technical aspects of the occurrence of equipment failures,
very little has been done on human factors, with a particular reference to controller
recovery from such failures. Some examples, such as research by Wickens et al.
(1998), Low and Donohoe (2001), and EUROCONTROL (2004e), are discussed in the
following paragraphs.
Footnote 1: The facility-level availability is based on a 50-position system. According to the FAA, system
failure occurs when one or more critical functions are unavailable in more than 10 percent of the
positions.
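To give a sense of scale for the availability figure quoted above, the following minimal Python sketch converts an availability value into the expected unavailable time per year and applies the footnote's failure criterion (critical functions unavailable at more than 10 percent of the positions of a 50-position system). The function names and the use of a 365-day year are illustrative assumptions and are not taken from FAA (1997).

```python
# Assumption: a non-leap year of 365 days is used for the conversion.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def expected_downtime_seconds(availability: float) -> float:
    """Expected unavailable time per year implied by an availability figure."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def is_system_failure(positions_affected: int, total_positions: int = 50) -> bool:
    """Footnote criterion: failure when MORE than 10 percent of positions are affected.

    Integer arithmetic avoids floating-point rounding at the 10 percent boundary.
    """
    return positions_affected * 10 > total_positions


# Hypothetical usage with the facility-level figure quoted in the text.
downtime = expected_downtime_seconds(0.9999999)
print(f"Expected downtime: {downtime:.2f} seconds per year")
print("5 of 50 positions affected:", is_system_failure(5))
print("6 of 50 positions affected:", is_system_failure(6))
```

Under these assumptions, an availability of 0.9999999 corresponds to roughly three seconds of unavailability per year, which illustrates how demanding the requirement is.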
There is a vast amount of Human Reliability Assessment (HRA) research on recovery
from human error in areas including the nuclear and chemical process industries.
However, this knowledge has not been fully exploited in aviation. For example, Zapf
and Reason (1994), Kontogiannis (1999), Kanse and van der Schaaf (2000), and
Kanse (2004) analysed recovery from the consequences of human error in various
non-ATC environments. Moreover, past HRA research recognised the importance of
contextual factors that influence the recovery process. Various HRA techniques defined
these factors depending on the type of operation and environment that surrounds the
human operator. In short, the concepts of recovery from human error and recovery
context are transferable to the recovery from equipment failure. Both represent human
recovery triggered by different stimuli (human error as opposed to technical failure)
occurring within a certain context.
The above findings led to a significant research effort being devoted to the area of
human recovery, from both human error and technical faults. For example, research on
automation in future ATM has shown that human operators are less likely to detect
failures in the automated process due to complacency and reduced situational
awareness (Wickens et al., 1998; Metzger and Parasuraman, 2005). Researchers at
the UK National Air Traffic Service (NATS) examined the potential methodologies to
assess human recovery performance from failures of several automated systems (Low
and Donohoe, 2001). Several different safety (e.g. hazard and operability-HAZOP) and
psycho-physiological methods (e.g. eye movement tracking, situational awareness
assessment-SAGAT, subjective workload ratings-NASA TLX, speech workload) were
investigated. While some of these methods are quite easy to implement (e.g. HAZOP,
SAGAT, NASA TLX), others require complex training and the use of sophisticated
equipment (e.g. eye movement tracking, speech workload). Most of these methods
proved to be appropriate, providing useful information and were thus recommended for
future use. Due to the confidential nature of this research, no further insight was given
into the human recovery process, its phases, and the impact of the context surrounding
the controllers.
Furthermore, the EUROCONTROL Gate to Gate (G2G) project, initiated to test future
advanced ATC concepts, underlined the impact and importance of ATC
equipment failures. ATC safety managers throughout Europe highlighted several
equipment-related areas of concern within their ATC Centres (Gordon and Makings,
2003). These are: radio communication interference, equipment reliability, ATC tools
failure, and relevance of emergency checklists for controllers and appropriate handling
of emergency situations. This study highlighted the consequences of equipment
unavailability in current as well as future more automated ATC environments.
Simulation trials that followed attempted to identify and investigate safety-relevant
occurrences associated with future ATC concepts/tools (Medium Term Conflict
Detection-MTCD, MONitoring Aid-MONA, data link, Arrival Manager-AMAN, and
Airborne Separation Assistance System-ASAS). Various equipment failures were
identified amongst the potential safety-relevant occurrences (see footnote 2). They
ranged from problems with the Human Machine Interface (HMI) to problems with ASAS
messages and data link messages (Damidau, Kirwan, and Scrivani, 2006).
However, few studies have explicitly and jointly addressed the question of equipment
failures and recovery in the area of ATC. The Panel on Human Factors in Air Traffic
Control Automation was formed at the request of the Federal Aviation Administration
(FAA) to study the air traffic control system, the national airspace system, and future
automation alternatives from a human factors perspective (Wickens et al., 1998). The
Panel's deliberations, in particular, highlighted the role of reliability of automation and
human recovery in the future ATC environment, characterised by higher levels of
automation, complexity, and traffic density. Similarly, the EUROCONTROL project on
Solutions for Human Automation Partnership in European Air Traffic Management
(SHAPE) dedicated one part to the analysis of human recovery from equipment failures
in the automated ATC environment. The findings highlighted the importance of context
within which a failure occurs as well as recovery training and procedures designed to
aid recovery (EUROCONTROL, 2004e).
Overall, existing research has shown that there is a need to understand the
mechanisms behind failure and recovery in ATC. This applies both to the technical and
human perspectives as both are essential to ensuring the highest level of safety. In
order to develop a heuristic method to address these issues, it is necessary to define
the major research objectives. These are presented below.
Footnote 2: Personal correspondence with EUROCONTROL G2G project team.
1.2 Research objectives

The need for an in-depth analysis of ATC equipment failures and the associated
controller recovery processes is presented briefly above and is discussed in more
detail in the remainder of the thesis. Based on the background to the problem
presented above, four research objectives have been formulated:
• Provide a systematic literature review to connect disparate but related topics of
  ATC equipment failures and controller recovery, previously lacking in the area of
  ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive
  a methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery. This framework
  should be further verified with a specific reference to a particular equipment
  failure type.
1.3 Outline of the thesis

This thesis is organised as follows. Chapter 2 discusses the architecture of the Air
Traffic Management (ATM) system with specific attention paid to its Air Traffic Control
(ATC) component, to portray the context of the research presented in this thesis. The
ATC architecture is presented in terms of nine functionalities and the corresponding
physical architecture (equipment). In other words, it specifies nine ATC functionalities
and equipment that supports each of them. Chapter 3 presents a preliminary
assessment of equipment failures in ATC based on the sample of operational
failure reports available to this research. It provides definitions of equipment failure,
hazards, and built-in technical defences to be used in the research on recovery from
equipment failures in ATC. The Chapter continues by assessing how representative
the sample is of equipment failures occurring in the operational ATC environment. This is
achieved through a methodology that determines how much ATC equipment contributes
to the safety of the overall air transport system.
Having confirmed that the operational failure reports available in this thesis are
representative of the equipment failure types experienced operationally, Chapter 4
provides a good understanding of equipment failures and their impact on the ATM and
ATC operations. It discusses the main equipment failure characteristics extracted from
available operational failure reports and past research. Assessed characteristics range
from the ATC functionality affected to the impact of equipment failure on ATC and ATM
operations. The Chapter concludes with the development of a novel tool for the
assessment of the overall impact of an equipment failure on ATC operations, known as
the qualitative equipment failure impact assessment tool.
Having established the framework for the assessment of equipment failures in
Chapters 3 and 4, Chapter 5 addresses the human factors aspects of relevance to
controller recovery performance in the event of an equipment failure. It discusses past
research on human reliability transferable to controller recovery performance. The
Chapter presents the initial theoretical findings on the recovery process, including the
relevance of the recovery context, past experience, recovery procedures, and recovery
training. It concludes by defining the potential variables that enable the assessment
and understanding of controller recovery performance.
The theoretical findings from Chapter 5 are further informed by the operational
experience extracted from the questionnaire survey results presented in Chapter 6.
This survey informed both the technical and human aspects of the research into
recovery from ATC equipment failures.
Having acknowledged the importance of recovery context both from past research
(Chapter 5) and operational experience (Chapter 6), this thesis continues by setting the
scene for the qualitative and quantitative assessment of the recovery context. Chapter
7 reviews past ATC and non-ATC research to extract the relevant factors important for
the definition of the context surrounding an ATC equipment failure occurrence. As a
result, this Chapter concludes with a set of 20 candidate Recovery Influencing Factors
(RIFs). Chapter 8 reviews relevant past research to further exploit the findings from
Chapter 7. It continues by defining the methodology for the quantitative assessment of
the recovery context and definition of the recovery context indicator.
To further verify the methodology proposed in Chapter 8, Chapter 9 presents the
design of an experiment carried out at a particular ATC Centre that involved exposing
30 operational controllers to an unexpected but complex equipment failure. This
particular equipment failure was carefully selected from several failure types based on
the findings in Chapters 4, 5, and 6. The analyses of the data collected on recovery
performance from this experiment are presented in Chapter 10. These analyses are
based on a set of variables that enable investigation of controller recovery as proposed
in Chapter 5. The thesis ends with Chapter 11 drawing together the conclusions
achieved throughout this research together with suggested areas for further research.
Figure 1-1 crystallises the overall structure of this thesis.
Figure 1-1 Overview of the thesis
Chapter 2
Fundamental of ATM and ATC
Fundamentals of Air Traffic Management and Control
The main objective of the research presented in this thesis is to investigate the
recovery process adopted by air traffic controllers in the event of Air Traffic Control
(ATC) equipment failures. A desirable outcome of the research in this thesis is a
framework for analysing controller recovery that is transferable in time (i.e. to current and
future ATC Centres). This Chapter contributes to this objective in several ways. Firstly, it
defines the environment for the investigation of equipment failures, i.e. Air Traffic
Management (ATM) and its component ATC. Secondly, it discusses the ATC system
architecture including its specific functional elements. The Chapter proposes a unique
classification of equipment failures based on these functional elements that enables the
capture of all operational components of ATC. This classification is further built upon in
the remainder of the thesis (Chapter 4) to create a qualitative equipment failure impact
assessment tool. Thirdly, the Chapter reviews the characteristics of a generic ATC
Centre with regard to current and future technologies. The potential characteristics of
future ATC Centres are discussed with an emphasis on challenges that face human
operators (i.e. air traffic controllers) due to increasing levels of automation. The
Chapter concludes with discussions on the potential sources of technical and controller
performance deficiencies within future ATC Centres and their relevance to the recovery
process.
2.1 Air Traffic Management

The major components of the air transport system are aircraft, airline operations, ATM,
airport operations, and the operational environment in which these components exist
and interact (Figure 2-1). The objective of ATM is to enable aircraft operators to meet
their planned times of departure and arrival and adhere to their preferred flight profiles
with minimum constraints, without compromising agreed levels of safety
(EUROCONTROL, 2006a).
Figure 2-1 Air transport system (from Subotic et al., 2005)
An ATM system comprises two functionally integrated elements, namely airborne ATM
and ground-based ATM. The airborne ATM consists of several systems integrated into
the aircraft cockpit, such as the airborne Communication/Navigation/Surveillance
(CNS) system, the Flight Management System (FMS), and the Airborne Collision
Avoidance System (ACAS) also known as the Traffic Alert and Collision Avoidance
System (TCAS). The components of ground-based ATM (Figure 2-1) are Airspace
Management (ASM), Air Traffic Service (ATS), and Air Traffic Flow Management
(ATFM) (ICAO, 2001a).
Airspace Management (ASM) concerns the structure and organisation of the national
airspace at the strategic (i.e. national ASM policy, planning, and coordination),
pre-tactical (i.e. daily management and temporary allocation of airspace), and tactical
(i.e. real-time activation, deactivation, reallocation of airspace, and civil/military
coordination) levels. Air Traffic Service (ATS) is a generic term that combines various
services: the Air traffic services Reporting Office (ARO), the Air Traffic Control service
(ATC), and the Flight Information and alerting Service (FIS) (ICAO, 2001a). The ARO is
a unit established for the purpose of receiving reports concerning air traffic services
and flight plans submitted before flight departure. The ATC component of ATS provides
control of all air traffic in a dedicated airspace. This is discussed in detail in section 2.2
given its importance to the research presented in this thesis. The Flight Information and
alerting Service (FIS) gives advice and information useful for the safe and efficient
conduct of flights. The alerting service provides search and rescue assistance to
aircraft in distress and coordinates any action that may be required. Finally, Air Traffic
Flow Management (ATFM) is a service established to ensure that ATC capacity is
utilised to the maximum extent possible, and that the traffic volumes are compatible
with the capacities declared by the appropriate authority. Optimal flow of traffic is
achieved by continuously balancing the traffic demand and the ability of ATC to
accommodate that demand.
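The demand and capacity balancing described above can be illustrated with a minimal Python sketch that compares forecast demand with declared capacity for each period and reports the periods in which demand exceeds capacity. The function name and the traffic figures are hypothetical and serve only to illustrate the comparison; they do not represent any operational ATFM algorithm.

```python
def overloaded_periods(demand, capacity):
    """Return the indices of periods in which demand exceeds declared capacity."""
    return [
        index
        for index, (flights, limit) in enumerate(zip(demand, capacity))
        if flights > limit
    ]


# Hypothetical hourly figures (aircraft per hour) for a single sector.
hourly_demand = [30, 45, 38, 41]
declared_capacity = [40, 40, 40, 40]

for hour in overloaded_periods(hourly_demand, declared_capacity):
    excess = hourly_demand[hour] - declared_capacity[hour]
    print(f"Hour {hour}: demand exceeds declared capacity by {excess} aircraft")
```

In this sketch, the periods flagged are those in which a flow management measure would need to be considered; how such measures are chosen is outside the scope of the example.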
2.2 Air Traffic Control

The research presented in this thesis is focused specifically on controller recovery from
equipment failures in Air Traffic Control (ATC). Therefore, this section focuses on the
main characteristics of ATC and the different services provided. Modern ATC services
are provided from ATC Centres by controllers and supporting staff (engineers,
managers, and administrators), working together to achieve the same objective. The
primary objective of an ATC service is to provide a safe flow of traffic both in the air and
on the ground (EUROCONTROL, 1999). In other words, the primary function is to
prevent collision between aircraft in the air as well as collision between aircraft and any
obstacles on the manoeuvring area, by providing and maintaining the required lateral
and vertical separations. The secondary functions of an ATC service include ensuring
orderly and expeditious traffic flow by providing traffic advisories, such as weather
information and navigation directions (i.e. vectors). To achieve these functions, the
service is divided into sections that provide an ATC service to aircraft depending on the
segment of the flight profile, i.e. phase of flight (Figure 2-2). According to the
International Civil Aviation Organisation (ICAO; see footnote 1), ATC provides area,
approach, and aerodrome control services. These are discussed in the following sections.
Figure 2-2 Flight profile (adapted from ICAO, 2001b)
Footnote 1: ICAO is the specialised agency of the United Nations concerned with the development of air
navigation and regulation of international air transport.
2.2.1 Area control service

The area control service is provided from an Area Control Centre (ACC), as defined by
ICAO. In the US, such a Centre is referred to as an Air Route Traffic Control Centre
(ARTCC) as defined by the US Federal Aviation Administration (FAA). The controllers
at ACCs provide instructions, clearances, and advice regarding flight conditions during
the cruise phase of the flight (see Figure 2-2). The controllers provide separation
between aircraft operating in the complex network of airways (predetermined air
routes). The controllers use radar to monitor the progress of flights and intervene when
the route or flight level of an aircraft brings it into conflict with another. This is achieved
through tactical air traffic control interventions such as heading or track change, flight
level change, speed control, or alteration of flight routes. In areas where it is impossible
to provide a radar service (i.e. oceanic airspace and other regions without radar
coverage), the controllers employ procedural (i.e. non-radar) control to ensure that
adequate separation exists between aircraft. Procedural control employs greater
separation standards because of the absence of direct radar surveillance (Nolan, 1998;
EUROCONTROL, 1999).
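The separation principle described above can be sketched as a simple check: a pair of aircraft is considered separated if either the lateral or the vertical minimum is met, and a larger lateral minimum (as under procedural control) makes the same pair fail the check. The sketch below is a minimal illustration in Python. The class, the function, and all numerical minima are assumptions made for the example and are not values stated in this thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Aircraft:
    x_nm: float         # east position in nautical miles (flat-plane simplification)
    y_nm: float         # north position in nautical miles
    altitude_ft: float  # altitude in feet


def is_separated(a: Aircraft, b: Aircraft,
                 lateral_min_nm: float, vertical_min_ft: float) -> bool:
    """A pair is separated if EITHER the lateral OR the vertical minimum is met."""
    lateral_nm = math.hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)
    vertical_ft = abs(a.altitude_ft - b.altitude_ft)
    return lateral_nm >= lateral_min_nm or vertical_ft >= vertical_min_ft


# Hypothetical pair at the same level, 8 NM apart.
first = Aircraft(x_nm=0.0, y_nm=0.0, altitude_ft=35000.0)
second = Aircraft(x_nm=8.0, y_nm=0.0, altitude_ft=35000.0)

# Illustrative minima only (assumed values, not taken from the thesis).
print("Radar-style minimum:", is_separated(first, second, 5.0, 1000.0))
print("Larger procedural-style minimum:", is_separated(first, second, 50.0, 1000.0))
```

The same pair of aircraft passes the check under the smaller minimum and fails it under the larger one, which is why the loss of radar surveillance reduces the number of aircraft that can be accommodated in a given volume of airspace.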
An ACC is usually sub-divided into controlled airspace sectors (see footnote 2) that have responsibility
for specific portions of airspace. This is a direct result of the large volumes of air traffic
that utilise the airspace in the cruise phase of the flight. The greater airspace is
sectorised into smaller, more manageable parts in an effort to prevent controller
overload (i.e. when the traffic in a sector exceeds available airspace capacity or a
controller is unable to safely control existing levels of air traffic).
Generally, each ATC sector is manned by an executive and planning controller, where
each has clearly defined roles and responsibilities (EUROCONTROL, 1999). In the
case of high traffic complexity, two sector controllers are supported by a third person,
i.e. an assistant or a flight data controller. The executive controller is responsible for the
correct identification of traffic within the sectors area of responsibility and for the
control of all aircraft to ensure a safe, orderly, and expeditious flow of air traffic.
Additionally, the executive controller is required to assist pilots by providing required
navigation assistance and to assist aircraft in any emergency situation. The planning
controller assists the executive controller to the fullest extent by identifying traffic in
2
Airspace is organised into adjacent portions, the so-called sectors, controlled by two or three
controllers, namely executive or tactical controller, planning controller, and assistant or flight
data controller.
11
Chapter 2
potential conflict, managing flight progress strips, and planning the flow of traffic within
the sector. In addition, the planning controller has to assure that traffic enters and
leaves the sector at flight levels and exit points as agreed with the adjacent sectors
(EUROCONTROL, 1999). The assistant or flight data controller ensures that the strip
printer functions properly. In addition, the assistant accepts and processes all received
messages in a timely manner, passes them to the appropriate position, and manually
inputs any tracks for which flight progress strips have not been produced.
The controllers operating in the sectors within an ACC work in close
cooperation and negotiate with each other on behalf of aircraft to optimise efficiency and
ensure safety. The area controller's responsibility terminates when an aircraft is handed
over to an adjacent ACC or to an approach control office.
2.2.2 Approach control service

The approach control service is provided from the APProach control office or room
(APP), as defined by ICAO, or Terminal Radar Approach CONtrol (TRACON), as
defined by the FAA. According to ICAO (2001a), the approach control unit is
established to provide air traffic control service to controlled flights arriving at, or
departing from, one or more airports. This service is closely associated with the
characteristics of the airports. The radar controllers in the approach control office
provide separation between aircraft in descent during the arrival phase, and, during the
departure phase, between aircraft climbing to their assigned cruise or intermediate
assigned levels (see Figure 2-2). Therefore, the approach controllers are responsible
for providing a safe and expeditious service to departing aircraft in the initial phase of
flight and to arriving aircraft in the descent and final phases of flight (Nolan, 1998;
EUROCONTROL, 1999). The approach controller's responsibility terminates when a
departing aircraft is handed over to an ACC or when an arriving aircraft has landed. Note
that APP is responsible for monitoring approaching aircraft, even after they are
transferred to the aerodrome control tower, until they land.
2.2.3 Aerodrome control service

The aerodrome control service is provided from the Aerodrome Control Tower (TWR),
as defined by ICAO, or Air Traffic Control Tower (ATCT), as defined by the FAA. The
aerodrome controllers are responsible for the safe and efficient conduct of flights during
the take-off and landing phases. These controllers direct airport traffic so that it flows
smoothly and expeditiously. Working closely with the approach controller, they ensure
safety of airport operations by restricting traffic movements so that only one aircraft
may land or take-off at a time (Nolan, 1998; EUROCONTROL, 1999). In airports that
use multi-runway operations, the aerodrome controller may be responsible for all
runway operations. Otherwise, the responsibility for multi-runway operations may be
divided between a number of controllers. For example, a parallel runway configuration,
where one runway is dedicated to departures and the other to arrivals, requires
separate departure and arrival controllers. In this case, close cooperation between the
two controllers is essential to ensure a safe operation.
The aerodrome controller is responsible for all traffic operating in the designated area
of responsibility of the control tower. This includes aerodrome circuit traffic, aircraft
landing and taking off, and aircraft and vehicles operating on the manoeuvring areas
(ICAO, 2001a). When good visibility conditions prevail (i.e. visual meteorological
conditions or VMC), the controller may separate the traffic by visual means and a
reduction in standard separation is permissible. When poor visibility conditions prevail
(i.e. instrument meteorological conditions or IMC) the aerodrome controller works in
close cooperation with the approach controller. In such conditions, prescribed
separation standards must be applied between aircraft in the air.
Surface movement control, known as ground control in the US, is a supplementary service
to the aerodrome control service. In less busy airports the aerodrome and surface
movement control functions can be combined and provided by the aerodrome
controller. Otherwise, the surface controller is responsible for issuing taxi clearances
that take all aircraft to the departure end of the runway (Nolan, 1998;
EUROCONTROL, 1999). In addition, the surface controller is responsible for the
movements of all aircraft and vehicular traffic on the manoeuvring areas of the airport.
ICAO (2001a) defines the manoeuvring areas as any part of the airport used for the
takeoff, landing, and taxiing of aircraft, excluding aprons. Surface movement control is
usually undertaken by visual means. However, in conditions of poor visibility the
controller relies upon surface movement radar (SMR). Working in close cooperation
with the aerodrome controller, the surface controller ensures that all active runways are
free from vehicular activity during aircraft movements.
2.3 Overall Air Traffic Control system architecture

The preceding paragraphs have highlighted the complexity of the ATM system and its
further decomposition down to the ATC system. Additionally, Figure 2-3 presents ATC
as a system comprised of people, equipment, and procedures integrated in an optimal
way to achieve a common objective. In order to understand how these components
come together, a more detailed explanation of the ATC architecture and its basic
functionalities is given below. In line with the objectives of the research presented in
this thesis, this section provides a deeper understanding of ATC functionalities and the
types of ATC equipment that can fail, and therefore affect controller recovery.
[Figure 2-3: diagram not reproduced. Box labels: ATM; Ground-based ATM; Airborne ATM (e.g. airborne CNS, FMC, ACAS/TCAS); Airspace Management (ASM); Air Traffic Services (ATS); Air Traffic Flow Management (ATFM); Flight Information Service (FIS); Air Traffic Control (ATC); Air traffic services Reporting Office (ARO). ATC is shown as PEOPLE (controllers, engineers, management), EQUIPMENT (HMI, hardware, software), and PROCEDURES & TRAINING (operational procedures, engineering procedures).]
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a)
The functional architecture of any system presents a high level decomposition of the
overall system into a logical set of functional blocks. Each block may be further
decomposed into a series of sub-functions. The ATC functionalities and their related
sub-functions, as presented in this thesis, include all those of the current ATM/ATC
system as well as those under development for inclusion in the future (i.e. with 2020 taken
as the target year in this thesis in line with the European Commission's Vision 2020;
European Commission, 2001).
The starting point for the development of the ATC functional classification in this thesis
is the EUROCONTROL Harmonisation of European Incident Definition Initiative for
ATM (HEIDI) taxonomy. The HEIDI taxonomy identifies six different ATC functionalities and
the related ATC equipment that supports each of them. The functionalities listed in HEIDI
are: communication, surveillance, navigation, data processing and distribution, support
information functionality, and power supply (EUROCONTROL, 2001e). This taxonomy
is subsequently expanded in this thesis by taking into account the needs for both the
classification and characteristics of the information derived from the operational failure
reports processed. The analysis of operational failure reports highlighted the need for
nine ATC functional blocks. The next set of layers dissects each ATC functional block
into relevant sub-functions which are then dissected further to the elemental level. This
approach enables the capture of all operational components of ATC. The resulting nine
ATC functional blocks, as defined in this thesis, are:
Communication;
Navigation;
Surveillance;
Data processing and distribution;
Supporting;
Safety nets;
Power supply;
Pointing and data input; and
System monitoring and control.
Additionally, this classification is further built upon in Chapter 4. The following
paragraphs give a detailed description of each functionality and the corresponding
physical components (i.e. hardware components that support each function).
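The relationship between the six HEIDI functionalities and the nine functional blocks defined in this thesis can be sketched as a simple data structure. The following is a minimal illustration only. The identifier names are hypothetical, and the mapping of HEIDI's "support information functionality" onto the "Supporting" block is an assumption that the text does not state explicitly.

```python
from enum import Enum


class FunctionalBlock(Enum):
    """The nine ATC functional blocks listed in this section."""
    COMMUNICATION = "Communication"
    NAVIGATION = "Navigation"
    SURVEILLANCE = "Surveillance"
    DATA_PROCESSING = "Data processing and distribution"
    SUPPORTING = "Supporting"
    SAFETY_NETS = "Safety nets"
    POWER_SUPPLY = "Power supply"
    POINTING_AND_INPUT = "Pointing and data input"
    MONITORING_AND_CONTROL = "System monitoring and control"


# Assumption: HEIDI's "support information functionality" is taken to
# correspond to the SUPPORTING block defined above.
HEIDI_BLOCKS = {
    FunctionalBlock.COMMUNICATION,
    FunctionalBlock.SURVEILLANCE,
    FunctionalBlock.NAVIGATION,
    FunctionalBlock.DATA_PROCESSING,
    FunctionalBlock.SUPPORTING,
    FunctionalBlock.POWER_SUPPLY,
}


def added_blocks():
    """Return the blocks in the nine-block list that HEIDI does not name."""
    return [block for block in FunctionalBlock if block not in HEIDI_BLOCKS]


if __name__ == "__main__":
    for block in added_blocks():
        print(block.value)
```

Under the stated assumption, the blocks added to the HEIDI starting point are safety nets, pointing and data input, and system monitoring and control.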
2.3.1 Air Traffic Control functionalities

2.3.1.1 Communication function
The scope of the communication function covers the distribution of information to air- and
ground-based ATC system components in the form of voice, data, or both. This is
achieved using various communication methods. Currently, radio telephony (RT)
enables voice transfer of information via high frequencies (HF), very high frequencies
(VHF), and ultra-high frequencies (UHF). Controller-pilot data link communication
(CPDLC), as a concept currently used in Australasia and the Pacific, assumes transfer
of data based on high frequency data link (HF DL), very high frequency data link (VDL),
and satellite communication (SATCOM). In general, the communication function
provides connectivity and information transfer between users and providers that are
both internal and external to a particular ATC Centre. This function is supported by
various components (Figure 2-4) which are discussed in the following paragraphs. The
section concludes with a discussion of future communication systems and the
concept of Required Communication Performance (RCP).
Figure 2-4 Communication function
Firstly, the communication function is supported by a Voice Switching Communication
System (VSCS) presented on Controller Working Positions (CWPs) via the VSCS
panel. This is a computer-controlled switching system that facilitates both the air-to-ground
(A/G) and ground-ground (G/G) communication necessary for ATC operations
(FAA, 1998). Controllers are able to use the VSCS for A/G communication by
accessing A/G transmitters and receivers through which they communicate with pilots
via HF, VHF, or UHF. The VSCS also ensures that incoming A/G communications from
pilots are routed to the appropriate control position. Controllers are able to use the
VSCS for G/G communication via intercom, interphone, and external circuits. Intercom
enables controllers to access other control positions or ancillary positions located within
the operational room. Interphone enables controllers to access positions located within
another ATC/ATM facility. Finally, external circuits of VSCS enable controllers to
access the public telephone network (FAA, 1998).
Secondly, data is exchanged with adjacent ATC Centres via the Aeronautical Fixed
Telecommunication Network (AFTN), On-line Data Exchange (OLDI) automated
protocols, and ICAO data interchange network, using both public and private telephone
networks. AFTN, administered by ICAO, is the means by which all information
concerning national and international air operations is exchanged. The data consists
of messages on aircraft movements, conditions of airports, weather, and other
information related to ATC. OLDI refers to operational use of connections between
various Flight Data Processing Systems (FDPS) at different Area Control Centres
(ACCs). Public and private telephone networks are used to communicate data on
individual flights between ATC Centres along the route of the flight. The data that is
exchanged includes flight level information, airspace boundary estimates of flights, and
other conditions that may be agreed between ATC Centres. This category incorporates
both systems for data exchange and any supporting equipment (e.g. AFTN printer,
console).
Thirdly, the Aeronautical Information System (AIS) provides information of a permanent
or semi-permanent nature on subjects such as geographical description of airspace,
in-flight procedures, sector procedures, communications data, surveillance data, and
specific airport characteristics data, either verbally or via datalink. In addition, local ATC
units provide a dynamic broadcast of relevant information to arriving and departing
pilots in the vicinity of the airport, known as the Automatic Terminal Information Service
(ATIS). This service uses local weather data (from the meteorological office) and AIS
data (e.g. runway and taxiway conditions, navigational aids status).
Fourthly, backup radio and telephone systems must be provided. These backup
systems may provide identical functionality if the VSCS itself is duplicated. However,
in some cases, redundancy can be provided by similar but not identical systems which
cannot offer identical functionality. In these cases it is essential that controllers are
aware of these differences. Backup communication systems must be capable of
providing continuity of communication during outages (complete loss of the
communications at the level of an ATC Centre), as voice communication continues to
be the primary means of communicating ATC instructions to aircraft.
Finally, several other physical components are listed which have a role in providing the
overall communications function. These include but are not limited to pagers, headsets,
handsets, microphones, processors, press-to-talk buttons (PTT), buzzers, cables, and
footswitches.
The previous discussion has focused on current systems that support the
communication function. Current communication methods are mostly based on
analogue voice communication, which poses various limitations to the users (e.g. limited
coverage, accessibility, capability, integrity, and security). Moreover, the combination of
these limitations with current Radio Telephony (RT) procedures is linked to excessive
levels of controller workload (see Figure 21 in EUROCONTROL, 2004g). As a result,
future development of air navigation for civil aviation aims toward enhanced
communication links between aircraft and controllers. This was an important element of
ICAO's Future Air Navigation Systems (FANS) concept (ICAO, 2007). With respect to
communication, a major development has been the advent of the Required
Communication Performance (RCP) concept. This concept characterises the
performance requirements for communications with no specific reference to
technology. Hence, the concept allows various technologies to be evaluated in terms of
communication process time (i.e. delay), integrity, availability, and continuity of function
(NASA, 2000). Until 2015, it is anticipated that the voice communication function will be
supported by a very high frequency data link (VDL) in addition to existing analogue
voice channels. In general, voice communication will be used for real-time, time-critical,
and non-routine messages (e.g. radar vectoring to avoid traffic). All other, more routine
communications will be served via data communication supported by VDL and satellite
communication (SATCOM) (NASA, 2000). The use of enhanced modes of data link will
enable several advanced features. Firstly, it will bring automatic data entry capabilities
while reducing time spent on manual data entry and potential for data entry errors.
Secondly, it will permit a significant reduction in transmission time and thus reduce RT
frequency congestion. Finally, it will eliminate misunderstandings as a result of
broadcasting problems and language issues. As a result, communication in the 2020
time frame is expected to be characterised by a mix of analogue voice and digital
communication with increased use of datalink to complement or replace existing
analogue voice communications.
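Because the RCP concept is technology-neutral, evaluating a candidate technology amounts to comparing four measured parameters against four required values. The following minimal sketch illustrates that comparison. The class name, the field names, the loose working definitions in the comments, and all numerical values are hypothetical and are not taken from any RCP specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommPerformance:
    """The four RCP parameters named in the text (working definitions)."""
    process_time_s: float  # communication process time (delay), in seconds
    integrity: float       # probability that a message is delivered without undetected error
    availability: float    # probability that the service is usable when needed
    continuity: float      # probability that a started transaction completes


def meets(required: CommPerformance, offered: CommPerformance) -> bool:
    """True if the offered performance satisfies every required parameter."""
    return (
        offered.process_time_s <= required.process_time_s
        and offered.integrity >= required.integrity
        and offered.availability >= required.availability
        and offered.continuity >= required.continuity
    )


if __name__ == "__main__":
    # Illustrative numbers only.
    required = CommPerformance(10.0, 0.99999, 0.999, 0.999)
    candidate = CommPerformance(8.0, 0.999999, 0.9995, 0.9991)
    print(meets(required, candidate))
```

The same check can be applied to analogue voice, VDL, or SATCOM without changing the requirement, which is the sense in which the concept makes no reference to a specific technology.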
2.3.1.2 Navigation function
The main objective of the navigation function within air traffic control (ATC) is to provide
aircraft with the means to navigate between the point of departure and the point of
arrival, i.e. to accurately and reliably determine their position during all phases of flight.
The quality of required navigational information (e.g. accuracy and integrity of aircraft
position) differs based upon the phase of flight. For example, the requirements in the
landing phase of the flight are the most stringent due to proximity to the ground and
the high speed of the aircraft, leaving the pilot little time to take corrective action. The navigation
function block, as shown in Figure 2-5, focuses on three components, namely
approach and landing navigation systems, area navigation systems, and systems for
control and monitoring of ground-based airport facilities. These are explained in the
following sections, concluding with a discussion of the concept of Required Navigation
Performance (RNP).
Figure 2-5 Navigational function
2.3.1.2.1 Approach and landing navigation

This category within the navigation function consists of the systems that provide
precise guidance to an aircraft approaching a runway. The most widespread approach
aid is the Instrument Landing System (ILS) used for the most critical phases of the
flight, i.e. approach and landing. This system provides the pilot with both runway
centreline azimuth guidance (provided by the ILS localiser) and descent rate guidance
(provided by the ILS glide slope) along the approach path of an aircraft. It allows pilots to
conduct the final approach and land safely even in conditions of poor visibility.
Previously, a Microwave Landing System (MLS) was supported by ICAO in areas
where it offered operational and economic advantages (e.g. increased runway
throughput/capacity). However, in this domain much more emphasis is now put on
evaluation of satellite navigation techniques and the necessary augmentations to
support precision landing with the long term objective of replacing the ILS system
(Aviation International News, 2001).
2.3.1.2.2 Area navigation
aRea NAVigation (RNAV) is a method of navigation that enables aircraft to fly any
chosen direct course within a network of navigation beacons, rather than navigating
directly to and from the individual beacons (EUROCONTROL, 2003h). Navigation
systems which provide RNAV capability include VHF Omni-directional Range/Distance
Measuring Equipment (VOR/DME), DME/DME, Non-Directional Beacon (NDB),
self-contained Inertial Navigation Systems (INS), and Global Positioning System (GPS).
Currently, area navigation is primarily supported by ground-based systems. Most
widespread is the VOR which provides a radial or bearing on which aircraft fly from one
VOR station to another (EUROCONTROL, 2003g). This aid is usually combined with
DME providing information on the distance of the aircraft from the VOR/DME beacon.
Therefore, any aircraft utilising this facility can determine its position in terms of
bearing and distance relative to the location of the VOR station. The VOR/DME
combination represents the primary ground-based aid for area navigation. Generally,
the maximum range of VOR stations is in the region of 250 NM due to the line-of-sight
nature of VHF signals and the curvature of the Earth (EUROCONTROL, 2003g). Each
air navigation service provider publishes the effective range of its VOR stations.
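The two ideas in the preceding paragraph, a position fix from a VOR bearing plus a DME distance and a range limited by line of sight, can be illustrated with a short sketch. It is not taken from the cited sources. It assumes a flat-earth approximation near the station, treats the DME distance as ground distance (DME actually measures slant range), ignores magnetic variation, and uses the common radio-horizon approximation d = 1.23 (sqrt(h1) + sqrt(h2)), with d in nautical miles and heights in feet. The function names are hypothetical.

```python
import math


def radio_horizon_nm(aircraft_alt_ft: float, station_height_ft: float = 0.0) -> float:
    """Approximate line-of-sight range in NM for VHF signals.

    Uses the standard approximation d = 1.23 * (sqrt(h1) + sqrt(h2)),
    heights in feet, which includes a typical allowance for refraction.
    """
    return 1.23 * (math.sqrt(aircraft_alt_ft) + math.sqrt(station_height_ft))


def vor_dme_offset_nm(radial_deg: float, distance_nm: float) -> tuple:
    """East/north offset of the aircraft from the VOR/DME station, in NM.

    radial_deg is the bearing from the station to the aircraft,
    measured clockwise from north.
    """
    theta = math.radians(radial_deg)
    east = distance_nm * math.sin(theta)
    north = distance_nm * math.cos(theta)
    return east, north


if __name__ == "__main__":
    print(round(radio_horizon_nm(40000.0), 1))
    print(vor_dme_offset_nm(90.0, 30.0))
```

For an aircraft at 40,000 ft and a station at ground level, the approximation gives a range of about 246 NM, which is consistent with the figure of roughly 250 NM quoted above.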
Another system that uses a radio beacon is the NDB. It consists of two components: the
Automatic Direction Finder (ADF), which is the airborne component, and the
NDB's transmitting unit, which is the ground component. The NDB beacon broadcasts
continuously on a specific frequency. An ADF on the aircraft detects the bearing to
or from an NDB unit and thus determines the aircraft's position relative to the NDB beacon. An
NDB bearing is a line passing through the station that points in a specific direction (e.g.
270 degrees, i.e. due west). This system may also be coupled with a DME. Although widely
used in the approach environment, it is less accurate and less reliable than VOR/DME
since it is susceptible to interference from thunderstorms and other atmospheric
phenomena. The power output determines the maximum range of the NDB beacon but
generally they are usable in the range of 50-100 NM (EUROCONTROL, 2003g).
An INS is a completely self-contained navigational system located on board the aircraft
and independent of ground-based navigation aids. The basic INS consists of three
mutually orthogonal gyroscopes, three mutually orthogonal accelerometers, a
navigation computer, and a clock (EUROCONTROL, 2003g). Gyroscopes are
instruments that provide the orientation of an object (e.g. an aircraft's angles of roll, pitch,
and yaw). Accelerometers sense acceleration (i.e. the rate of change of velocity) along a given
axis. The orthogonal accelerometer configuration provides three orthogonal
acceleration components. Combination of the gyroscope orientation information with
the summed accelerometer outputs yields the total acceleration in three-dimensional
space. A navigation computer then time integrates the total acceleration to get the
aircraft's velocity vector. This velocity vector is further time integrated, yielding the
position vector of the aircraft. These steps are continuously iterated throughout the
duration of the flight. Based on all of the data, the INS determines the aircraft's
position relative to a known point of departure (i.e. latitude and longitude coordinates of
the departure gate).
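The double time integration described above can be sketched in a few lines. This is a simplified illustration rather than an actual INS mechanisation. It assumes that the acceleration samples have already been rotated into a fixed navigation frame using the gyroscope orientation and corrected for gravity, that the samples arrive at a constant interval dt, and that a simple Euler scheme is adequate. The function name and argument names are hypothetical.

```python
def integrate_ins(accels, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Integrate acceleration twice to obtain velocity and position.

    accels: sequence of (ax, ay, az) samples in m/s^2, navigation frame
    dt:     sample interval in seconds
    v0, p0: initial velocity (m/s) and position (m), e.g. at the departure gate
    Returns the final (velocity, position) as tuples.
    """
    v = list(v0)
    p = list(p0)
    for a in accels:
        for i in range(3):
            v[i] += a[i] * dt  # first integration: acceleration -> velocity
            p[i] += v[i] * dt  # second integration: velocity -> position
    return tuple(v), tuple(p)


if __name__ == "__main__":
    # Ten seconds of constant 1 m/s^2 acceleration along the first axis.
    samples = [(1.0, 0.0, 0.0)] * 10
    velocity, position = integrate_ins(samples, dt=1.0)
    print(velocity, position)
```

Because position is obtained only by accumulating increments from a known starting point, any error in a sample is carried forward into every later position estimate.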
In recent years, Global Navigation Satellite Systems (GNSS) have been gradually
introduced where appropriate and cost effective. Two GNSS systems are currently in
operation: the United States' GPS and the Russian Federation's GLObal NAvigation
Satellite System (GLONASS) (see footnote 3). A third, the European Galileo system, is scheduled to
become operational in 2010. Each of the GNSS systems uses a constellation of
orbiting satellites working in conjunction with a network of ground stations. The GPS
system is available for civil use based on 24 operational satellites. Two distinctive GPS
services are available, namely the Standard Positioning Service (SPS) and the more
accurate Precise Positioning Service (PPS). The SPS is available to the civil users
worldwide without charge or restriction, while the PPS is available primarily to the
military. The SPS requirements are defined through the service availability standard of
more than 99% of time at an average location, with an average accuracy of 34m
horizontal and 77m vertical (95% threshold) (Department of Defence, 2001; European
Commission, 2006a). Similar standards are defined for the Galileo system, where five
distinctive navigation services will be available, namely Open Service (OS), Safety-of-Life
service (SoL), Commercial Service (CS), Public Regulated Service (PRS), and
Search And Rescue service (SAR) (European Commission, 2006b). The SoL service is
intended primarily for aircraft navigation. Service performance requirements for SoL
with dual frequency correction are set to be 4m horizontally and 8m vertically (95%
threshold) (European Commission, 2006b).
In addition to the concept and supporting systems for area navigation, a new concept
referred to as Precision aRea NAVigation (PRNAV) has emerged in recent years.
PRNAV has been introduced to allow consistent terminal airspace operations in the
European region (i.e. European Civil Aviation Conference, ECAC, member states).
It is based on the navigation requirement that procedures, design principles, and
aircraft capabilities should meet an accuracy of ±1 NM for at least 95% of the flight
time (EUROCONTROL, 2006b).
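The 95% criterion can be illustrated with a simple check on recorded cross-track errors. This is a sketch under stated assumptions and not a certification method. It assumes that the errors are sampled at equal time intervals, so that the fraction of samples within the limit approximates the fraction of flight time, and the function name and default values are chosen here for illustration.

```python
def meets_lateral_accuracy(cross_track_errors_nm, limit_nm=1.0, required_fraction=0.95):
    """True if enough samples lie within +/- limit_nm of the intended track.

    cross_track_errors_nm: signed lateral deviations in NM, sampled at
    equal time intervals (assumption).
    """
    if not cross_track_errors_nm:
        raise ValueError("at least one sample is required")
    within = sum(1 for error in cross_track_errors_nm if abs(error) <= limit_nm)
    return within / len(cross_track_errors_nm) >= required_fraction


if __name__ == "__main__":
    # 97 samples on track, 3 samples outside the limit.
    samples = [0.2] * 97 + [1.5] * 3
    print(meets_lateral_accuracy(samples))
```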

[Footnote 3: GLONASS, or Global'naya Navigatsionnaya Sputnikovaya Sistema.]
2.3.1.2.3 Systems for control and monitoring of ground-based airport facilities

In addition to all systems previously discussed, the navigation functional block also
includes systems for monitoring and control of ground-based airport facilities. Typically,
monitoring and control of ground-based airport facilities is physically provided via a
control desk with an interface panel designed to represent the airport facilities and
lighting services at a suitable scale (EUROCONTROL, 2003a). This component of the
navigation functional block supports but is not limited to the following elements:
navigational aids status, Aeronautical Ground Lighting (AGL) system (e.g. status of
runway, taxiway lighting panel), warning systems (e.g. runway in use), internal lighting,
meteorological equipment status, and alarming and reporting systems.
Finally, future development of air navigation for civil aviation aims toward enabling
aircraft navigation in four dimensions, seamlessly and gate-to-gate. The post-FANS
Required Navigation Performance (RNP) concept is intended to characterise airspace
through a statement of the navigation performance accuracy (RNP type) to be
achieved (Jeppesen, 2001). In addition, the RNP-RNAV concept has emerged to
overcome the lack of harmonisation between the different RNP/RNAV naming
conventions and to enable common understanding of the relationship between RNP
and RNAV system functionality (ICAO, 2006a). The enhanced navigation, landing, and
surface movement service will be predominantly provided by the satellite-based
systems including the various augmentations such as Satellite-Based Augmentation
Systems (SBAS) and Ground-Based Augmentation Systems (GBAS). Surface
movements in all weather operations will be assisted with enhanced vision systems
enabling pilots to see the airport surface in reduced visibility conditions. As a result,
navigation in the 2020 time frame is expected to be characterised by a mix of ground-
and satellite-based systems with increased functionality complementing or replacing
the existing ground-based systems (VOR, NDB, DME).
2.3.1.3 Surveillance function
The ATC surveillance function identifies all aircraft and presents their position on a
radar screen. Additional dynamic information on the aircraft is also provided depending
on the type of radar employed. The surveillance function block, as shown in Figure 2-6,
focuses on radars, radar and auxiliary displays, and radars used predominantly for
terminal and ground surveillance (see footnote 4). The section concludes with a discussion of the
concept of Required Surveillance Performance (RSP).
[Figure 2-6: diagram not reproduced. Box labels: Surveillance; Primary Radar; SSR Mode A/C/S; Automatic Dependent Surveillance (ADS); Display; Aux Display; Surface Movement Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Aerodrome Traffic Monitor.]
Figure 2-6 Surveillance function
2.3.1.3.1 Radar systems

There are two basic types of radar. The Primary Surveillance Radar (PSR) is the
most basic form of radar, which transmits a pulsed beam of ultra-high frequency radio
waves through 360 degrees via a rotating radar head (EUROCONTROL, 1999). When
the waves reach the aircraft, some of the energy is reflected back. Every time the
aircraft reflects the transmitted energy it is displayed on the radar screen, thus
plotting the course of the aircraft. The PSR only displays an aircraft track or course and
does not provide any other dynamic flight data. This form of radar is rarely used for
commercial aviation except in underdeveloped regions or as a backup to secondary
surveillance radar.
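How a reflected pulse becomes a plot on the display can be illustrated with the basic radar ranging relationship, range = c x t / 2, where t is the round-trip time of the pulse. The text does not give this formula. It is the standard principle of pulse radar and is added here as an illustration, with hypothetical function names and a flat, two-dimensional display assumed.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
METRES_PER_NM = 1852.0


def echo_range_nm(round_trip_s: float) -> float:
    """Range to the target in NM from the pulse round-trip time in seconds."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0 / METRES_PER_NM


def plot_position_nm(round_trip_s: float, azimuth_deg: float) -> tuple:
    """East/north position of the echo relative to the radar head, in NM.

    azimuth_deg is the antenna bearing when the echo was received,
    measured clockwise from north.
    """
    r = echo_range_nm(round_trip_s)
    az = math.radians(azimuth_deg)
    return r * math.sin(az), r * math.cos(az)


if __name__ == "__main__":
    # An echo received 1 ms after transmission, antenna pointing due east.
    print(plot_position_nm(0.001, 90.0))
```

Each antenna rotation yields one such plot per aircraft, and successive plots trace the course of the aircraft on the screen.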
Secondary surveillance radar (SSR) is a more sophisticated form of radar which does
not rely on reflected radio waves. SSR transmits electromagnetic waves in the form of
pulses through 360 degrees (EUROCONTROL, 1999). These pulses are received by
[Footnote 4: The primary difference between en-route radars and those used in terminal and ground surveillance is the rate of radar information update (e.g. en-route radars update every 8 s, whilst terminal radars update every 5 s; EUROCONTROL, 1997).]
equipment on board the aircraft known as a transponder. The radar pulses interrogate
the transponder and if the transponder recognises the pulses it will respond by
transmitting back to the radar. Recognition is achieved by a discrete four-digit code
assigned by ATC. When the transponder transmits to the radar, it actually transmits
essential data about the flight such as aircraft identification (known as Mode A) and
altitude (known as Mode C). As a result, the combination of the PSR and SSR Modes
A and C, or SSR alone, provides a three-dimensional representation of the traffic. In
addition to this information, Mode S possesses a data link functionality and access to the
aircraft state vector (ground speed, track angle, turn rate, roll angle, climb rate,
magnetic heading, indicated air speed, Mach number) as well as aircraft intent
information or an indication of the future path (UK CAA, 2004).
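The Mode A and Mode C data described above can be sketched as a small data structure. Two facts used here go beyond the text: each of the four digits of a Mode A code is an octal digit (0 to 7), giving 4096 possible codes, and Mode C altitude is reported in 100 ft increments. The class and function names are hypothetical, and the sketch is illustrative only.

```python
from dataclasses import dataclass


def parse_mode_a(code: str) -> int:
    """Validate a four-digit octal Mode A code and return it as an integer (0-4095)."""
    if len(code) != 4 or any(ch not in "01234567" for ch in code):
        raise ValueError(f"not a valid Mode A code: {code!r}")
    return int(code, 8)


@dataclass(frozen=True)
class SsrTarget:
    """One SSR reply combined with the radar's range and azimuth measurement."""
    mode_a: str            # four-digit code assigned by ATC
    range_nm: float        # slant range measured by the radar
    azimuth_deg: float     # bearing of the target from the radar head
    mode_c_hundreds_ft: int  # reported altitude in units of 100 ft

    @property
    def altitude_ft(self) -> int:
        """Reported altitude in feet."""
        return self.mode_c_hundreds_ft * 100


if __name__ == "__main__":
    target = SsrTarget("4521", 42.0, 135.0, 350)
    print(parse_mode_a(target.mode_a), target.altitude_ft)
```

Range and azimuth give the horizontal position and Mode C supplies the third dimension, which is why SSR Modes A and C together yield a three-dimensional picture of the traffic.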
A new surveillance initiative is directed toward the development of Automatic
Dependent Surveillance - Broadcast (ADS-B) technology. This is a satellite-based
surveillance system that uses a constellation of satellites to determine the aircraft's
position, altitude, velocity, and other parameters (CASA, 2006). The data is broadcast
to all possible recipients, in contrast to Automatic Dependent Surveillance - Contract
(ADS-C), where only point-to-point data transfer is established. As a result, surveillance
in the 2020 time frame is expected to be characterised by a mix of airborne (ADS,
ADS-B, ADS-C) and ground-based functions with increased functionality
complementing or replacing the existing ground-based systems (PSR and SSR).

2.3.1.3.2 Radar and auxiliary display
All surveillance information is presented to controllers on the Human Machine Interface
(HMI) commonly known as the air situational display or radar display. Therefore, this
component of the surveillance function block includes both radar and auxiliary displays.
The auxiliary display acts as a support, providing data such as flight plan data, traffic lists,
and static and dynamic aeronautical data (e.g. Notices to Airmen (NOTAMs),
meteorological messages, and airport-related information).
2.3.1.3.3 Terminal and ground surveillance
The surveillance functional block also incorporates radar systems which are relevant to
terminal and ground surveillance (Figure 2-6). These are Surface Movement Radar
(SMR), Parallel Approach Runway Monitor (PARM), Terminal Approach Radar (TAR),
Precision Approach Radar (PAR), and Aerodrome Traffic Monitor (ATM).
Finally, future development of air navigation for civil aviation is focused on increased
accuracy of the aircraft position by integrating data from all available sources, such as
primary and secondary surveillance signals and Automatic Dependent Surveillance -
Broadcast (ADS-B) (Mohleji, Lacher, and Ostwald, 2003). The Required Surveillance
Performance (RSP) defines the surveillance requirements according to the airspace
involved (e.g. oceanic/remote airspace vs. high density traffic airspace). In addition, the
ADS system will enable merging of communications, navigation, and surveillance
technologies. This will accelerate the movement toward Airborne Surveillance and
Separation Assurance (ASAS). In other words, the future surveillance technologies
(e.g. ADS) will enable pilots to participate actively in the process of safely separating
their flight from other flights. This will be achieved by the display of traffic information
within the cockpit, wake vortex hazard prediction and avoidance, three dimensional
terrain presentation, terrain avoidance system, and weather awareness (Ochieng,
2006). Moreover, the US FAA is developing a concept of Situational Awareness for
Safety (SAS). The SAS concept is based on the use of available data (e.g. satellite-based
position data, terrain, weather) and their exchange between all parties involved
(e.g. pilots, dispatchers, controllers). The primary objective of the SAS concept is to
create an environment promoting more efficient, safe, and free use of airspace (FAA,
1995).
2.3.1.4 Data processing and distribution function
The data processing and distribution function incorporates all systems required to
process flight related data (e.g. initial flight plan data, dynamic communication,
navigation, and surveillance flight data). These include the Flight Data Processing
System (FDPS) as well as the Radar Data Processing System (RDPS) enabling
controllers to 'see' in real-time the movement of aircraft in a dedicated airspace, as
represented on the radar display. In addition, this function block also incorporates all
supporting equipment, such as the strip printer (Figure 2-7).
[Figure 2-7: diagram not reproduced. Box labels: Data Processing and Distribution; Flight Data Processing System; Fallback Flight Data Processing System; Radar Data Processing System; Fallback Radar Data Processing System; Supporting equipment; Flight plan processing; Airspace data processing; Flight data management & distribution; SSR management; MTCD; Trajectory prediction; MAESTRO; Single Radar Processing; Multiple Radar Processing.]
Figure 2-7 Data processing and distribution function
The FDPS handles flight plans and updates them through automatic events, manual
inputs, and triggered transitions from one state to another. Each state in the life of a
flight plan represents the condition of the flight plan at a specific time in its cycle. The
phase of the flight plan life cycle triggers certain system actions and directly affects what
actions the controller can take on the flight plan and therefore on the actual flight. Through
the processing of flight progress strips (either manually or electronically), the controller
manages all traffic by interacting with flight related data (on the radar and auxiliary
displays, and strip management board). The FDPS carries out the following specific
processes (EUROCONTROL, 2003a):
initial flight plan processing which includes checking incoming flight plan
messages, creating a record of flight data, and storing it in the flight plan
database. In addition, the FDPS handles flight data throughout the life of the
flight plan by constantly updating and distributing the flight data;
airspace data processing and distribution which handles the complete airspace
information (e.g. airways and navigation beacons). In addition, it processes any
information on the special use of airspace to warn the controller about
infringements which require modification of flight trajectory;
meteorological data processing and distribution;
SSR code management which involves the assignment of SSR codes to flights
and identification of all flights by SSR Mode A. It also prevents assignment of
duplicate codes;
trajectory prediction which is performed throughout the flight plan life cycle, taking
into account the initial flight plan as well as all modifications of the route;
provision of system supported coordination and transfer of control within the ATC
Centre and between adjacent ATC Centres;
processing of data link messages from/to the aircraft (A/G coordination);
flight plan conflict detection which is performed inside a defined region (i.e.
sector) using flight plan data. This function is known as Medium Term Conflict
Detection (MTCD);
workload monitoring and distribution essential for assisting the supervisor in the
adjustment of the existing sectorisation (i.e. collapse/de-collapse of sectors) and
computation of position/sector load;
arrival sequencing which provides the approach and en-route controllers with a
proposed sequence number for each arrival flight; and
establishment of code/callsign correlation as a mapping between radar tracks
and flight plan database.
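The SSR code management and code/callsign correlation processes listed above can be illustrated with a minimal sketch. It assumes a simple in-memory pool of four-digit octal Mode A codes; the class name `SsrCodePool`, its methods, and the code range are hypothetical and are not taken from EUROCONTROL (2003a).

```python
class SsrCodePool:
    """Hypothetical allocator of SSR Mode A codes (four octal digits)."""

    def __init__(self, first=0o2001, last=0o2077):
        # Assumed block of codes available to this ATC Centre.
        self._free = [format(c, "04o") for c in range(first, last + 1)]
        self._code_by_callsign = {}
        self._callsign_by_code = {}

    def assign(self, callsign):
        """Give a flight a unique code; repeated calls return the same code."""
        if callsign in self._code_by_callsign:
            return self._code_by_callsign[callsign]
        if not self._free:
            raise RuntimeError("no free SSR codes")
        code = self._free.pop(0)  # a code leaves the pool, so no duplicates
        self._code_by_callsign[callsign] = code
        self._callsign_by_code[code] = callsign
        return code

    def release(self, callsign):
        """Return the code of a terminated flight to the pool."""
        code = self._code_by_callsign.pop(callsign)
        del self._callsign_by_code[code]
        self._free.append(code)

    def correlate(self, mode_a_code):
        """Map the code carried by a radar track to a callsign, if known."""
        return self._callsign_by_code.get(mode_a_code)
```

In this sketch a radar track carrying an unknown code simply remains uncorrelated (`correlate` returns `None`); how an operational FDPS treats such tracks is outside the scope of the example.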
A flight progress strip is a tool that controllers use to record the progress of each flight
as it moves through the sector. It represents a record of all ATC instructions given to
each aircraft. It is also used as a backup to the surveillance function in the event of a
failure. The flight strip printer facility, as an additional component in this functional
block, supports the printing of flight strips at the executive, planner, and/or flight data
assistant positions, depending on the suite configuration. This facility automates the
previously manual completion of a flight strip through access to a database of flight
information and a printout of the data when needed. The printed strip displays the
non-dynamic aspects of the flight, so that only tactical dynamic instructions need to be
entered manually on the strip by the controller.
The RDPS processes radar pictures from all available sources (primary and secondary,
short range and long range, en-route and approach radars) to establish an accurate
picture of all traffic over a well-defined geographical area. In the case of multiple radar
coverage, the RDPS provides a composite air picture of the traffic while taking into
account radar biases for range and azimuth measurements (EUROCONTROL, 2003a).
The ATM surveillance tracker and server system (ARTAS) processes PSR, SSR, Mode
S, and ADS data. These highly accurate and reliable data are directly integrated into
the existing ATC environment by using a universal data exchange format. For example,
EUROCONTROL defined the All Purpose STructured Eurocontrol Radar Information
Exchange (ASTERIX) messaging format. This allows the transfer of information
between two parties (e.g. systems) using a mutually agreed format of data.
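The idea of a composite air picture that accounts for radar range and azimuth biases can be sketched as follows. This is a simplified illustration only: it assumes flat-earth geometry, known constant biases per radar, and plain averaging of the corrected plots. Operational trackers such as ARTAS use considerably more sophisticated estimation, and all names below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Radar:
    x_nm: float              # radar position east of a common origin (NM)
    y_nm: float              # radar position north of a common origin (NM)
    range_bias_nm: float     # assumed known systematic range error
    azimuth_bias_deg: float  # assumed known systematic azimuth error


def plot_to_xy(radar, range_nm, azimuth_deg):
    """Remove the radar biases and convert a plot to common x/y coordinates."""
    r = range_nm - radar.range_bias_nm
    az = math.radians(azimuth_deg - radar.azimuth_bias_deg)
    # Azimuth is measured clockwise from north.
    return (radar.x_nm + r * math.sin(az), radar.y_nm + r * math.cos(az))


def composite_position(plots):
    """Average the bias-corrected plots of one aircraft seen by several radars.

    `plots` is a list of (radar, range_nm, azimuth_deg) tuples.
    """
    points = [plot_to_xy(radar, rng, az) for radar, rng, az in plots]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
```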
The data processing and distribution functional block also incorporates both a fallback
flight data processing system and fallback radar data processing system, as necessary
redundant systems in every ATC Centre. These fallback systems may provide identical
functionality if they are duplicates of the FDPS and RDPS systems. However, in some
cases these fallback systems do not necessarily provide the same range of functions
as the main systems. The necessity of redundant systems in ATC is discussed further
in Chapter 4.
2.3.1.5 Supporting function
The supporting function comprises various ATC tools that enable integrated air traffic
management operations that enhance safety and increase airspace capacity. The main
objective of these tools is to lessen the cognitive workload on the controller while
focusing on the relevant (task specific) information (IFATCA, 2004). They also assist in
the detection and resolution of potential problems. It is important to note that these
tools do not replace controller decision making; they simply aid it. The supporting
function includes the following tools (Figure 2-8):
Monitoring tools assist with detection and recording of any safety-related events
(e.g. the Automatic Safety Monitoring Tool - ASMT), reduce the workload
associated with traffic monitoring tasks by identifying the potential and actual
deviations or non-conformance with the planned flight trajectory (e.g. MONitoring
Aid - MONA), and automatically check if aircraft are adhering to their planned
route (e.g. Route Adherence Monitoring - RAM) or cleared flight level (e.g.
Cleared Level Adherence Monitoring - CLAM) by comparing planned or
cleared information with the aircraft's actual position (EUROCONTROL, 2001f);
The Medium Term Conflict Detection (MTCD) system is a tool which enables
controllers to predict and identify future conflicts between aircraft in the predefined
region by applying separation rules (EUROCONTROL, 2001f); and
Sequencing managers (e.g. Arrival Manager - AMAN, Departure Manager - DMAN,
Means to Aid Expedition and Sequencing of Traffic with Research and
Optimisation - MAESTRO) are decision making tools for providing the approach
and en-route controllers with the control and sequencing actions to properly
expedite traffic to the destination airports and runways (EUROCONTROL, 2001f).
Figure 2-8 Supporting function
These tools aim to enhance the controller's appreciation of the current and predicted
traffic situation and facilitate the decision making process. They are an integral part of
the HMI (i.e. radar display) and are informed by the output of the data processing and
distribution function.
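The comparison of cleared or planned information with the aircraft's actual position, as performed by the monitoring tools described above, can be sketched in a few lines. The example is a simplification under stated assumptions: the aircraft is expected to be level at its cleared flight level, the route is a sequence of straight segments in flat-earth coordinates, and the tolerance values and function names are hypothetical.

```python
import math


def clam_alert(cleared_fl, actual_fl, tolerance_fl=2):
    """True if the reported level deviates from the cleared flight level."""
    return abs(actual_fl - cleared_fl) > tolerance_fl


def distance_to_segment(p, a, b):
    """Shortest distance (NM) from point p to the route segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def ram_alert(position, route, tolerance_nm=5.0):
    """True if the aircraft is further than the tolerance from every segment."""
    nearest = min(
        distance_to_segment(position, a, b) for a, b in zip(route, route[1:])
    )
    return nearest > tolerance_nm
```

For a route `[(0, 0), (100, 0), (100, 100)]`, an aircraft at `(50, 3)` is within the assumed 5 NM corridor, whereas one at `(50, 8)` would raise a route adherence warning.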
2.3.1.6 Safety Nets
A safety net (SNET) is an airborne and/or ground-based function alerting the pilot or
controller to the imminent possibility of collision between aircraft, between aircraft and
terrain/obstacles, as well as penetration of dangerous airspace (IFATCA, 2004). The
most common safety nets are Short Term Conflict Alert (STCA), Minimum Safe
Altitude Warning (MSAW), Area Proximity Warning (APW), and Runway Incursion
Monitoring and Conflict Alert System (RIMCAS).
The previous section described medium term conflict detection (MTCD) as an ATC tool
which assists the controllers in early detection and prediction of conflicts (e.g. 20
minutes in advance). Similarly, the STCA function detects two system tracks predicted
to be in conflict (i.e. two tracks where both horizontal and vertical separations are about
to be compromised). This system then alerts the controller to the imminence of a
separation minima infringement through the display of visual alarms presented on the
affected traffic on the HMI. However, whilst MTCD is for early detection and prediction
of conflicts, the STCA is used as a safety net or defence against imminent conflict
(EUROCONTROL, 2007a). The exact moment of STCA alarm depends upon
predetermined settings (usually it is set to trigger the alert between 90 seconds and two
minutes prior to conflict).
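The STCA logic described above, namely predicting whether both horizontal and vertical separation will be lost within a short look-ahead time, can be illustrated with a minimal sketch. It assumes straight-line extrapolation of each track, a 120 second look-ahead, and separation minima of 5 NM and 1,000 ft chosen purely for illustration. Real STCA implementations use more elaborate filters and parameter sets, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    x_nm: float    # position east (NM)
    y_nm: float    # position north (NM)
    alt_ft: float  # altitude (ft)
    vx_kt: float   # east velocity component (knots)
    vy_kt: float   # north velocity component (knots)
    vz_fpm: float  # vertical rate (ft/min)


def predict(track, t_s):
    """Extrapolate a track linearly t_s seconds ahead."""
    hours = t_s / 3600.0
    return (
        track.x_nm + track.vx_kt * hours,
        track.y_nm + track.vy_kt * hours,
        track.alt_ft + track.vz_fpm * t_s / 60.0,
    )


def stca_alert(a, b, lookahead_s=120, step_s=5, min_h_nm=5.0, min_v_ft=1000.0):
    """Return the first sampled time (s) at which both minima are infringed.

    Returns None if no infringement is predicted within the look-ahead time.
    """
    for t in range(0, lookahead_s + 1, step_s):
        ax, ay, az = predict(a, t)
        bx, by, bz = predict(b, t)
        if math.hypot(ax - bx, ay - by) < min_h_nm and abs(az - bz) < min_v_ft:
            return t
    return None
```

With two aircraft at the same level, 20 NM apart and flying head-on at 480 knots each, this sketch reports the predicted infringement at the 60 second sample; if one aircraft is 2,000 ft higher, no alert is raised.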
The MSAW function enables detection of a radar track predicted to infringe the
minimum safe altitude above an obstacle. MSAW processing takes into account the
track altitude (i.e. altitude of the track extracted from Mode C or present altitude
corrected for pressure at mean sea level known as QNH pressure, thus providing the
altitude above mean sea level), attitude indicator (i.e. climb or descent), position and
speed vector. In addition, the system will detect if a radar track is predicted to deviate
from the approach path of an airport (EUROCONTROL, 2007a).
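A minimal sketch of the MSAW altitude check is given below. It assumes that the Mode C level is a pressure altitude referenced to 1013.25 hPa, that the QNH correction can be approximated linearly at about 27 ft per hPa (a rough near-sea-level value assumed here), and that the vertical rate stays constant over a hypothetical 60 second look-ahead. The function names and parameters are illustrative only.

```python
STANDARD_PRESSURE_HPA = 1013.25
FT_PER_HPA = 27.0  # rough near-sea-level approximation, assumed for this sketch


def altitude_amsl_ft(mode_c_ft, qnh_hpa):
    """Approximate altitude above mean sea level from Mode C and QNH."""
    return mode_c_ft + (qnh_hpa - STANDARD_PRESSURE_HPA) * FT_PER_HPA


def msaw_alert(mode_c_ft, qnh_hpa, vertical_rate_fpm, min_safe_alt_ft,
               lookahead_s=60):
    """True if the current or predicted altitude is below the safe minimum."""
    current = altitude_amsl_ft(mode_c_ft, qnh_hpa)
    predicted = current + vertical_rate_fpm * lookahead_s / 60.0
    return min(current, predicted) < min_safe_alt_ft
```

The sign of the correction matters: when the QNH is below the standard setting, the aircraft is lower than its Mode C level suggests, so the sketch raises the alert earlier.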
The APW is used to designate areas which are dangerous for an aircraft to enter (e.g.
missile firing, military training, and air display areas). These areas can be identified as:
prohibited, restricted, dangerous, military training, segregated, special use, temporary
restricted, and permanently restricted. The APW ensures that any aircraft infringing or
predicted to infringe on one of these areas is detected by this system and an advance
warning is presented to the controllers (EUROCONTROL, 2007a).
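The detection of an aircraft infringing, or predicted to infringe, a designated area can be sketched as a point-in-polygon test applied to extrapolated positions. The example assumes a flat-earth polygon in nautical miles, straight-line extrapolation, and a hypothetical 120 second look-ahead; vertical limits of the area are ignored for brevity.

```python
def inside(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray running east from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def apw_alert(position_nm, velocity_kt, area, lookahead_s=120, step_s=10):
    """Return the first sampled time (s) at which the aircraft is in the area.

    Returns 0 if it is already inside and None if no entry is predicted.
    """
    x, y = position_nm
    vx, vy = velocity_kt
    for t in range(0, lookahead_s + 1, step_s):
        hours = t / 3600.0
        if inside((x + vx * hours, y + vy * hours), area):
            return t
    return None
```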
RIMCAS is an airport monitoring and conflict alert system which detects an impending
runway incursion and alerts controllers before it occurs. The system gives the controller
an opportunity to react within a realistic and effective timeframe. This system is also
known as the ground short term conflict alert system. The main requirement of this
system is to be supplied with reliable surveillance data as any false alert unnecessarily
increases controller workload. As a result, the Automatic Dependent Surveillance -
Broadcast (ADS-B) system should enhance surveillance capability for airport
monitoring and conflict prevention through the Advanced Surface Movement Guidance
and Control Systems (A-SMGCS) (ICAO, 2005).
2.3.1.7 Power supply
The availability of electrical power is a prerequisite in a computer driven environment,
such as an ATC Centre. Electrical power is obtained from public utilities, but in case of
interruptions or non-availability, the ATC Centre's own installations are required to
provide electrical power. This is most commonly achieved by diesel-powered
generators or powerful batteries, supporting an Uninterruptible Power Supply (UPS)
capability. These components are required to provide uninterrupted electrical power
supply in order to prevent computers shutting down.
2.3.1.8 Pointing and input devices

The Human Machine Interface (HMI) represents the entire ATC system to the controller
on each Controller Working Position (CWP). In order to interact with available systems,
the controller uses input and pointing devices. Input devices include Touch Input
Panels (TIP), the mouse, and the keyboard. The most frequent pointing devices
are the mouse and trackerball. Using the input and pointing devices, the controller
communicates with the entire ATC system, and edits and reads live flight plans. All
the changes and interactions made by controllers via input and pointing devices are
presented on displays (i.e. radar, auxiliary display, and communication panel).
2.3.1.9 System control and monitoring function
This function is supported by a computer and monitor system that controls the overall
ATC system from a centralised position, i.e. the system control and monitoring unit.
The main purpose of this system is to display the actual state of the core systems and
subsystems within the CNS/ATM infrastructure, to manage incidents, and to perform
the reconfiguration of resources within its infrastructure. This functional block
constantly checks the functionality of the overall system, involving the software and
hardware configuration in order to ensure a high system availability (EUROCONTROL,
2003a). The system monitoring and control functionality is supported by several
different facilities which are explained in the following paragraphs (Figure 2-9).
Figure 2-9 System monitoring and control function
The data recording and playback facility enables automatic recording of all transactions
made by the radar data, flight data, radar display, and communication functions. This
includes all controller modifications to flight plans, received messages, and display
setting modifications (EUROCONTROL, 2003a). The recorded data are used for further
data analysis and for playback of the specific air traffic situation (e.g. in the case of an
incident). The recordings are stored on disks for the time deemed necessary by the
relevant aviation authority (the legal requirement is 30 days but could be longer if
necessary for incident investigation).
One of the most requested system control and monitoring functions is the ability to
detect faults in the supervised ATC system by continuous control and monitoring of the
system operation. This facility provides detailed information on the equipment states
within the managed systems and the relevant alarm conditions which may affect the
operating mode. It also logs events and enables the remote control of supervised
equipment and setting of the system thresholds (EUROCONTROL, 2003a). Its main
sub-functions are: fault management (i.e. alarm management, threshold setting),
configuration management (i.e. equipment descriptions), performance management
(i.e. identification of trends and problems), and security management (i.e.
authentication, identification, password protection, tailored user interface). The control
and monitoring is performed on all positions, external lines, and connections.
Each ATC system is designed to have several operational system modes
(EUROCONTROL, 2003a). These modes are activated automatically if any of the major
processing systems fails. The objective is that the controller always has some
functionality available despite the degradation of equipment. Reduced radar, alert, flight
plan, and communication modes are the most frequent types of reduced operational
modes available in current ATC systems.
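The automatic selection of an operational mode after a failure can be sketched as a simple lookup. The mapping below between functions and reduced modes, the treatment of a fallback system as a separate mode, and all names are assumptions made for illustration; they are not taken from EUROCONTROL (2003a).

```python
# Hypothetical mapping from a failed function to its reduced operational mode.
REDUCED_MODES = {
    "radar data processing": "reduced radar mode",
    "safety nets": "reduced alert mode",
    "flight data processing": "reduced flight plan mode",
    "communication": "reduced communication mode",
}


def select_modes(failed, has_fallback=frozenset()):
    """Return the operational mode of each function after the given failures.

    `failed` is the set of functions whose main system has failed;
    `has_fallback` is the set of functions with a working fallback system.
    """
    modes = {}
    for function, reduced in REDUCED_MODES.items():
        if function not in failed:
            modes[function] = "normal"
        elif function in has_fallback:
            modes[function] = "fallback system"
        else:
            modes[function] = reduced
    return modes
```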
The time management facility uses the external time received from the GPS signal for
synchronising time on all computers (i.e. all Controller Working Positions - CWPs). The
time is expressed in Coordinated Universal Time (UTC), also known as Zulu time.
Originally, it was a time scale based on the local standard time on the 0° longitude
meridian which runs through Greenwich, United Kingdom. Today, UTC uses precise
atomic clocks and satellites to ensure a reliable and accurate time standard for air and
ground operations (ICAO, 1979).
2.4 Characteristics of the generic Air Traffic Control Centre

The preceding paragraphs presented the architecture (functional and physical) of an
Air Traffic Control (ATC) system. However, a more complete understanding of the ATC
system (i.e. people, equipment, procedures) is possible within the context of an ATC
Centre providing specific types of services. Therefore, this section reviews the main
characteristics of a generic ATC Centre with particular focus on current technologies.
The following section focuses on technologies that will determine the characteristics of
the generic ATC Centre in the future.
There are significant variations in equipment between ATC Centres, both in Europe
and worldwide. At the European level, EUROCONTROL, the European Organisation
for the Safety of Air Navigation, took the role of promoting harmonisation, integration,
and standardisation while improving safety and overall performance of the ATM/ATC
systems in its member states. For example, EUROCONTROL (2006d) has considered
the costs of fragmentation of the European ATM system. At a global level, ICAO
standardisation activities are undertaken when new systems or technologies are
mature, have demonstrated their ability to provide safety enhancements compared to
existing systems, and are cost beneficial to international civil aviation (ICAO, 2003).
ICAO has established standards and recommended practices for all of its contracting
states (ICAO, 2006b).
In spite of the significant effort to date to standardise ATM/ATC within the aviation
community, there are still significant differences. For this reason, the methodology
adopted in this thesis for the assessment of controller recovery from equipment failures
in ATC is designed on the basis of a generic ATC Centre. This is defined below.
The ATC Centre should be based on a fully automated and integrated system with a
fail-safe design based on duplicated processors and open architecture in accordance
with existing industrial standards. It also has to have graceful degradation modes. The
data processing functional block should be able to support acquisition and processing
of data from several radars (i.e. multiradar tracking), automatic collection and
processing of flight plans, automatic allocation of SSR codes, coordination achieved
through direct connection to adjacent centres (e.g. On-Line Data Interchange - OLDI),
coordination of civil and military flights via a separate military suite, and automatic flight
progress monitoring (continuous calculation of flight profile and update based on radar
data). The air situational picture should be presented on the HMI (radar and auxiliary
display) with necessary alert facilities (e.g. STCA, MSAW, CLAM, RAM). The playback
function of radar pictures should be available for incident investigation, testing,
development, and training.
The ATC Centre should be capable of presenting paper strips on the strip
console. A flight progress strip is a single strip of paper that contains all information on
a flight and its evolution through a particular sector of airspace. It is used as a quick
way to record the progress of the flight and to keep a legal record of the instructions
issued. It is also used to allow the planning controller to predict future conflicts and to
ensure that sector entry/exit conditions are achieved. In addition, in the case of radar
failure, flight progress strips represent the primary control tool. The strip, mounted in a
strip holder, is placed with other strips in a 'strip board' which displays all flights in a
particular sector of airspace or at an airport.
In recent years, there have been initiatives aimed at electronic strip presentation, used
in many European ATC Centres and airports. However, as Lanzi and Marti (2001) point
out, controllers do not generally find electronic strips to have the same level of flexibility
and support as paper strips. On the other hand, more radical attempts have been made
toward a stripless environment, where aircraft information is tagged to the label on the
radar screen that can be expanded as necessary. In this environment generally three
modes of the same aircraft label exist: the standard label that is always displayed on
the screen, the highlighted label that is bigger and contains more information, and the
extended label that contains all information not immediately required by the controller
(for details see Lanzi and Marti, 2001).
The previous sections have discussed the current technologies relevant to an ATC
Centre. This forms a part of the definition of a generic ATC Centre. In addition, the
generic ATC Centre should be adaptable to changes in technologies. Hence, the
following section addresses the future of ATC and how this is likely to impact on an
ATC Centre.
2.5 The future of Air Traffic Control

The research presented in this thesis has to take into account the future challenges
that may face controllers with the increased exposure to more automated systems. In
this regard, this section briefly discusses the key challenges of automation,
characteristics of human-centred design, as well as the concept behind ICAO's
Future Air Navigation Systems (FANS). The section concludes with a discussion of
the potential sources of technical and human performance deficiencies within the future
ATC Centres and their relevance to the equipment failures and the recovery process as
investigated in this thesis.
2.5.1 Challenges of automation

There are various definitions of automation, residing in different contexts. In the context
of Air Traffic Management (ATM), the National Research Council Panel on Human
Factors in Air Traffic Control Automation (Wickens et al., 1998) defined automation as:
a device or system that accomplishes (partially or fully) a function that was previously
carried out (partially or fully) by a human operator.
According to Wickens (1992) automation is mainly applied to perform or assist
functions in which humans are naturally limited (e.g. accessibility to toxic, dangerous,
unreachable environments; or inherent working memory limitation). In addition,
automation is used to replace humans in operations which are time consuming, costly,
or induce high workload (e.g. complex monitoring or analytical processes). While often
seen as replacing humans, in reality, automation changes the role of the human
operator from direct manual control to largely supervisory control. In other words, in this
new role, the human operator plans and inputs tasks and the computer systems
implement these tasks automatically. Automation does not totally replace human
activity; it just changes the nature of the work that humans do. This change is often
completely unintended or unexpected by automation designers (Parasuraman and
Riley, 1997).
Past research has identified three sources of human performance deficiencies when
using high level automation (Bainbridge, 1983; Wickens et al., 1998; Wiener and Curry,
1980; Boehm-Davis et al., 1983). Firstly, humans become less likely to detect failures in
the automation itself or in the automated process. Secondly, they lose some
awareness of the state of the automated process. Finally, human operators eventually
lose skills in performing the actions manually if these actions have been previously
automated. These three phenomena are commonly known in the literature as out of the
loop performance problems. The deterioration of manual skills is
particularly relevant to controllers and flight crews. As Bainbridge (1983) points out, an
irony is that the more reliable the automation, the more prone the operator will be to
out of the loop performance problems. This is the direct result of increased
complacency, overtrust in automation, and deterioration of manual skills of both
controllers and pilots.
Experiments have shown that operators' abilities to recover from emergency
automation failure significantly improve with levels of automation that require human
involvement in the implementation of a task. Thus automation strategies that allow
operators to focus on current operations may contribute to improved situational
awareness and reduction in workload (Endsley, 1997). As a result, a new approach to
automation evolved, resulting in human-centred designs instead of technology- or
automation-centred designs.
2.5.2 Human-centred vs. technology-centred automation

Traditionally, automation was perceived in an all-or-none fashion. At one extreme,
automation was employed completely and expected to eliminate human error. At the
other extreme, automation was kept to an absolute minimum, keeping the operator as
much as possible in the control loop. This traditional approach to automation has been
known as static, where the level of automated assistance was unchanged over time
(Parasuraman et al., 1990). However, decades of research showed that between these
two extremes, different levels of automation can be specified by the degree to which a
task is automated. This way of thinking led to the concept of human-centred automation,
which is essentially developed around the idea of keeping the operator in control of the
situation (Billings, 1996; Parasuraman et al., 1990; Sheridan, 1980). As Layton et al.
(1994) note, the design of any automated system should be seen as the design of a
new collaboration between the machine and the human operator.
According to Wickens et al. (1998) the choice of what to automate should be simply
guided by the need to compensate for human vulnerabilities and to exploit human
strengths. However, this simplistic approach may again lead to static automation, not
exploiting and adapting automation to the characteristics of the context (surrounding
the human operator). Therefore, it seems more reasonable to move beyond traditional
automation approaches toward the principles of dynamic allocation of control between
human and machine, i.e. adaptive automation (Scerbo, 2005; Kaber, 1997; Kaber and
Riley, 1999; Parasuraman et al., 1996; Parasuraman et al., 2000; Kaber, Prinzel,
Wright, and Clamann, 2002).
In short, the presence of automation is inevitable in all future concepts of air navigation.
Current design initiatives are more focused on human-centred automation, while
initial steps have been taken toward adaptive automation. For example, the
concept of cognitively convenient alarm onset has been tested on a US naval ship as
described in Daniels et al. (2002). Based on the previous discussion on the main
principles of automation, it is necessary to review how these principles are
implemented in the design of future ATC systems and tools. The following section
presents the key concepts that will characterise the Communication, Navigation, and
Surveillance / Air Traffic Management (CNS/ATM) system up to the year 2020.
2.5.3 The future of air navigation service

The problems with the current air traffic management system can be summarised in
two areas. Firstly, the fragmentation of national systems prevents optimal use of global
airspace, as aircraft have to be controlled by many different air traffic systems.
Secondly, inherent limitations of current Air Traffic Control (ATC) technologies and
operational procedures are well known and make it impossible to achieve enhanced
efficiency and required capacity for the future (Ochieng, 2006).
To respond to the identified areas of concern, the International Civil Aviation
Organisation (ICAO) developed the Future Air Navigation Systems (FANS) concept, built
around the Communications, Navigation, and Surveillance in Air Traffic Management
(CNS/ATM) system. As a result, future concepts and strategies in ATM/ATC will follow
a global approach to ATM and no longer focus solely on national needs. In this overall
environment, ATM/ATC technologies will face necessary changes and developments that
are currently in the conceptual or design phase. The general drivers of future ATM/ATC
are structured around communication, navigation, and surveillance functionalities and
are summarised below:
communication in the 2020 time frame is expected to be characterised by a mix
of analogue voice and digital communication with increased use of datalink (VHF
based datalink-VDL, SSR Mode S datalink) and satellite communication
(SATCOM) to complement or replace existing analogue voice communications.
navigation in the 2020 time frame is expected to be characterised by a mix of
ground- and satellite-based systems with increased use of satellite systems (e.g.
GPS, Galileo) for all phases of flight.
surveillance in the 2020 time frame is expected to be characterised by a mix of
airborne (ADS, ADS-B, ADS-C, A-SMGCS, cockpit situational awareness-SAS)
and ground-based functions (SSR Mode S) with increased functionality
complementing or replacing the existing ground-based systems (PSR and SSR).
This succinct statement of the evolution of CNS/ATM within the 2020 time frame needs to
be further discussed from the perspective of a generic ATC Centre. In other words, it is
necessary to discuss the potential characteristics of the generic ATC Centre in 2020.
Based on ICAO and EUROCONTROL future concepts, the following changes are
expected in the generic ATC Centre in 2020:
in support of Gate to Gate (G2G) flight management the following ATC systems
and tools are proposed for the period from 2010 onwards: four dimensional flight
trajectory prediction, sequencing managers (AMAN, DMAN), MTCD, monitoring

aid (MONA), system supported coordination (SYSCO);
stripless environment;
datalink communication;
autonomous or free flight concept less reliant on ground-based navigational aids;
transfer of separation responsibility to the flight deck giving controllers more of
a monitoring role;
electronic (silent) coordination; and
dynamic optimisation of airspace through the Single European Sky (SES)
initiative (EUROCONTROL, 2007b) and the concept of flexible use of airspace
(see MANTAS concept; EUROCONTROL, 2004b).
After presenting the system design principles and characteristics of future ATM/ATC, it
is important to discuss the impact that those changes may have on equipment and
human reliability. Following the main objective of the research presented in this thesis,
it is necessary to identify the potential sources of technical and human performance
deficiencies and their relevance to the controller recovery process.
2.5.4 Impact of future ATM/ATC on controller recovery from equipment failures
With the accumulated knowledge of the modern integrated ATC systems, it is
reasonable to assume that future overall equipment reliability will remain similar to
current standards. However, the nature and types of equipment failure may change.
While eliminating single points of failure, future ATC Centres may experience increased
problems with software reliability and data integrity (e.g. presentation of inaccurate
data). This will be the direct result of a more complex and integrated ATC architecture
as well as incompatibility between current and future, more automated ATC equipment.
In other words, the future ATC Centres may be faced with failure types that will be
harder to detect and repair. The highly integrated ATC architecture may mask some of
these failures and hide the real cause(s) of the problem.
When discussing human reliability issues in future ATM/ATC environment, it is
reasonable to assume that automation design will create situations where controllers
will not be able to cope with its complexity or simply will not have enough time
available. This is a direct result of the assumed out of the loop performance and the
reduced separation between aircraft (as a requirement to provide necessary capacity).
As noted by Wickens et al. (1998), the time available to safely respond to an
emergency situation will decrease with decreased separation, while the operator
response time may increase due to out of the loop performance. One alleviating factor
may be the transfer of responsibility for separation management from controllers to
pilots, giving the former more time to effect recovery. However, the environment of
collaborative decision-making and real-time information exchange threatens to distribute
false or inaccurate information from the ground to the air. In this case, ATC equipment
failure may affect the airborne segment of ATM and cockpit instruments (e.g. Flight
Management System - FMS).
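The relationship noted above between separation, time available, and operator response time can be made concrete with simple arithmetic. The sketch below assumes two aircraft closing at a constant speed; the separation values, closing speed, and response time in the usage note are hypothetical illustrations and are not figures from Wickens et al. (1998).

```python
def time_to_close_s(distance_nm, closing_speed_kt):
    """Seconds needed to close a given distance at a constant closing speed."""
    return distance_nm / closing_speed_kt * 3600.0


def recovery_margin_s(distance_nm, closing_speed_kt, response_time_s):
    """Time left after the operator has responded (negative means too late)."""
    return time_to_close_s(distance_nm, closing_speed_kt) - response_time_s
```

For example, at an assumed closing speed of 960 knots, 5 NM is closed in about 18.75 seconds but 3 NM in only about 11.25 seconds. If out of the loop performance lengthened the response time from 10 to 15 seconds, the margin in the second case would change from slightly positive to negative.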
The European Organisation for the Safety of Air Navigation (EUROCONTROL) recognised
that the role and nature of controller tasks will change as a result of the addition of
increased automation within the ATM system. As a result, they initiated the Solutions
for Human-Automation Partnerships in European ATM (SHAPE) project to better
understand interactions between automated support and controllers (EUROCONTROL,
2004f). SHAPE has identified seven factors that need to be addressed to ensure
harmonisation between automated support and the controller. Amongst factors such as
trust, situational awareness, team issues, skills, ageing, and workload, SHAPE
recognised the importance of managing system disturbances (details are presented in
Chapters 5 and 7). As a result, the assessment of controller recovery presented in the
remainder of this thesis considers the interactions between human and automation. A
flexible approach has been developed to assess controller recovery in any possible
context.
In short, the role of the human operator will remain significant in the future ATC
environment. Due to the transfer of responsibility for separation management from
controllers to pilots, recovery performance will evolve from purely controller
actions to collaboration between controller and pilot. To support human performance in
the future, more automated environment (both on the ground and in the cockpit), special
attention will have to be given to the areas of human-computer interaction, training, and
procedures for both normal and abnormal situations.
2.6 Summary
The aim of this Chapter is to create a basis for the research on recovery from
equipment failures in ATC. There are several findings that will be taken forward from
this Chapter. Firstly, this Chapter defined ATM and its component ATC and thus
indicated the scope of the research presented in this thesis. Secondly, this Chapter
placed additional emphasis on the ATC functional classification. This classification
starts with the main ATC functional blocks, which are further dissected to element level. It has
been defined based on both current and future ATC systems and tools in accordance
with the principles and initiatives of ICAO and EUROCONTROL. As such, this ATC
functional breakdown is flexible to changes in ATM/ATC and should capture both
current and future equipment failure types. Finally, this Chapter defined the characteristics
of a generic ATC Centre in both current and future ATC environments. This finding
creates a base for the entire research presented in this thesis.
The next Chapter focuses on the equipment component of the ATC system.
Since the aim of the overall thesis is to assess the impact of equipment failures, the
next Chapters provide relevant definitions, identify types of equipment failure, and assess their
contribution to the safety of the overall air transport system. A sample of operational
failure reports used in this research is validated through a framework based on the
contribution of equipment failures to the overall safety of the air transport system.
Chapter 3
Preliminary Assessment
3 Preliminary Assessment of Equipment Failures in Air Traffic Control
The previous Chapter presented the context of the research in this thesis by describing
the Air Traffic Management (ATM) system and its component, the Air Traffic Control
(ATC) system. Furthermore, it detailed the range of functions provided in an ATC
Centre. The main characteristics of current ATC Centres, as well as the concepts
shaping their future characteristics, were also covered. A comprehensive analysis of
equipment failure should follow its life by assessing all the phases that this occurrence
undergoes throughout the ATC system (Figure 3-1). An equipment failure firstly
encounters the existing technical built-in defences. If these inherent defences are
insufficient to prevent the failure impacting on the ATC system, the failure now
becomes a hazard. Hazards represent a sub-group of equipment failures that penetrate
existing technical built-in defences and hence require human intervention (or human
recovery). An equipment failure occurrence concludes with the outcome which is the
result of the collaboration between technical and human recovery.
Figure 3-1 Phases of an equipment failure occurrence
Following the equipment failure life, the Chapter starts with the relevant definitions of
equipment failures and hazards. While the human recovery and outcome phases of the
equipment failure life are discussed in the remainder of the thesis, this Chapter
continues by presenting the available sample of operational failure reports. It also
discusses the reporting schemes used to obtain equipment failure reports and data
pre-processing issues. The appropriateness of this sample is assessed by using a
methodology that determines how much ATC equipment contributes to the safety of the
overall air transport system. Agreement between the findings obtained from past
research and the analysis of available operational failure reports indicates the validity
of this sample. Once this is achieved, the thesis continues with a more in-depth
assessment of the available sample in the following Chapter.
3.1 Definition of equipment failure

The focus of aviation safety and reliability management has mainly been on the
prevention of technical failures, human failures (also known as human errors), and
more recently organisational or management failures (Reason, 1997). The European
Organisation for the Safety of Air Navigation (EUROCONTROL) defines failures in the ATC
system as the inability of any element of that system to perform its intended function or
to perform it correctly within specified limits (EUROCONTROL, 2002c). As discussed
in Chapter 2, the ATC system comprises people, equipment, and procedures
integrated in an optimal way to achieve a common objective. However, the research
presented in this thesis focuses solely on failures of one component of the ATC system,
namely equipment. Therefore, in the following text, the term failure will only apply to
equipment failures or malfunctions.
Leveson (1995) defines failure as the inability of the system or component to perform
its intended function for a specified time under specified environmental conditions. The
definitions by Leveson and EUROCONTROL are similar as both take into account
failure in a much wider sense. In this research a failure occurs when any component of
ATC equipment terminates unexpectedly and no longer performs the required function,
while the overall ATC system remains operational. If the entire ATC system becomes
unavailable, the failure is known as an outage. For example, a communication failure is
observable in an ATC Centre if there is an unexpected failure of radio communication
equipment on one console. However, if the failure affects the entire ATC Centre (e.g.
due to loss of power), this failure is known as an outage. It is important not to restrict
the term failure only to catastrophic events. Small-scale failures can combine to act
more severely in different environmental conditions (contexts). According to Wickens et
al. (1998) the source of such problems could be software bugs, erroneous or delayed
data exchange, or design deficiencies. Figure 3-2 illustrates the definitions discussed
previously.
Figure 3-2 Different definitions
[Diagram labels: Air Traffic Control (ATC) system = people, equipment, procedures and training. Failure = human failure (human error), equipment failure, failure of procedure and/or training. Equipment failure: failure mode (failure effect observable on equipment and/or ATC system); local impact: console/sector; overall impact: entire ATC Centre (outage or fallback).]
In a similar way, it is necessary to differentiate between total and partial equipment
failures. Using the example above, a total radio communications failure will result in a
situation where a controller working position (or a sector) can no longer provide air
traffic services due to the inability to communicate clearances or instructions to aircraft.
However, if a failure affects only one element, either the transmitter or receiver, and the
other component is still operational on that position (or the sector), the radio
communication failure will be regarded as partial. In other words, if the equipment no
longer performs any aspect of the required function the failure is total, but if at least
some portion of the required functionality still exists, the failure is only partial.
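The distinctions drawn above (failure versus outage, total versus partial) can be expressed as a small classification sketch. This is only an illustration of the definitions used in this Chapter: the class and field names (EquipmentFailure, affects_entire_centre, functions_required, functions_remaining) are assumptions introduced here and do not come from any operational database.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    FAILURE = "failure"   # local impact: console or sector
    OUTAGE = "outage"     # overall impact: entire ATC Centre


class Extent(Enum):
    TOTAL = "total"       # no aspect of the required function remains
    PARTIAL = "partial"   # at least some required functionality remains


@dataclass(frozen=True)
class EquipmentFailure:
    # All field names are hypothetical and chosen only for this illustration.
    description: str
    affects_entire_centre: bool
    functions_required: int    # functions the equipment should provide
    functions_remaining: int   # functions still available after the failure

    def __post_init__(self) -> None:
        # A failure must remove at least one of the required functions.
        if not 0 <= self.functions_remaining < self.functions_required:
            raise ValueError("a failure must remove at least one required function")

    @property
    def scope(self) -> Scope:
        return Scope.OUTAGE if self.affects_entire_centre else Scope.FAILURE

    @property
    def extent(self) -> Extent:
        return Extent.TOTAL if self.functions_remaining == 0 else Extent.PARTIAL


# Radio communication treated as two functions: transmit and receive.
radio_total = EquipmentFailure("transmitter and receiver lost on one console", False, 2, 0)
radio_partial = EquipmentFailure("transmitter lost, receiver still working", False, 2, 1)
power_loss = EquipmentFailure("loss of power across the ATC Centre", True, 1, 0)
```

Treating radio communication as two countable functions is a simplification; in practice the required functionality of a piece of equipment would be taken from the ATC functional classification of Chapter 2.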
All technical items are designed to fulfil one or more functions. A failure mode is thus
defined as an inability to partially or completely fulfil one of these functions (Figure 3-2).
It is also defined as the visible effect of a failure on the ATC system. Note that
equipment failures may not have any visible impact on the ATC service due to the
availability and effectiveness of built-in defences (e.g. redundancy) discussed in more
detail in Chapter 4. In this case, the only visible effect on the system (i.e. failure mode)
would be the engagement of the first level of redundancy. In some cases, this transition
is done seamlessly and it is only apparent to technical staff, but not to controllers. The
UK national air navigation service provider (NATS) differentiates between fallback and
failure modes. According to NATS, fallback mode is a condition which occurs only if
there is a major failure or when the level of redundancy is significantly eroded (NATS,
2002). Thus, the NATS definition of fallback modes corresponds closely to outages
defined previously.
It is very important to distinguish between equipment failures and human operator
failures, known as human errors (Figure 3-2). Note that it could be said that all failures
are human in their nature, since most of them involve humans at some stage of the
process, e.g. system designers might fail to anticipate a certain equipment state.
Humans are also involved in manufacturing, testing, validation, certification, and
maintenance. Any of these human operators can be directly or indirectly responsible for
a failure occurring in ATC. It is also important to note that non-technical failures should
not be directly considered as human failures. Frequently, a failure that has no obvious
technical cause is directly attributed to the human, due to a lack of a deep and
objective analysis of its causes and dynamic relations between technical and human
components of the system (Straeter, 2001).
The following sections start with the definition of a hazard, as a sub-group of equipment
failures that penetrate existing technical built-in defences and hence require human
intervention, which is the focus of the research presented in this thesis. This is followed
by the presentation of the sample of operational failure reports available in this thesis.
3.2 Definition of a hazard

The research in this thesis focuses on failures that penetrate technical defences (i.e.
technical recovery) and therefore impact (with different levels of severity) on a
controller's performance. In this thesis, a hazard is defined as the ATC system state
resulting from an equipment failure that penetrates all existing technical defences and
affects the ability of the controller to perform his/her tasks. In different contexts a
hazard may have different definitions. For example, EUROCONTROL (2002c) defines
a hazard as any condition, event or circumstance, which could induce an accident or
incident. This EUROCONTROL definition is too broad and thus not in line with the
scope of this research. Accordingly, the term hazard in this research takes into account only
failures that require controller intervention (i.e. human recovery). The failures that
belong to this category are addressed in this thesis.
The following examples may help to clarify the difference between failure, hazard,
technical and human recovery, as defined in this research:
• A blocked radio frequency (failure) prevents exchange of information between a
  controller and pilot. This failure presents a hazardous situation and requires the
  controller's immediate action (human recovery). Changing the frequency on the
  same working position or moving to another available working position are
  possible ways to recover.
• A power loss (failure) affects one set of Controller Working Positions (CWP).
  Due to the independent Uninterruptible Power Supplies (UPS), electrical energy
  is continuously provided and the controller does not notice this failure (no
  hazard). The automatic changeover to UPS represents one example of a built-in
  technical defence or technical recovery (see Chapter 4 for detailed
  explanation). If the continuous supply of electrical energy is not provided,
  several CWPs may experience a problem, creating a hazardous situation and
  requiring controller intervention (human recovery).
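The two examples above follow the same decision logic: a failure becomes a hazard only if it penetrates the technical built-in defences, and only then is human recovery required. A minimal sketch of that logic is given below; the function name assess_occurrence and its parameter are assumptions made for illustration, not part of the framework itself.

```python
def assess_occurrence(failure: str, technical_defence_holds: bool) -> dict:
    """Apply the thesis definitions: a failure that penetrates the technical
    built-in defences is a hazard and requires human (controller) recovery;
    otherwise technical recovery is sufficient and no hazard exists."""
    hazard = not technical_defence_holds
    return {
        "failure": failure,
        "hazard": hazard,
        "recovery": "human" if hazard else "technical",
    }


# The examples from the text, encoded with the hypothetical function above.
blocked_frequency = assess_occurrence("blocked radio frequency", technical_defence_holds=False)
power_loss_with_ups = assess_occurrence("power loss, UPS takes over", technical_defence_holds=True)
power_loss_no_ups = assess_occurrence("power loss, supply not maintained", technical_defence_holds=False)
```

The sketch deliberately reduces the technical defences to a single yes/no outcome; the levels of redundancy behind that outcome are discussed in Chapter 4.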
It should be pointed out that although this research considers only failures which lead
to hazardous situations, there are other failures as well. These other failures represent
the majority, which never affect the controller's performance due to the effectiveness of
technical built-in defences (NATS, 2002). However, these failures still require
intervention, repair, and maintenance by engineers from the ATC system control and
monitoring unit.
Having defined a failure and a hazard as used in this research, the next section analyses
the nature of equipment failures in the operational environment. Details on this sample
of equipment failure reports are presented in the following section.
3.3 Supporting data: operational failure reports

Operational experience in this research is captured through a sample of operational
failure reports. They originate from four de-identified countries, referred to as Country
A, B, C, and D due to confidentiality. The following discussion focuses firstly on the
process of reporting equipment failures and their collection at the local level (i.e.
database of the ATC Centre) and national level (database of the respective Civil
Aviation Authority, CAA). The discussion continues by describing a range of data
pre-processing problems and the corresponding solutions.
3.3.1 Reporting and data collection

The aim of occurrence data collection is generally to record the safety performance of
the relevant unit (e.g. ATC Centre). The data are collected on a range of safety-relevant
occurrences, such as incidents, losses of separation, equipment failures, bird
strikes, runway incursions, level busts, and others. For example, at the European level,
the EUROCONTROL ESARR 2 document (EUROCONTROL, 2000c) provides
recommendations on the reporting and assessment of safety occurrences in ATM. As a
result, the national Civil Aviation Authorities (CAAs) specify the types of ATM
occurrences to be collected, analysed, or investigated through their mandatory
occurrence reporting (MOR) schemes (Figure 3-3). For example, the UK CAA also
specifies who can report an occurrence, what the correct reporting procedure is, and
how the details should be disseminated (in the case of the investigation). The UK CAA
states that the objective of this reporting scheme is to contribute to the improvement of
air safety by ensuring that relevant information on safety is reported, collected, stored,
protected, and disseminated. The sole objective of occurrence reporting is the
prevention of accidents and incidents and not to attribute blame or liability (UK CAA,
2005).
Figure 3-3 Reporting system
In aviation generally, as in ATC, data is usually stored and sorted electronically in
different databases. Collection of data in hardcopy has long been abandoned in most
developed countries. The type and level of database detail depend
on the unit/group/authority collecting the data (e.g. a system control and monitoring
unit, air navigation service provider, or national CAA). For example, when collecting
equipment failure occurrences, the most detailed information is available in the
database of the control and monitoring unit within the particular ATC Centre. This
database must contain information on all equipment failures that occurred in the ATC
Centre regardless of their impact or severity. This is because
engineering staff need complete insight into all equipment failures, as they are
responsible for repair and maintenance.
However, not all equipment failures are required to be reported at a national level. The
choice of those that need to reach respective CAAs is made through a review of
reported incidents or safety events on a monthly, quarterly, and annual basis. As a
result, a national database will contain only occurrences of appropriate severity
characteristics and impact on operations. As an example, the UK CAA uses a MOR
database which contains, amongst others, reports on equipment failures that impact on
the controller's ability to provide air traffic services. These reports are fed in from the
Engineering Reporting Occurrence Database, which contains details on all technical
problems, failures, and maintenance issues, of which the majority pass unnoticed by
controllers (due to the high level of ATC system redundancy).
Collected data is regularly analysed to assess the safety performance at national level
as well as at the level of the relevant units (e.g. ATC Centre). Furthermore, this
information is sometimes used on a wider basis for benchmarking studies and to record
the safety performance of a given region (e.g. the European Civil Aviation Conference,
ECAC, consisting of 41 European countries).
3.3.2 Data pre-processing problems

As previously mentioned, the research presented in this thesis uses operational failure
reports from four operational databases. Problems experienced with extracting failures
from different operational databases can be summarised as follows:
• Different reporting schemes produce different levels of reporting detail. The amount
  and quality of information reported differ significantly from one report to another.
  Therefore, inconsistencies between reports were identified in terms of failure impact
  (i.e. severity), duration, and location.
• Terminology differs between databases (e.g. Computerised Automatic Terminal
  Information Service, CATIS, for Automatic Terminal Information Service, ATIS;
  hotline for ground-to-ground communication, usually intercom; National
  Aeronautical Information Processing System, NAIPS, for Aeronautical
  Information Service, AIS), and very specific component names are used (e.g. Air
  Ground Data Processor, AGDP, as part of the datalink system).
• A lack of reporting culture results in uncertainty related to data reliability and
  completeness.
These problems are addressed below highlighting the approaches adopted to mitigate
them.
All reports have a short, one sentence long, summary followed by a description of the
equipment failure incident plus some additional information (e.g., date, occurrence
number, location, area code: flight information region or sector name). Unfortunately,
the additional information was not always available. Additionally, Countries C and D
provided their internal severity categorisation, and Country D also provided information on
failure duration. Since Country D's dataset originates from an engineering unit, the
duration variable was measured from the first log of the failure until its final resolution.
As a result, it was possible to consistently extract four types of information. The type of
equipment/ATC functionality affected and complexity of failure type are extracted
usually from the short summary available for each report. The severity of equipment
failure is extracted using the available severity rating (if it existed) or assessing the
available information of the operational and safety impact of equipment failure and thus
applying the severity rating derived in this research (see Chapter 4, Table 4-5). Finally,
the duration variable is available only in the Country D database.
Data pre-processing is based on the classification of ATC system functionalities (see
Chapter 2). In certain reports it was very difficult to determine the type of equipment.
This problem was compounded by having only an acronym to explain precisely what
the report referred to. Consequently, several interviews were conducted with
engineering staff from two European ATC Centres to identify and classify
these ambiguous reports correctly. A glossary of terms and
acronyms was found to be a very useful tool during the pre-processing stage. Such
documents should accompany (or be an integral part of) every database as part of a
normal reporting practice.
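The role of such a glossary in pre-processing can be sketched as a simple lookup that maps database-specific terms onto common ones. The three mappings below are the terminology differences noted in Section 3.3.2; the dictionary name GLOSSARY and the function normalise_term are hypothetical and are shown only to illustrate the idea, not the procedure actually followed in this research.

```python
# Database-specific term -> common term, taken from the examples in the text.
GLOSSARY = {
    "CATIS": "ATIS",                               # ATIS by another name
    "HOTLINE": "ground-to-ground communication",   # usually intercom
    "NAIPS": "AIS",
}


def normalise_term(term: str) -> str:
    """Return the common term for a reported equipment name.

    Terms that are not in the glossary are returned unchanged (trimmed),
    so that they can be flagged for clarification with engineering staff.
    """
    cleaned = term.strip()
    return GLOSSARY.get(cleaned.upper(), cleaned)


reported = ["catis", "Hotline ", "NAIPS", "AGDP"]
normalised = [normalise_term(t) for t in reported]
```

Unknown acronyms such as AGDP pass through untouched, which mirrors the situation described above where ambiguous reports had to be resolved through interviews.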
Within one country, the number of reports may not reflect the actual number of
equipment failure incidents in the ATC Centres for a variety of reasons. Firstly,
there may be a lack of reporting as a result of an inadequate reporting culture in
the ATC Centre and the aviation community overall. Secondly, not all equipment failures
are included in the CAA databases. As previously explained, only failures of a certain
severity (i.e. impact on ATC operations and controller performance) tend to be reported
to the CAA. As a result, the available operational failure reports are neither necessarily
complete nor reliable (i.e. they lack the detail on the context surrounding a reported
occurrence). To date, no measure of completeness and reliability of occurrence
databases has been produced. This is a task for future research.
3.3.3 Available operational failure reports

As stated previously, there are four sources of data on equipment failures included in
this thesis, Countries A, B, C, and D. The first three data sets are from Civil Aviation
Authority (CAA) databases for a given time period. In other words, these are equipment
failures reported in the CAA database for all ATC Centres within the national
boundaries of these countries over a given time period (usually a year). The fourth data
source (Country D) represents data from the system control and monitoring unit of one
ATC Centre. Table 3-1 gives a summary of the available data.
Table 3-1 Summary of available data, number of reports, and equipment failure incidents per
country

Country  Source of data                   Time period    Average flight hours   Total number of   Total number of
                                          available      flown for available    reports           equipment failures
                                                         time period            pre-processed     reported
A        CAA                              1999-2003      1,375,800.00           1,378             791
B        CAA                              2001-2005      1,027,870.00           1,393             1,324
C        CAA                              1992-2004      389,245.68             3,340             448
D        System control unit/ATC Centre   08/2000-2004   428,502.22             16,697            7,788
Total                                                                           22,808            10,351
After pre-processing of all available equipment failure reports (22,808), more than ten
thousand reports (i.e. 10,351) are identified as equipment failures in air traffic control
(Table 3-1). The remaining reports mainly comprised equipment-related reports
outside of the national airspace, multiple reports filed for the same occurrence to reflect
multiple findings or causes identified, as well as reports on non-ATC equipment and
other non-technical types of incidents (e.g. human error, runway closures due to
non-equipment issues, scheduled maintenance, software updates, and scheduled hardware
changes).
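The totals quoted from Table 3-1 can be checked with a few lines of arithmetic. The sketch below assumes that the four data rows of the table correspond to Countries A to D in that order (consistent with the sources and time periods described in this section); the variable and function names are introduced here for illustration only.

```python
# (reports pre-processed, equipment failures identified) per country,
# assuming the table rows are listed in the order A, B, C, D.
TABLE_3_1 = {
    "A": (1_378, 791),
    "B": (1_393, 1_324),
    "C": (3_340, 448),
    "D": (16_697, 7_788),
}

total_reports = sum(reports for reports, _ in TABLE_3_1.values())
total_failures = sum(failures for _, failures in TABLE_3_1.values())


def failure_share(country: str) -> float:
    """Fraction of a country's pre-processed reports identified as
    ATC equipment failures."""
    reports, failures = TABLE_3_1[country]
    return failures / reports


def contribution_to_sample(country: str) -> float:
    """Fraction of all identified equipment failures that come from
    the given country."""
    return TABLE_3_1[country][1] / total_failures
```

Under this reading, the proportion of reports retained after pre-processing differs considerably between the four sources, which is in line with the differences in reporting schemes discussed in Section 3.3.2.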
The time period studied, for countries A and B, could be considered steady (uniform)
with respect to the ATC service provided and other aviation related factors (e.g. traffic
levels, jet fuel prices, airline fares, regulations). However, one modern ATC Centre was
opened in Country A in the second half of 2001. This resulted in a relatively large number
of failures of individual components early in 2002. This is a recognised
characteristic of the initial life or burn-in period of any newly implemented system
(Figure 3-4).
Figure 3-4 Bathtub model of reliability for electronic components (Leveson, 1995)
Country B underwent a complete modernisation of its ATM system in 2000. Given that
a typical burn-in period ranges between 30 and 90 days (IEEE, 1998), it is reasonable to
assume that the system was well integrated and settled for the period of the data (i.e.
2001 to 2005). Therefore, the average number of incidents reported in this period could
be considered representative and appropriate for further analysis.
However, the time period available for Country C consists of 13 consecutive years (i.e.
1992 to 2004). This country went through extensive regulatory changes throughout the
1980s. The change in air service licensing assured that any operator that could prove
financial viability and meet safety standards would obtain a license. As a result, by the
end of the 1980s, the number of operators had more than doubled. At about the same
time, the Government decided to commercialise most of its service provision activities.
Thus air traffic and other services formed new state-owned commercial enterprises.
All of these changes were, however, firmly embedded in the system by the 1990s,
and therefore, the sample provided could be considered stable and appropriate for
further analysis.
Country D is unique in that it provided data from a single engineering unit database and
therefore represents the most detailed data source in this research. It covers the
shortest period available (3.5 years) but contains the largest share of equipment failures,
about 75 percent of all those identified (7,788 of 10,351; Table 3-1).
Although the available sample has a significant number of operational failure reports,
this still does not indicate how representative these reports are of the operational ATC
environment. For this reason, a top-down methodology for total aviation system
safety is developed. This methodology enables determination of the contribution of
ATC equipment to the safety of the overall air transport system based on past
research. Once this is established, the same methodology is applied using the
operational failure reports and then the results are compared. This methodology and
the subsequent validation of the available operational data are presented in the
following section.
3.4 Methodology to assess the relevance of supporting data

This section develops the methodology for an assessment of the available sample of
operational failure reports. In order to assure the relevance of this sample, this section
builds a methodology for its validation. In short, the contribution (or risk budget) of
equipment failures to the overall safety of the air transport system, extracted from past
literature, is compared to the result obtained from the analysis of available operational
failure reports. The section starts by identifying the overall aviation Target Level of
Safety (TLS) and derives risk budgets for ATM and its ATC component. It concludes by
determining the risk budget of ATC equipment. In other words, this methodology
determines the contribution of ATC equipment failures to the safety of the overall air
transport system. This finding is then compared to the results of the preliminary
analysis of the available operational failure reports.
3.4.1 The accident to incident ratio

Aviation Target Level of Safety (TLS) expressed only in terms of accidents has two
potential limitations. Firstly, the number of accidents is too small for any adequate
statistical analysis. Non-accident data, such as loss of standard separation between
aircraft in controlled airspace, is therefore necessary to establish the occurrence of any
trends. Secondly, the number of accidents (or accident rate) is not necessarily the best
measure of safety performance. For example, the currently used target of one accident
in 10^7 flight hours demands the collection of operational data over many years to
demonstrate whether the TLS has been met. A single accident may violate the TLS,
whilst many years without an accident will satisfy the TLS, but conceal any
deterioration in safety prior to an accident (Graham, Kinnersly, and Joyce, 2002). In
this context, past safety analyses (not only in aviation) have used the number of
incidents together with the assumed accident/incident ratio. The United States Federal
Aviation Administration (FAA, 2000) cites several different analytical approaches. The
two most common of these are discussed below.
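The reason why an accident-based TLS demands many years of data can be made concrete with simple arithmetic. The sketch below assumes a TLS of one accident per 10^7 flight hours, as quoted above, and a purely hypothetical traffic volume of one million flight hours per year; the function names are introduced here for illustration.

```python
TLS_PER_FLIGHT_HOUR = 1e-7  # one accident in 10^7 flight hours


def expected_accidents(flight_hours: float, tls: float = TLS_PER_FLIGHT_HOUR) -> float:
    """Expected number of accidents over the given exposure if the
    accident rate is exactly at the target level of safety."""
    return flight_hours * tls


def years_to_expect_one_accident(annual_flight_hours: float,
                                 tls: float = TLS_PER_FLIGHT_HOUR) -> float:
    """Years of operation needed before one accident is expected at the TLS."""
    return 1.0 / (annual_flight_hours * tls)


# Hypothetical traffic volume, not taken from the operational data.
ANNUAL_FLIGHT_HOURS = 1_000_000
years_needed = years_to_expect_one_accident(ANNUAL_FLIGHT_HOURS)  # about ten years
```

With exposure of this order, roughly a decade passes before even a single accident is expected at the target rate, which is why incident data are needed to reveal trends.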
In the 1940s, Heinrich introduced the idea of accidents in which injuries
did not occur and only damage to property resulted (Heinrich, 1941). This led to the
creation of the so-called Heinrich pyramid with established proportions of accidents,
serious incidents, and incidents of 1:29:300 (Saldana et al., 2002). After these initial
studies, there was stagnation in the theoretical underpinnings of safety investigations
until the practical work of Byrd in the 1970s. Byrd carried out his work in a steel factory
and revised Heinrich's proportions to 1:29:600 (Saldana et al., 2002).
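To show what such proportions imply, the sketch below scales each pyramid to a given number of accidents. The two ratios are those quoted above; the function name scale_pyramid and the example figure of two accidents are assumptions made purely for illustration.

```python
HEINRICH = (1, 29, 300)  # accidents : serious incidents : incidents
BYRD = (1, 29, 600)      # revised proportions quoted in the text


def scale_pyramid(ratio: tuple, accidents: float) -> tuple:
    """Scale a pyramid ratio so that its first level equals `accidents`,
    returning the implied number of occurrences at every level."""
    top = ratio[0]
    return tuple(accidents * level / top for level in ratio)


# Hypothetical example: occurrences implied by two accidents.
heinrich_for_two = scale_pyramid(HEINRICH, 2)
byrd_for_two = scale_pyramid(BYRD, 2)
```

The calculation is only as meaningful as the ratio itself, and the applicability of these particular ratios to ATC equipment failures is limited.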
However, whilst both of these studies are valuable in their statistical analyses, they do
not seem to be appropriate in dealing with equipment failures in ATC, at least not in the
ratios they offer. Both studies were designed to determine the risk and related ratio of
on-the-job accidents and incidents. The reason for the weaknesses in both studies may
originate from their design and in particular, the bias of analysing accident reports filed
by supervisors only (which tend to blame injuries on workers) and much lower levels of
equipment reliability and integrity compared to the systems used in ATC today.
For the purpose of the research presented in this thesis, additional attention has been
given to the ratio between accidents and incidents induced by ATC equipment failures.
A EUROCONTROL safety assessment study assumed that one in 10,000
equipment failures will contribute to an aviation accident (EUROCONTROL, 2004c), an
assumption which is in line with the high reliability requirement for the overall ATC
system, as well as ATC equipment. A number of arguments can be made to suggest
that in future, this proposed ratio will decrease:
• The number of incidents should decrease due to continuous safety initiatives and
  hazard prevention programmes;
• The probability of an incident leading to an accident should decrease due to
  increases both in equipment reliability and advanced solutions for redundancy
  and diversity (dissimilar redundancy);
• Changes should be seen in the type of incidents occurring, in that as a result of
  enhanced risk management approaches, the frequency of serious incidents
  should reduce;
• There should also be a decrease in the number of software-related incidents,
  which are prevalent today as discussed earlier. Hardware-related incidents
  should also diminish.
The arguments discussed above imply a step change in software and hardware
reliability as a result of considerable operational experience, knowledge, and expertise.
For example, in its requirements for software configuration, EUROCONTROL states
that reporting, tracking, and corrective actions are set in place to mitigate any
software-related problem (EUROCONTROL, 2003i). Note also that a decrease in the number of
incidents should only consider the steady state (i.e. useful life) as captured in the bathtub
reliability model (Figure 3-4).
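The one-in-10,000 assumption cited earlier in this section can be written as a simple conversion from a number of equipment failures to an expected number of accident contributions. The sketch below is illustrative only: the function name and the example inputs are assumptions, and the calculation says nothing about which individual failures would contribute to an accident.

```python
FAILURES_PER_ACCIDENT = 10_000  # assumed ratio (EUROCONTROL, 2004c)


def expected_accident_contributions(equipment_failures: int,
                                    failures_per_accident: int = FAILURES_PER_ACCIDENT) -> float:
    """Expected number of equipment failures contributing to an accident,
    given the assumed accident to equipment failure ratio."""
    if failures_per_accident <= 0:
        raise ValueError("failures_per_accident must be positive")
    return equipment_failures / failures_per_accident


# Hypothetical inputs chosen only to show the scale of the ratio.
one_contribution = expected_accident_contributions(10_000)
halved_ratio = expected_accident_contributions(10_000, failures_per_accident=20_000)
```

If the arguments listed above hold and the ratio decreases in future, the same number of failures would map to a smaller expected contribution, as the second call illustrates.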
It has been highlighted that perception of risk only in terms of accidents tends to mask
the actual safety issues. For this reason, it is important to include the number of
incidents so as to estimate the appropriate accident/incident ratio. Following this discussion
of the accident to incident ratio, the next section discusses the units of
measurement used in aviation and thus the different perspectives obtained in the
investigation of a critical event.
3.4.2 Units of measurement

The rate of any critical event represents the number of occurrences (e.g. equipment
failures, incidents, accidents) divided by the exposure to those events. For example,
aviation accident statistics are presented in a variety of ratios and units, called units of
measurement. The most frequently used are the number of accidents per operation
(take off or landing), per million flight hours flown, per flight, per million departures, per
million aircraft-miles, per million aircraft-hours, per million passenger-hours, and per
million passenger-miles.
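The definition of a rate as occurrences divided by exposure can be sketched as follows. The function name event_rate and all figures in the example are hypothetical; they serve only to show how the same number of occurrences yields different rates under different units of measurement.

```python
def event_rate(occurrences: float, exposure: float, per: float = 1_000_000) -> float:
    """Rate of a critical event: occurrences divided by exposure,
    expressed per `per` units of exposure (per million by default)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return occurrences * per / exposure


# Hypothetical data: the same five accidents against two exposure measures.
accidents = 5
departures = 2_500_000
flight_hours = 10_000_000

rate_per_million_departures = event_rate(accidents, departures)
rate_per_million_flight_hours = event_rate(accidents, flight_hours)
```

The two results differ by a factor of four for identical accident counts, which is why the unit of measurement must always be stated alongside any quoted rate.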
No single measurement gives a complete picture of the critical event under
investigation. Each of these units gives only one perspective, whilst possibly hiding
others. For example, rates per million passenger-miles are most useful for comparing
air transport and other modes of transport, whilst aircraft departures are suitable for
comparison of accidents between small commuter jets and large commercial jets (e.g.
BA46 and B747, respectively). In addition, for the determination of the required
performance of the landing aids e.g. Instrument Landing System (ILS) or Microwave
Landing System (MLS), the only appropriate measure would be the number of landings
per time period of interest. Any other measure would mask the true performance
values.
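The dependence of a rate on the chosen exposure measure can be illustrated with a minimal sketch. The event count and the exposure figures below are hypothetical values chosen only to show the arithmetic; they are not taken from any accident database, and the function name is likewise an assumption.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
accidents = 12
exposure = {
    "per million flight hours": 30_000_000,  # assumed flight hours flown
    "per million departures": 15_000_000,    # assumed number of departures
}


def rate_per_million(events: int, exposure_units: float) -> float:
    """Number of events divided by exposure, scaled to one million units."""
    return events / exposure_units * 1_000_000


rates = {unit: rate_per_million(accidents, amount)
         for unit, amount in exposure.items()}

for unit, value in rates.items():
    print(f"{value:.2f} accidents {unit}")
```

The same set of accidents yields a different figure for each unit, which is why a rate quoted without its unit of exposure cannot be compared with another.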
In addition to the units of measure, accident rates are determined by the definition of
the critical event as well. These critical events range from accidents, fatal accidents,
hull losses, to the number of fatalities or injuries. An accident, as defined by ICAO
Annex 13 (ICAO, 2001d), involves an occurrence associated with the operation of an
aircraft, which takes place between the time that any persons board the aircraft with the
intention of flight and that all such persons have disembarked, in which any person
suffers death or serious injury, or in which the aircraft receives substantial damage.
This definition therefore comprises fatal accidents as well as hull losses. Thus, in
dealing with various accident rates it is crucial to be aware of the precise definition of
both the critical event and the unit of measurement used.
The current rate of aircraft accidents per million flying hours has remained constant
over recent years. If the same accident rate is assumed for the future together with
predicted increases in traffic levels, there will be an increase in the absolute number of
accidents. Using the current accident rate, ICAO has predicted that by the year 2010
there will be an aircraft accident per week, i.e. 52 accidents per year (Hai, 2004). This
is the reason why the US FAA and other aviation authorities have identified the need to
significantly decrease the risk of aircraft accidents.
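The effect of a constant accident rate combined with growing traffic can be sketched as follows. This is an illustration only: the base-year flight hours and the growth rate are assumed values, not the inputs of the ICAO prediction cited above, and the function name is hypothetical.

```python
def expected_accidents(rate_per_flight_hour: float,
                       base_flight_hours: float,
                       annual_growth: float,
                       years_ahead: int) -> float:
    """Expected accidents per year if the rate stays constant while traffic grows."""
    flight_hours = base_flight_hours * (1.0 + annual_growth) ** years_ahead
    return rate_per_flight_hour * flight_hours


# Assumed inputs for illustration only.
RATE = 1e-06        # accidents per flight hour, held constant
BASE_HOURS = 30e6   # assumed flight hours flown in the base year
GROWTH = 0.04       # assumed annual traffic growth

for years in (0, 10, 20):
    print(years, round(expected_accidents(RATE, BASE_HOURS, GROWTH, years), 1))
```

With the rate held fixed, the absolute number of accidents grows at the same pace as the flight hours, so only a falling rate can keep the number of accidents from rising.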
The following sections propose a methodology for the derivation of aviation target level
of safety (TLS) based on the rate of aircraft accidents (defined as a number of
accidents per flight hour). An accident is defined according to ICAO, while the flight
hour has been chosen as the most appropriate measure of risk induced by equipment
failures. It is usually more convenient to work in terms of flight hours rather than
operational hours of an ATC unit or sector. This approach avoids difficulties and
differences associated with the geographical coverage of the system(s) being
considered, phase of flight, the density and complexity of airspace, as well as available
systems and equipment (e.g. number of radars, navigation systems, communication
systems). This is also in line with the Required Communication, Navigation, and
Surveillance Performance concepts (RCP, RNP, RSP) as defined in the previous Chapter. In short,
the proposed methodology starts by identifying the high-level aviation target level of
safety and then focuses on the precise contribution of equipment failures, the type of
occurrence under investigation in this thesis.
3.4.3 The acceptable risk or target level of safety (TLS)

The methodology to determine the contribution of equipment failures to the safety of
the overall air transport system is organised in several steps. Firstly, existing aviation
standards for Target Level of Safety (TLS) are assessed. Secondly, the contribution of
ATC to the risk of an aircraft accident is determined. Thirdly, the contribution of ATC
equipment to the ATC risk budget is determined. These findings are then extrapolated
to the year 2020, the target year in this research, in line with the European
Commission's Vision 2020 (European Commission, 2001). The final step involves
validation of the available sample of operational data using the same methodology.
These steps are presented in the following sections.
3.4.3.1 Existing standards
Technology and engineering have brought numerous inventions and benefits to the
modern way of life. Whilst these benefits are welcome, the risks associated with them
are not. The high pressure on the engineering world to reduce risk and increase safety
comes at a financial price. Therefore, it is important to manage the trade-off between
risk and the cost of its reduction.
As a result, there are certain degrees of risk that must be accepted. Determining the
acceptable level of risk¹ is generally the responsibility of management and is based on
several principles. These are the objective to be achieved, the alternatives available,
and the consequences and values that can be identified. Based upon this, the TLS is a
quantified level of risk (or potential loss) that a system should be designed to deliver
(Brooker, 2004). In aviation, the TLS is usually expressed as a number of aircraft
accidents per flight hour flown, which is used in this thesis, as indicated previously.
The concepts of TLS and risk budgeting are directly linked. Indeed, risk budgeting
represents a top-down distribution of TLS (or total aviation risk) between the
independent sub-categories. The logic behind this process is to specify the maximum
acceptable risk for each sub-category, so that each one has to produce equal or lower
risk than prescribed (see Figures 2-1 and 2-3).
¹ Note the difference between acceptable and tolerable risk. Tolerability refers to a willingness
to live with a risk so as to secure certain benefits and in the confidence that it is being properly
controlled. Tolerable risk is not ignored, but is controlled and reduced further if possible. On the
other hand, acceptable risk means that we are prepared to take the risk as it is (Reid, 1996). It
should also be noted that acceptable risk is a relative term and is based on different risk
perceptions: individual, public (group of individuals), industry (industry usually needs additional
pressure to declare a product as unsafe), and risk perception by safety experts. They all differ in
the level of risk they are willing to accept.
As pointed out by Brooker (2004), there are several methods to derive the TLS. In most
cases, the analysis starts from the current situation and uses an improvement factor to
derive the desired TLS. In some cases, this improvement factor may be established as
a continuing trend from the past translated into the future. It should incorporate traffic
growth factors, factors representing changes in the systems involved, the operational
procedures, and work practices. In other cases, it may be based on a common
agreement between technical experts, with the main idea underlying it being to set
challenging, but still realistic safety improvement targets.
The following sections provide an overview of the most relevant aviation TLS analyses.
The level of diversity between these approaches highlights the complexity of the
problem and the need for a consistent top-down total air transport system approach.
3.4.3.1.1 Joint Aviation Authorities
The Joint Aviation Authorities (JAA) document JAR 25.1309 is one of the main regulatory
documents in aviation. It also defines the fundamental principles that govern aircraft
design and certification. JAR 25.1309 defines the risk of a serious accident due to
operational and airframe-related causes to be in the order of one per million hours of
flight. About ten percent of the number of accidents related to operational and airframe
causes is attributed to aircraft equipment failures (e.g. hydraulics and electrical
systems) and the rest (90 percent) to other operational aspects (JAA, 1994). A
EUROCONTROL review of existing TLS standards and practices (EUROCONTROL,
2000a) argues that this requirement is based on data from the 1960s and as such is
outdated.
Furthermore, the JAR requirement is related to aircraft design,
encompassing only aircraft equipment, without consideration for the other components
of the air transport system (including ATM). Accordingly, this JAR requirement needs to
be updated to reflect the major changes in the aviation industry since the 1960s. The
following paragraphs outline several key factors that characterise the changes and
growth in aviation since the 1960s.
There has been a rapid expansion in the air transport industry over the last four
decades due to a number of factors, including growth in the world economy,
advancement in flight technology and the deregulation of the airline services. The result
of these forces has been a steady decline in airline costs and passenger fares, which
has further stimulated traffic growth. As an example of economic growth, ICAO reports
an increase in total gross domestic product (GDP) by a factor of
3.8 over the same period (ICAO, 1997). The GDP is considered to be the most
appropriate available measure of world output and indicates the health of the global
economy.
Changes in flight technology have also had a major effect on the growth in travel
demand. The modern era of air transportation began in the 1960s. The major drive was
the replacement of piston engines with jet engines, which was accompanied by
increased speed, reliability, and comfort. This change led to a reduction in operational
costs, which in turn led to increased travel demand.
In addition to this, changes in the regulatory environment in both the US and Europe
have had a big effect. The deregulation of airline services in the US in 1978 allowed
airlines to improve services, reduce average costs, increase routes, and increase
efficiency of scheduling. In Europe, the introduction of a single market for aviation
services by the European Union in 1992 has brought changes similar to those seen in the
USA.
The ICAO Manual on Air Traffic Forecasting (ICAO, 1985) suggests three methods for
forecasting future civil aviation traffic. These methods are trend projection, econometric
analysis, and market and industry survey. Econometric forecasting is the only method
that takes into account various economic, social, and operational factors affecting air
traffic. The objective here is to translate the relevant factors into projections of future
traffic growth. Then the traffic growth factors are reviewed further to incorporate
prospective changes by other factors that are not accommodated in the econometric
analysis.
The predicted traffic growth will influence target safety levels through the increase in
the number of flight hours forecast. However, there are other factors, not necessarily
included in this forecast of traffic growth, that have the potential to influence the level of
safety. Some of these factors are: the growth in the total number of aircraft flying as
well as in the passenger capacity of aircraft (e.g. Airbus 380, Airbus 350, Boeing 7E7
Dreamliner), increased airport and airspace congestion, technological development
(e.g. advanced safety nets, satellite-based CNS/ATM), and pressure on finding the
tools to control and mitigate human error. Another important factor not considered is
the increasing effect of environmental policies on aviation, in particular on air fares,
costs, and restrictions on possible routes.
Therefore, in line with the EUROCONTROL argument, the JAR requirement should be
updated using an analysis based on a more recent data sample of accident rates from the
last four decades. At the same time, future predictions and regulations should be based
on econometric forecasting, which captures the effect of traffic growth as well as
other economic, technical, and operational factors.
3.4.3.1.2 UK Civil Aviation Authority
The UK Civil Aviation Authority (CAA) has calculated a worldwide fatal accident rate
using the Worldwide Aircraft Accident Summary (WAAS) aviation database sample² for
the period 1990-1999 (UK CAA, 2000). The CAA based its analysis on this sample and
the following assumptions (EUROCONTROL, 2005):
• A fixed annual traffic growth rate until the year 2020 (i.e. 4 percent for western-built
jets); and
• A constant number of fatal accidents per year (i.e. eight fatal accidents each
year).
Based on these assumptions, the UK CAA predicted a rate of 1.8E-07 fatal accidents
per flight for the year 2020. For the purpose of the methodology presented in this
Chapter, this target has been translated into the rate per flight hour using the
information available on the Boeing web site (Boeing, 2004) as follows. The average
flight in 1982 was approximately 1.4 hours, while in 2002 it was 1.94 hours. If this linear
trend continues, the average flight in 2020 is estimated in this research to be 2.43
hours. Using this assumption, the UK CAA's TLS for the year 2020 corresponds to
7.4E-08 fatal accidents per flight hour.
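The conversion from a rate per flight to a rate per flight hour can be retraced with a short sketch. It assumes a straight-line extrapolation through the two average flight durations quoted above, which is an interpretation of "if this trend continues"; the function and variable names are illustrative.

```python
def extrapolate_linear(x0: float, y0: float, x1: float, y1: float, x: float) -> float:
    """Extend the straight line through (x0, y0) and (x1, y1) to the point x."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# Average flight durations in hours, as quoted in the text.
avg_flight_2020 = extrapolate_linear(1982, 1.4, 2002, 1.94, 2020)

RATE_PER_FLIGHT_2020 = 1.8e-07  # UK CAA prediction, fatal accidents per flight
rate_per_flight_hour = RATE_PER_FLIGHT_2020 / avg_flight_2020

print(f"average flight in 2020: {avg_flight_2020:.2f} h")
print(f"fatal accidents per flight hour: {rate_per_flight_hour:.1e}")
```

Under this assumption the extrapolated duration is about 2.43 hours and the resulting rate rounds to 7.4E-08 per flight hour, in agreement with the values used in this Chapter.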
3.4.3.1.3 International Civil Aviation Organisation
There have been several attempts by ICAO to derive aviation target levels of safety.
These originate from a number of different studies and reports, which are presented
below, from the earliest to the most recent.
² Information published by Flight International (monthly publication of Reed Business
Information Group). Includes accidents and serious incidents worldwide with the exception of
the Commonwealth of Independent States (CIS) before 1990 (former Soviet Union). The data
set covered only commercial aircraft or aircraft with maximum takeoff weight above 5.7t.
• ICAO North Atlantic Systems Planning Group (NATSPG) - the ICAO NATSPG
initially developed a method using the data on fatal accidents of jet aircraft in
the period from 1959 to 1966 (EUROCONTROL, 2000a). Based on the available
data³, this analysis estimated a fatal accident rate of 2.34E-06. The analysis
progressed by assigning a factor of 0.1 for accidents due to collision. The basis for
this assumption is not evident or recorded. An improvement factor of between two
and five was further applied to justify the use of historical data for future targets
(EUROCONTROL, 2000a). This resulted in a TLS ranging from 12E-08 to
4.6E-08 fatal accidents per flight hour due to collision. Finally, the analysis
apportioned the value of the TLS to three flight dimensions and thus calculated a
TLS for collision due to loss of lateral separation of between 4E-08 and
1.5E-08 fatal accidents per flight hour;
• ICAO Review of the General Concept of Separation Panel (RGCSP) - in 1995,
the ICAO RGCSP reviewed several approaches to deriving a TLS for ATM and
accepted the one developed by ICAO NATSPG. The RGCSP assumed a total
accident rate from all causes of 1E-07 per flight hour for the year 2010. This
TLS is based upon the NATSPG analysis extrapolated to the year 2010
(Brooker, 2004). Based on the contributions from the US (TLS ranging between
2E-09 and 7E-09) and the USSR⁴, the RGCSP agreed upon a TLS value that
should be used for establishing any vertical minimum performance
specification. This value is equal to or better than 5E-09 fatal accidents per
flight hour arising from collisions due to any cause for the period 2000 to 2010.
This value of a TLS is also indicated in ICAO Annex 11 (ICAO, 2001c);
• ICAO Annex 11 - in the situation where fatal accidents per flight hour is
considered to be an appropriate metric, ICAO Annex 11 (ICAO, 2001c)
proposes a TLS of 5E-09 fatal accidents per flight hour per dimension after the
year 2000. Although ICAO Annex 11 does not provide any justification for this
TLS, it is assumed that this value is taken from the ICAO RGCSP. For the
period prior to the year 2000, ICAO Annex 11 recommends the use of a TLS of
2E-08 fatal accidents per flight hour per dimension; and
• ICAO All-Weather Operations Panel (AWOP) - the objective of the ICAO AWOP
was to assess the required navigational performance (RNP) for approach,
landing, and departure phases of flight (ICAO, 1994). Based upon historical
data⁵, ICAO's calculation determined the average hull loss rate to be 1.87E-06 per
flight or 1.27E-06 per flight hour. Based on this historical data, ICAO proposed a
TLS for hull loss per flight hour of 1E-07. The rationale for this risk
improvement over the historical accident rate is the removal of pilot errors by
the use of glass cockpit aircraft and the tunnel incident alarm. The glass cockpit is a
system of electronic displays presenting all information on an aircraft's situation,
position, and progress. The tunnel incident alarm is an alert that is triggered if
the aircraft unintentionally leaves the assigned flight path (the "tunnel") during
the approach and landing phases of flight. Additionally, the objective in aviation
safety is to reduce the number of accidents despite increasing flight hours. This
is essential if public confidence in aviation is to be maintained as the global air
transport system expands.
³ Based on 36 fatal accidents and an estimate of 15.5 million flight hours during the period
1959-1966.
⁴ The USSR developed a series of targets for progressive implementation, such as 1E-08 from
1990 to 2000, 5E-09 for 2000-2010, and 2E-09 for 2010 onwards (ICAO, 1995).
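The chain of factors in the NATSPG derivation described above can be retraced as a short sketch. It assumes that the apportionment to the three flight dimensions is an equal split, which the source does not state explicitly; the variable names are illustrative.

```python
# Values quoted in the text for the NATSPG analysis.
FATAL_ACCIDENT_RATE = 2.34e-06   # historical fatal accident rate, 1959-1966
COLLISION_FACTOR = 0.1           # share assigned to collisions
IMPROVEMENT_FACTORS = (2, 5)     # range of improvement factors applied
DIMENSIONS = 3                   # assumed equal split across flight dimensions

# TLS for collision, for the lowest and highest improvement factor.
collision_tls = [FATAL_ACCIDENT_RATE * COLLISION_FACTOR / factor
                 for factor in IMPROVEMENT_FACTORS]

# TLS per flight dimension under the equal-split assumption.
per_dimension_tls = [tls / DIMENSIONS for tls in collision_tls]

for factor, total, single in zip(IMPROVEMENT_FACTORS, collision_tls, per_dimension_tls):
    print(f"factor {factor}: {total:.2e} total, {single:.2e} per dimension")
```

The unrounded results (about 1.17E-07 and 4.68E-08 in total, 3.9E-08 and 1.56E-08 per dimension) agree with the quoted ranges to within rounding.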
3.4.3.1.4 Summary of the various TLS analyses

The previous section has given an overview of the research on aviation TLS which is
summarised in Table 3-2 (based on the information available). This table enables
comparison of the TLS taking into account the source of data, the time period covered
by the data set, the type of accident, the type of aircraft operation, and the TLS unit
used.
Once again the differences in the derivation of TLS should be pointed out. The
summary presented shows the level of discrepancy in the method, data set, and
taxonomies used. The major factors that drive the differences in the calculation of
target levels of safety are:
• Type of accident (accident, fatal accident, hull loss),
• Weight of aircraft involved in the accident,
• Differences in the definitions (i.e. taxonomies used),
• Type of operations analysed: scheduled vs. non-scheduled, commercial vs.
non-commercial (military, freight, general aviation), registered vs. non-registered,
domestic vs. international,
• Type of aircraft included: jets vs. turbo props,
• Time frame of the data set analysed,
• Source of the data,
• Region involved in the analysis (with or without former Soviet Union),
• Targeted year for the TLS calculation: current vs. future levels.
⁵ Data set covers hull loss accidents for the period from 1959 to 1990 for commercial jet aircraft
whose weight exceeds 60,000lbs. Exposure percentages are based on an average flight
duration of 1.47h. A hull loss accident is defined as an accident where the primary cause is hull
loss or aircraft damage beyond economical repair.
Table 3-2 Summary of various analyses on aviation TLS
Reference | Title | Database | Scope: region/time period | Scope: type of operation/weight/type of accident | Target year | TLS
Joint Aviation Authorities | JAR 25.1309 Large Aeroplanes Advisory Material - AMJ | Not specified | Worldwide, 1960s | Serious accident | Not specified | 1E-06 per flight hour
UK Civil Aviation Authority | Aviation Safety Review CAP 701 | WAAS | Worldwide, 1990-1999 | Jets & turbo props/MTOW>5.7t/fatal accidents | 2020 | 1.8E-07 per flight/7.4E-08 per flight hour
ICAO | North Atlantic Systems Planning Group (NATSPG) | Not specified | Worldwide | Jets/1959-1966 | Not specified | 2.34E-06 per flight
ICAO | Review of the General Concept of Separation Panel (RGCSP) | Not specified | Not specified | Jets/fatal accidents | 2010 | 1E-07 per flight hour
ICAO | Annex 11 | Not specified | Worldwide | En route fatal accidents | After the year 2000 | 5E-09 per flight hour per dimension (1.5E-08 per flight hour)
ICAO | All-Weather Operations Panel (AWOP) 15th meeting | Not specified | Worldwide, 1959-1990 | Jets/MTOW>60,000lb/hull loss accidents | Not specified | 1E-07 per flight hour
Key: MTOW = maximum take-off weight of the aircraft
After the review of the most relevant analyses and methods of TLS calculation, a TLS
of 1E-08 accidents per flight hour is used as the baseline for the year 2020 (the target year
of the research presented in this thesis). The reasons for using this baseline are:
• The rate of 1E-07 is currently used as a target by ICAO for both fatal accidents
and hull loss accidents (see Table 3-2);
• With the overall aim of reducing the accident rate given the current safety
targets, it is reasonable to aim at 1E-08 accidents per flight hour in the year
2020; and
• The analysis conducted by the UK CAA to predict the rate of fatal accidents for
2020 (i.e. 7.4E-08 fatal accidents per flight hour).
Once the TLS for the year 2020 is determined, the next step is to apportion the
contribution of ATC to the overall air transport TLS. To establish this, several studies
have been reviewed. The key findings are presented in the following section.
3.4.4 Target level of safety and Air Traffic Control risk budgeting
The next step is to determine the risk budget allocation for the ATC system as a
component of the overall air transport system, i.e. determine the contribution of ATC.
According to the results of the UK CAA's analysis, the contribution of ATC and ground
aids to aircraft accidents is 1.7 percent (Table 13 in EUROCONTROL, 2005).
EUROCONTROL currently uses 2 percent as the maximum direct contribution of ATM to
aircraft accidents within the European Civil Aviation Conference (ECAC) region. This
figure was derived from historical data (the ICAO ADREP database, focused on the
ECAC region), from which the contribution of ATC was determined to be 1.1 percent
(EUROCONTROL, 2001a). Recognising that only ATC causes were accounted for
(without the contribution of other ATM components, such as ATS, ASM, and ATFM),
EUROCONTROL allowed an additional 0.9 percent, resulting in a 2 percent ATM
contribution to aircraft accidents. This figure has been further validated via discussions
with the EUROCONTROL Safety Regulatory Commission's task force on the Hazard
Classification Matrix (HCM). EUROCONTROL has defined the maximum tolerable
probability of ATM directly contributing to an accident of a commercial air transport
aircraft in the ECAC region to be 1.55E-08 per flight hour (EUROCONTROL, 2001b).
This figure is based on the rate of aircraft accident for the year 1999 (extracted from
ICAO ADREP database focusing on the ECAC region) with direct ATM contribution (2
percent) and a forecast of 6.7 percent increase in the traffic volumes for the period
1999-2015 (EUROCONTROL, 2001a).
In the Netherlands, a study by the national research laboratory (NLR) used a sample of
civil aircraft accidents that occurred worldwide during the period 1980-1999, mostly
based on ICAO database (van Es, 2003). This study determined that ATM-related
accidents represent 8 percent of the total number of accidents. Additionally, 28 percent
of these ATM-related accidents are directly caused by ATC, which makes the ATC
contribution to aircraft accidents approximately 2.2 percent. The difference in the
contribution of ATC in these two studies is due to the difference in classification of
causal factors. While the UK CAA analysis divided all underlying factors into primary,
causal, and circumstantial groups, the NLR analysis followed the recommendation by
ICAO and did not use this distinction. The NLR study considered an occurrence as a
causal factor only if that occurrence was part of the chain of events leading to the
accident. The NLR approach seems to reflect better the aim of determining the overall
ATC contribution to aircraft accidents.
The results presented above need to be adjusted to allow for possible statistical error and
uncertainties linked to the reporting processes, as well as to provide additional
protection for the future. As previously discussed, EUROCONTROL allowed an additional
0.9 percent for statistical error and uncertainties in the calculation of the ATM safety
targets for the ECAC region based upon historical data for only one component of ATM,
namely ATC (EUROCONTROL, 2001a). With this in mind, together with the results
from the UK CAA and NLR studies, this thesis uses a maximum contribution of ATC of 3
percent. Thus, using the TLS for the air transport system for the
year 2020 established in the previous section (1E-08 per flight hour), the apportioned
contribution of ATC is taken to be 3E-10 per flight hour. Having derived the TLS for ATC
specifically, this functional block should be divided between human operators, equipment,
and procedures. This approach makes it possible to define the appropriate risk induced by
failure of ATC equipment, which is presented in the next section.
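The apportionment of the ATC share can be checked with a few lines of arithmetic. The sketch below only restates the percentages quoted in this section; the dictionary layout and variable names are illustrative, and the 3 percent share is the value chosen in this thesis rather than a computed result.

```python
TLS_2020 = 1e-08  # accidents per flight hour, overall air transport system

# ATC contribution to aircraft accidents reported by the studies reviewed.
NLR_ATM_SHARE = 0.08         # ATM-related accidents as a share of all accidents
NLR_ATC_WITHIN_ATM = 0.28    # share of ATM-related accidents directly caused by ATC
studies = {
    "UK CAA": 0.017,
    "EUROCONTROL (ATC only)": 0.011,
    "NLR": NLR_ATM_SHARE * NLR_ATC_WITHIN_ATM,
}

ATC_SHARE_USED = 0.03        # maximum ATC contribution adopted in this thesis
atc_tls_2020 = TLS_2020 * ATC_SHARE_USED

for name, share in studies.items():
    print(f"{name}: {share:.1%}")
print(f"ATC risk budget for 2020: {atc_tls_2020:.0e} per flight hour")
```

All three reported shares fall below the adopted 3 percent, so the chosen value acts as a conservative upper bound on the ATC contribution.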
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
It is important to determine the contribution of equipment (or its failure or malfunction)
to the ATC risk budget. The historical data on the proportion of incidents in which
equipment failure is implicated vary to a certain degree. Interviews with system
control and monitoring staff at two European ATC Centres⁶, as well as the
approximation used by the CORA 2 documentation (EUROCONTROL, 2004c), reveal
that equipment failures are the causal factor in 0.01 (one percent) of all incidents.
Although this assumption is based on the ATM system and not its ATC component
only, it is used with other sources of information to inform the ATC equipment risk
budgeting within the overall air transport system.
A more focused approach is provided by the NLR study (van Es, 2003). This study
determined that the particular causal factor "ATC ground aid malfunction or unavailable"
has been attributed to 5 percent of all ATM-related accidents or 18 percent of all
ATC-related accidents. It should be noted that this causal factor includes unavailable ATC
equipment, meaning equipment that was taken out of service by ATC staff, presumably
for maintenance reasons. In addition, the research was based on data samples that
incorporated older systems with lower levels of automation. Future systems are shifting
towards a higher level of automation and higher reliability, as discussed in the
previous Chapter.
⁶ Based upon private communications with staff at two European Area Control Centres (ACCs).
Therefore, it can be approximated that equipment failures represent the causal factor in
10 percent of all ATC-related accidents (or 3 percent of all ATM-related accidents). This
is based on the assumption that unscheduled failures constitute about 50 percent of
the failures in the NLR analysis discussed above, i.e. roughly half of the 18 percent
attributed to ATC ground aid malfunction or unavailability. This approach derives a risk of an
ATC equipment failure leading to an aircraft accident of 3E-11 per flight hour. The
reasoning presented seems to be consistent with the widespread argument that human
error represents the causal factor in 70-80 percent of all accidents (Reason, 1997),
although there is some evidence that the majority of these human errors represent
organisational errors (Johnson and Holloway, 2004). A graphical representation of the
determined risk budgets is given in Figure 3-5.
Figure 3-5 Aviation TLS and risk budgeting
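The equipment share of the ATC risk budget can be retraced in the same way. The sketch below assumes, as the text does, that about half of the NLR ground aid figure relates to unscheduled failures and that the result is rounded up to 10 percent; the variable names are illustrative.

```python
ATC_TLS_2020 = 3e-10                 # ATC risk budget, accidents per flight hour

NLR_GROUND_AID_SHARE_OF_ATC = 0.18   # NLR: share of ATC-related accidents
UNSCHEDULED_FRACTION = 0.5           # assumed share of unscheduled failures

# Estimated share of ATC-related accidents caused by equipment failures.
estimated_share = NLR_GROUND_AID_SHARE_OF_ATC * UNSCHEDULED_FRACTION

EQUIPMENT_SHARE_USED = 0.10          # rounded value adopted in this thesis
equipment_tls_2020 = ATC_TLS_2020 * EQUIPMENT_SHARE_USED

print(f"estimated equipment share of ATC accidents: {estimated_share:.0%}")
print(f"equipment risk budget for 2020: {equipment_tls_2020:.0e} per flight hour")
```

The estimate of 9 percent is rounded to 10 percent, which gives the 3E-11 per flight hour shown for ATC equipment in Figure 3-5.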
After assessing the contribution of ATC equipment failures to the overall risk of aircraft
accident, it is important to validate these findings with some operational experience.
This is achieved in the following section by analysis of operational failure reports from
three countries.
3.5 Preliminary analysis and validation of operational failure reports
The previous sections described the process of deriving an overall aviation TLS for the
reference year 2020 and further risk budgeting for ATC equipment. In order to justify
the use of the available sample of operational reports in this thesis, this sample is
validated by the proposed TLS methodology. This is presented in the following
paragraphs.
Given the accident rate for the year 2000 (of the order of 1E-06 per flight hour;
EUROCONTROL, 2005) and the predicted accident rates for the years 2010 (1E-07; Brooker,
2004) and 2020 (1E-08, used in this research), it is apparent that future safety levels
are predicted to improve tenfold every decade. This is in line with the attempts of
various aviation institutions to significantly improve future aviation safety levels
(e.g. FAA, ICAO). The next step is to apply the established rate of improvement to the
ATC equipment failures.
Using the same reasoning and the ratios within the air transport system, as presented in
Figure 3-5, it is possible to translate the 2020 rate of ATC equipment contribution to
aircraft accidents to the present levels (i.e. 2000). The calculation presented in section
3.4.5 showed that for the year 2020 this effect is of the order of 3E-11 per flight hour.
Using the reverse logic, this effect corresponds to a level of 3E-09 for the year 2000. In
other words, based on past research and established ratios, the contribution of
equipment failures to the overall safety of the air transport system in the current period
is of the order of 3E-09 per flight hour.
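The reverse calculation can be written as a minimal sketch, assuming that the tenfold improvement per decade applies uniformly to the overall rate and to the ATC equipment share alike. The function name and its default argument are illustrative.

```python
def backcast(rate: float, from_year: int, to_year: int,
             factor_per_decade: float = 10.0) -> float:
    """Translate a rate back in time, assuming a fixed improvement per decade."""
    decades = (from_year - to_year) / 10
    return rate * factor_per_decade ** decades


# Overall accident rate per flight hour, anchored on the 2020 TLS of 1E-08.
overall = {year: backcast(1e-08, 2020, year) for year in (2000, 2010, 2020)}

# ATC equipment contribution, anchored on the 2020 budget of 3E-11.
equipment_2000 = backcast(3e-11, 2020, 2000)

for year, rate in overall.items():
    print(year, f"{rate:.0e}")
print("ATC equipment, 2000:", f"{equipment_2000:.0e}")
```

Two decades at a factor of ten each give a factor of one hundred, which takes 3E-11 in 2020 back to 3E-09 in 2000.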
Having established the contribution of equipment failures to the overall safety of the air
transport system based on past research, it is necessary to calculate the same value
using the available operational failure reports. The conformance of ATC equipment
budgeting obtained from past research and available failure reports would indicate that
the available sample is representative of equipment failures occurring in the operational
ATC environment.
Firstly, it is important to discuss the overall commercial air transport accident rates for
the three countries analysed. These rates are roughly an order of magnitude higher than the
worldwide average (1E-06 per flight hour; see Figure 3-5), ranging from 9E-06 to 1E-05 aircraft
accidents per flight hour. Secondly, it is necessary to discuss the available sample of
operational failure reports by focusing on the frequency of equipment failure reports per
year and per source. The incident reports used in this section were from three sources,
namely three Civil Aviation Authorities (CAAs), presented as Country A (for the period
1999 to 2003), Country B (for the period 2001 to 2005), and Country C (for the period
1992 to 2004). The final results of this preliminary analysis of available operational
reports are presented in Table 3-3. The average number of failures per year is calculated for
each of the three data sets (column 4). This is followed by the calculation of incident rates
based on the average flight hours flown for the given time periods (column 5). The final step
involved adjustment of the calculated incident rate to give the probability of an accident
caused by equipment failure (using the accident to incident ratio of 1 in 10,000), as
shown in the last column of Table 3-3. In other words, this calculation produced the
operational level of safety for three countries and three respective time periods.
Table 3-3 Analysis of operational failure reports and results
Country (1) | Years (2) | Total number of equipment failures reported, in year order (3) | Average number of equipment failures per year (4) | Rate of failure incident (per flight hour) (5) | Rate of failure accident (per flight hour) (6)
A | 1999-2003 | 100, 107, 122, 287, 175 | 158.2 | 1.15E-04 | 1.15E-08
B | 2001-2005 | 184, 237, 171, 247, 485 | 264.8 | 2.58E-04 | 2.58E-08
C | 1992-2004 | 28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44 | 34.46 | 8.85E-05 | 8.85E-09
Based on the contribution of equipment failures to the overall safety of the air transport
system extracted from past research and the overall TLS methodology (3E-09 per flight
hour), we can conclude that the levels of safety derived from operational reports (last
column in Table 3-3, 8.85E-09 to 2.58E-08 per flight hour) show a degree of conformity,
lying within one order of magnitude of this value.
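The averages and accident rates in Table 3-3, and their comparison with the literature-based value, can be reproduced with the sketch below. The yearly counts and incident rates are copied from the table; the flight hours behind column 5 are not given in this Chapter, so the incident rates are taken as reported rather than recomputed. Variable names are illustrative.

```python
from statistics import mean

# Yearly counts of reported equipment failures (column 3 of Table 3-3).
failures = {
    "A": [100, 107, 122, 287, 175],
    "B": [184, 237, 171, 247, 485],
    "C": [28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44],
}
# Incident rates per flight hour as reported in column 5 (not recomputed here).
incident_rate = {"A": 1.15e-04, "B": 2.58e-04, "C": 8.85e-05}

ACCIDENTS_PER_INCIDENT = 1 / 10_000   # accident to incident ratio used in the text
REFERENCE_2000 = 3e-09                # literature-based value, per flight hour

average_failures = {c: mean(counts) for c, counts in failures.items()}
accident_rate = {c: r * ACCIDENTS_PER_INCIDENT for c, r in incident_rate.items()}
ratio_to_reference = {c: r / REFERENCE_2000 for c, r in accident_rate.items()}

for c in failures:
    print(c, round(average_failures[c], 2), f"{accident_rate[c]:.2e}",
          round(ratio_to_reference[c], 1))
```

The three operational values exceed the reference by factors of roughly three to nine, which is the sense in which the two results conform: they agree to within an order of magnitude.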
Even closer conformity would be achieved by setting a higher level of TLS
for the year 2000 (the data indicate 1E-05, as opposed to the 1E-06 accepted within the
aviation community). Furthermore, better tuning of the current and future trade-offs within
the air transport system (see Chapter 2, Figures 2-1 and 2-3) would additionally enhance
the proposed methodology for determination of risk budgeting of the ATC equipment.
Future advancements in technology, changes in the levels of traffic, and overall
changes in the ATC/ATM philosophy (e.g. shifting of separation responsibility from the
ground to the air) have the potential to improve safety. At the same time it is reasonable
to assume that the distribution of the levels of risk within the air transport system will
change. The results specific to ATC given here could be used as an input to a
complete safety analysis that should consider trade-offs between the various
components of the aviation system to realise risk budgets for a safe and cost effective
system. Finally, the severity of the reported incidents could be used to inform the
weighting scheme and to better reflect the accident to incident ratio, as the above
analysis considered all incidents equally.
In short, the above analysis indicates that the available operational failure reports are a
representative sample of equipment failures occurring in ATC Centres worldwide.
Having established the appropriateness of this sample, the following Chapter moves
toward the identification of operational characteristics of equipment failures extracted
from past research and operational failure reports.
3.6 Summary
This Chapter starts with a precise definition of equipment failures and hazards,
the latter representing a sub-group of equipment failures that require human intervention
(or human recovery). It continues by presenting the sample of operational failure reports
available in this research. After a discussion of the reporting schemes designed to
capture incident occurrences, including equipment failures, the Chapter
highlights data pre-processing problems and the solutions applied to overcome them. In
order to assure the relevance of the equipment failures captured in the available sample,
the remainder of the Chapter builds a framework for its validation. This framework for
risk assessment, based entirely on past literature, begins from the risk assessment of
the overall air transport system and focuses on one component, namely ATC
equipment. In other words, this section determines the maximum allowed accident risk
imposed by ATC equipment failures for the target year 2020.
The contribution of equipment failures to the overall safety of the air transport system
extracted from past literature has then been compared with the result obtained from
the analysis of the available sample. This analysis showed a degree of agreement between
the theoretically assumed and operationally extracted levels of ATC equipment risk
budgeting. In other words, the available operational failure reports are a representative
sample of equipment failures occurring in the operational ATC environment. Hence, the
next Chapter proceeds with a detailed assessment of the equipment failure
characteristics extracted from operational failure reports and available literature.
Chapter 4
Equipment Failures in ATC
Equipment Failures and Technical Defences in Air Traffic Control
The previous Chapter showed that operational failure reports available in this thesis
constitute a representative sample of equipment failures occurring in the operational Air
Traffic Control (ATC) environment. This Chapter moves toward the identification of the
operational characteristics of equipment failures. These are extracted from past
research and more than 20,000 operational failure reports. Special attention is paid to
the impact that equipment failures may have on ATC operations, and as a result a
severity rating scheme has been designed to support the research presented in this
thesis. Having discussed the consequences of equipment failures and their impact on
ATC operations, it is important to discuss how such consequences can be prevented or
mitigated. This involves the process of recovery from equipment failure and a
distinction can be made between technical and human recovery. This Chapter
discusses technical recovery by reviewing the existing technical built-in defences,
whilst the next Chapter discusses human (i.e. controller) recovery. A subset of
equipment failure characteristics relevant to ATC operations is then used in this
Chapter to develop a novel tool for the assessment of the severity of equipment
failures, known as the qualitative equipment failure impact assessment tool. This tool
enables an assessment of the overall impact of an equipment failure on ATC
operations.
4.1 Equipment failure characteristics

When dealing with any type of equipment failure, it is important to understand its
underlying characteristics. In other words, it is important to take into account issues like
causes, consequences, duration, and complexity. Thus, a detailed hazard analysis
would capture the most important characteristics of a failure and the context
surrounding its occurrence (Leveson, 1995). The following sections explain several
important failure characteristics:
ATC functionality affected;
Complexity of failure type;
Time course of failure development;

Duration of failure;
Potential causes of equipment failure; and
Consequences of equipment failure.
The consequences of equipment failures are discussed on several different levels,
ranging from their impact on the individual (i.e. the air traffic controller), the operations
room, the ATC system, and the impact they have on the overall ATM system.
4.1.1 ATC functionality affected

The methodology adopted in this thesis for the classification of ATC functionalities
results in a nine-category classification (Chapter 2, section 2.3). Several examples of
the equipment failures related to different ATC functionalities are presented in Table
4-1. These examples are randomly selected and de-identified from operational failure
reports available in this research, as discussed previously in Chapter 3.
Table 4-1 Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
Type of failure | Example
Communication function | Total radio telephony failure on three frequencies (three sectors). Workstation had to be reset to default fallback setting.
Navigation function | Runway 15 Instrument Landing System (ILS) failed whilst aircraft on 16 NM final approach in Instrument Meteorological Conditions (IMC). Approach Control Centre was advised and aircraft confirmed the failure. Aircraft was preparing for a missed approach, when the ILS returned to service after recovery.
Surveillance function | Erroneous altitude readings displayed on radar for B777 and B767 at FL340 and FL350, respectively. Short Term Conflict Alert (STCA) was activated.
Data processing function | Triple failure on suite flight data exchange. System fully recovered after 40 min by manual intervention. Departures from two airports were stopped for approximately 10 min. The cause was the existence of duplicate flight identity numbers within the flight data held in the affected workstations.
Supporting function | B737 was on the final approach at 50 ft over the runway when the controller received a false Approach Monitoring Aid (AMA) warning. The controller was concerned that in low visibility conditions a go-around would have been unnecessarily given.
Safety nets (SNET) | STCA failed to activate against two aircraft at FL120. One aircraft was dropping parachutes, with the other filming them. Consequently, the aircraft were quite close to each other. They were both squawking Secondary Surveillance Radar (SSR) codes, but STCA failed to activate.
Power supply | At time 0535 power failure in the tower caused Radar Data Processing System (RDPS) and Flight Data Processing System (FDPS), radar, public telephone network, weather radar, and computer failure. At time 0650 position rebooted and upgraded. ATC service returned to normal at 0730.
Pointing and input devices | Cursor frozen in global ops field of electronic flight strip. The controller was moved to an adjacent console and resumed operations from that position. There was only a brief interruption to the service.
System monitoring and control function | At 0215 the ATC system suffered a significant slowdown. The System Monitoring (SMS) shut itself down.

4.1.2 Complexity of failure type

Failures can be single or multiple component failures (Wickens et al., 1998). A single
failure can be total or partial affecting only one piece of equipment or one of its
components. Multiple component failures can be independent of each other (which can
make the process of diagnosis very difficult) or dependent failures (common cause,
common mode, or cascade failures) (Mauri, 2000). Common cause failures occur when
a single cause creates simultaneous (or near simultaneous) multiple failures (e.g. due
to fire, loss of power, or software bug). Common mode failures are a subset of common
cause failures whose observed effect on the system is identical. Cascade failures are
dependent failures that affect redundant components by shifting their load sequentially
(e.g. power grids or servers). Once the first level of redundancy is pushed beyond its
capacity (e.g. transformer), the load will be shifted onto the next redundant component
until all redundancies are exhausted (Mauri, 2000).
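The sequential load shifting that characterises a cascade failure can be illustrated with a minimal sketch. It is a deliberate simplification and not a model taken from Mauri (2000): the whole load is assumed to move to the next redundant component whenever a component is pushed beyond its capacity, and the function name, load, and capacity figures are hypothetical.

```python
def cascade(load, capacities):
    """Shift `load` along redundant components in order.

    Returns a pair (failed, carrier): the indices of the components pushed
    beyond their capacity, and the index of the component that finally
    carries the load, or None if all redundancies are exhausted.
    ASSUMPTION: the entire load moves to the next component on each failure.
    """
    failed = []
    for index, capacity in enumerate(capacities):
        if load <= capacity:
            return failed, index
        failed.append(index)
    return failed, None


# Hypothetical capacities of three redundant components (arbitrary units).
print(cascade(120, [100, 110, 150]))
print(cascade(200, [100, 110, 150]))
```

In the first call the third component absorbs the load after two failures; in the second no component can carry it, which corresponds to all redundancies being exhausted.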
4.1.3 Time course of failure development

In terms of time course of failure development, there are sudden, gradual, or latent
failures. With sudden failures, the operator does not have much time to prepare for
recovery, but at the same time there is the potential advantage of immediate detection
of the failure. In contrast, gradual failures may degrade system capabilities in ways
that are not apparent to the operator (e.g. gradual loss of data integrity). This makes
failure detection, and therefore technical and human recovery, extremely difficult. Latent
failures are generally difficult to detect. They exist in the system unnoticed
until some other failure or unusual occurrence reveals them (Wickens et al., 1998).
As a result, this group of failures is considered separately, as the time course of
their initial development is not known, i.e. these failures could initially occur
either suddenly or gradually.
4.1.4 Duration of failure

Duration of failure is defined as the time from the first log of the event (which
corresponds closely to failure detection) until its final closure. Applied to a specific failure, it can
carry important information on recovery and its impact on ATC, ATM, and overall
aviation safety. The categories defined in this research are based on the evidence from
the available operational failure reports. Their analysis indicates the distribution of
failure duration which corresponds to the following categories (section 4.4.6):
Short period of time - order of magnitude is in minutes;
Moderate period of time - order of magnitude is in minutes up to one hour; and
Substantial period of time - order of magnitude is in hours (it can extend to days).
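As an illustration, the three duration categories can be expressed as a simple classification. The source gives only orders of magnitude, so the thresholds below are assumptions: the one-hour limit follows "minutes up to one hour", whereas the 10-minute boundary between short and moderate is purely hypothetical and is exposed as a parameter. The function names are illustrative.

```python
from datetime import datetime, timedelta

# ASSUMPTION: the 10-minute boundary between "short" and "moderate" is
# hypothetical; the one-hour boundary follows "minutes up to one hour".
SHORT_LIMIT = timedelta(minutes=10)
MODERATE_LIMIT = timedelta(hours=1)


def failure_duration(first_log, final_closure):
    """Time from the first log of the event until its final closure."""
    if final_closure < first_log:
        raise ValueError("final closure precedes first log")
    return final_closure - first_log


def duration_category(duration, short_limit=SHORT_LIMIT, moderate_limit=MODERATE_LIMIT):
    """Classify a duration as short, moderate, or substantial."""
    if duration <= short_limit:
        return "short"
    if duration <= moderate_limit:
        return "moderate"
    return "substantial"


# Power supply example from Table 4-1: failure at 0535, service back to normal
# at 0730 (treated here as closure). The calendar date is arbitrary.
outage = failure_duration(datetime(2000, 1, 1, 5, 35), datetime(2000, 1, 1, 7, 30))
print(outage, duration_category(outage))
```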
4.1.5 Potential causes of equipment failures

The causes of equipment failures come from three interacting sources. These are:
Technical faults as defects or anomalies built into the system or its components;
Human errors or violations as acts of omission or commission by the designer,
constructor, controller, engineer, or maintenance personnel that might result in a
failure; and
External factors or unfortunate, unforeseen, or uncontrolled events, such as severe
weather, fire, accidents, vandalism, sabotage, or terrorism.
The listed causes of failures represent only the first layer of causation. Further analysis
might reveal the existence of organisational error, organisational loss of control, or
failure to anticipate all hazardous conditions and prepare appropriate defences against
them. As an example, the impact of a power outage should be anticipated by
management and consequently appropriate preventive strategies should be
implemented. Similarly, the threat of either terrorism or vandalism should be guarded
against through the provision of adequate internal security measures.
There are various techniques designed to investigate technical faults, human error, and
organisational error. For technical faults, Fault Trees (FT), Event Trees (ET), and
Probabilistic Safety Assessment (PSA) are mostly applied (Brooker, 2006); human
error is investigated by a range of Human Reliability Assessment (HRA) techniques
which are discussed in more detail in Chapters 7 and 8. Finally, organisational errors
are mostly investigated using the Reason model (Reason, 1997), the Human Factors
Analysis and Classification System - HFACS (Shappell, 2000), or qualitative principles
behind a safety culture (Sorensen, 2002).
After a brief discussion of these five failure characteristics, the next section discusses the
potential consequences of equipment failures. The consequences of equipment failures
are discussed at several levels, from their impact on the individual (i.e. the controller),
the operations room, the ATC system, concluding with their impact on the ATM system
as a whole.
4.2 Consequences of equipment failure

Equipment failures that penetrate existing technical built-in defences and hence affect
controller performance (called hazards) are the main objective of the research
presented in this thesis. Therefore, the consequences of these failures are initially
assessed at the level of the controller, followed by the operations room, a given
airspace (i.e. the impact on ATC operations), and finally at regional level (i.e. the
impact on ATM operations).
4.2.1 Impact on air traffic controller

The impact of equipment failures on controller performance represents the focus of this
thesis, and as such will be assessed in detail in the following Chapters. One equipment
failure occurrence in the Lisbon ATC Centre highlights the impact that equipment
failures could have on the controller (Sampaio and Guerra, 2004). In this very busy
sector, a sudden failure of the Radar Data Processing System (RDPS) affected only
one radar track. This failure went unnoticed for 21 minutes until a traffic advisory from the
cockpit-based Traffic Alert and Collision Avoidance System (TCAS) triggered an action by
the controller. The controller did suspect a problem prior to the TCAS alert, but
focused only on possible human error in the input of relevant data (i.e. the SSR code).
Unfortunately, the controller never considered the possibility of an equipment failure.
Post-incident investigation revealed that the cause of this failure was incompatibility of
the software developed for the installed radar with the software of the main ATC
system. However, the same investigation did not reveal why this failure affected only
one radar track and not all tracks informed by the same radar. This particular example
highlights how complex and severe an equipment failure can be.
4.2.2 Impact on operations room

The impact of equipment failures on the entire ATC operations room depends entirely
upon the failure characteristics in terms of the number of equipment/positions affected.
Another important factor is the overall ATC Centre architecture, since exposure to
failure varies greatly based on the interconnectivity of different equipment, the level of
separate channels (redundancy/variability), and failure complexity (single failure vs.
multiple failures). Based on operational experience (NATS, 2002) and ATC operations
room configuration, four categories can be differentiated. These categories range from
an impact on the entire operations room, through several sectors, to only one sector. The
categories are defined as follows:
All workstations/all sectors affected;
A number of workstations/different sectors affected;
Several workstations (within same suite)/one sector affected; and
One workstation/one sector affected.
The categorisation proposed by NATS follows the severity of the impact of failures on
the operations room, from the most severe failure (known as an outage) to the least
severe (affecting only one workstation). In addition, each suite is
responsible for a specific portion of airspace (i.e. a sector), whilst each sector has a
declared capacity (expressed as the number of aircraft in the sector in the peak
hour). As a result, the failure characteristic 'impact on operations room' is linked with
the number of aircraft exposed to the impact of an equipment failure.
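The link between the sectors affected and the number of aircraft exposed can be sketched as follows. This is an upper-bound illustration under stated assumptions and not a calculation taken from NATS (2002): exposure is taken as the sum of the declared capacities of the affected sectors, and the sector names and capacity figures are hypothetical.

```python
def aircraft_exposed(affected_sectors, declared_capacity):
    """Upper-bound estimate of aircraft exposed to a failure.

    ASSUMPTION: exposure equals the sum of the declared (peak-hour)
    capacities of the affected sectors; each sector is counted once.
    """
    return sum(declared_capacity[sector] for sector in set(affected_sectors))


# Hypothetical declared capacities (aircraft in the sector in the peak hour).
capacity = {"S1": 40, "S2": 35, "S3": 50}

print(aircraft_exposed(["S1"], capacity))        # one workstation/one sector
print(aircraft_exposed(list(capacity), capacity))  # all sectors affected
```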
4.2.3 Impact on ATC operations

The impact of equipment failures on Air Traffic Control (ATC) service provision should
incorporate effects from an operational, safety, and financial perspective. In terms of
ATC operation, equipment failures could result in an inadequate ATC service, leading
for example to unexpected or increased delays in service provision (e.g. aircraft performing
holding procedures due to a failure of the Instrument Landing System (ILS) during the
landing phase of flight), delayed arrivals/departures, and limitations in capacity due to
traffic flow restrictions or stopped departures/arrivals.
From the safety perspective, failures generate unavailability of certain ATC functions.
They also generate increased workload as a result of unexpected and highly stressful
failure occurrences, increasing the potential for incident/accident occurrence. Vitally,
safety could be jeopardised by any type of data integrity equipment issue when the
equipment provides timely but inaccurate information. On such occasions, an
equipment failure could go undetected for some time (see the example discussed in
section 4.2.1). All of these, combined with inadequate or insufficient training, the
absence of recovery procedures, and a lack of experience may create the potential for
controller error.
From a financial perspective, equipment failures create planned and unplanned costs
of repair, training (of both controllers and technicians), and incident investigation.
However, the most likely costs are measured in terms of additional costs placed on
airlines in the case of significant delays (e.g. loss of connecting flights and passenger
accommodation). These are discussed further in the next section.
Ideally, the combination of all three consequences of an equipment failure should
constitute the overall impact on ATC operations or the particular failure's severity.
However, in the operational environment the most usual practice is to combine safety
and the operational impact of an equipment failure to determine its severity rating. The
following paragraphs review severity ratings defined specifically for equipment failure
occurrences. They originate from safety regulations defined in two Air Navigation
Service Providers (ANSPs) and one Civil Aviation Authority (CAA).
The UK National Air Traffic Services (NATS) recognises four categories of failure types
based on their impact on ATC operations, namely major impact, impact on workstation
or suite, ATC impact, and minimal impact (Table 4-2). Furthermore, analysis of
operational failure reports in this thesis identified the severity categorisation from one
CAA (referred to as Country C) and another ANSP (referred to as Country D). The CAA
of Country C defines the severity rating of equipment failures according to the potential
to cause a significant problem (see Table 4-3).
Table 4-2 UK NATS severity rating (from NATS, 2002)
Severity | Definition
Major impact to Ops room | Severe flow restrictions could be required
Impact to workstation/suite | May be necessary to combine/move positions immediately or sector flow restrictions may be required
ATC impact | Not immediately critical, will have greater operational impact over time
Minimal impact | Centre management required
Table 4-3 Country C's severity rating as defined by its CAA
Severity | Factor | Definition
CR | Critical | An occurrence or deficiency that caused, or on its own had the potential to cause, loss of life or limb.
MA | Major | An occurrence or deficiency involving a major ATC system component that caused, or had the potential to cause, significant problems to the function or effectiveness of that system.
MI | Minor | An isolated occurrence or deficiency not indicative of a significant ATC system problem.
Finally, the data for Country D originate from one particular ATC Centre. This Centre
determines the severity of an incident as a result of the combination of the impact it has
on both the controllers (internally in this ATC Centre as well as externally in other ATC
units) and system control and monitoring engineers. In general, in this particular ATC
Centre the determination of the severity of an incident is the task of the system control
and monitoring unit, which distinguishes five severity classes. These are presented in
Table 4-4.
Table 4-4 Country D severity rating as defined by the particular ATC Centre
Severity | Factor | Definition
1 | System down | A system outage affecting the total of ATC services provided
2 | Critical | An error severely affecting a single or few random working positions or a single external service or an error on a first standby system.
3 | Urgent | An error affecting part of a single or few random working positions or part of an external service or an error on a backup system reducing backup capacity.
4 | Important | An error affecting a supportive service or a system for which automatic backup is available.
5 | Enhancement | An error having no direct operational impact and only slight non-operational impact.
These severity rating schemes indicate that each country follows its own severity index.
Furthermore, there is a difference in severity ratings between ANSPs and CAAs, as
ANSPs are concerned about the impact on their service provision business (e.g.
delays), whilst safety regulators are concerned about whether such an event causes an
accident. Therefore, simply comparing the severity of occurrences between countries is
unlikely to produce useful findings. All classifications are rather qualitative and depend
upon experience and judgement, which always involves a degree of subjectivity. As a
result, it is necessary to define a unique severity classification for the entire dataset
available in this study corresponding to the existing equipment failure severity ratings
(UK NATS, Country C, and Country D). Consistent with operational practice, the
severity rating defined in the following paragraphs combines safety and operational
impact of equipment failures, while disregarding the financial aspect due to lack of
data. Since the focus of this thesis is on the impact of equipment failures on ATC
operations (including its impact on controller performance), the exclusion of the
financial aspect of severity rating does not have a detrimental effect on this severity
rating and the subsequent quality of data analyses.
The result is a three-level severity rating (major, moderate, and minimal) of equipment
failures based on their impact on ATC operations, as would be appreciated by the
controller (Table 4-5). It is important to highlight that this severity categorisation is
based on the exposure of an ATC Centre to the failed equipment (affecting the entire
ATC Centre, a number of workstations, or only the backup system) regardless of the
type of service provided by the affected ATC Centre. The significant difference in the
level of detail in the reports and the overall need for a consistent approach led to the
exclusion of the type of ATC service in the overall severity categorisation. This
characteristic is accounted for later on in the thesis through the assessment of the
recovery context surrounding an equipment failure occurrence. As a result, this
exclusion here does not have a detrimental effect on the severity rating and the
subsequent quality of data analyses. In general, the severity rating is based on the
failure type, available contextual conditions of the failure occurrence, and its impact on
ATC operations.
Table 4-5 Severity rating defined in this research and mapped with available sources

Severity rating in this research: Major
Definition: This type of failure may cause severe disruptions on every
workstation. It may require immediate traffic flow restrictions to contain
workload to manageable levels, which are safe for sustained ongoing
operations.
Examples: loss of main Flight Data Processing System (FDPS), total
voice communication outage, loss of Multiple Radar Processing
(MRP), loss of Terminal Approach Radar (TAR), loss of Parallel
Approach Runway Monitor (PARM), loss of radar coverage, either
complete or over larger parts (Primary Surveillance Radar - PSR and
secondary surveillance radar - SSR), total power failure, loss of all
Radio Telephony (RT) frequencies, incorrect barometer indication (as
part of meteorological equipment), Instrument Landing System (ILS)
failure during approach phase and in the reduced visibility conditions,
failure of runway/taxiway lights in reduced visibility conditions, wrong
indication of runway/taxiway lights, Surface Movement Radar (SMR)
failure or provision of wrong label indication.
Mapping with severity ratings from available research: Major (UK NATS); Major (Country C); 1 (Country D)

Severity rating in this research: Moderate
Definition: Only affects workstations reliant on the failed item or
service. The disruption of ATC operation is contained and a normal
level of operation may be resumed by physically moving and
combining the role of the affected workstations with another within the
sector suite or by physically moving the sector team to the stand-by
suite. Under some conditions, sector flow restrictions may be applied.
Examples: loss of single sector frequency, loss of a number of
frequencies, loss of one or two workstations in a sector suite, loss of
entire sector suite, loss of telephone panel or Voice Switching And
Communication System (VSCS) on a single workstation, loss of one
radar (in multiple radar environment), loss of ground-based
navigational aids (e.g. Very high frequency Omnidirectional Range -
VOR, Non-Directional Beacon - NDB, Distance Measuring Equipment
- DME), loss of PSR (as it is a backup to SSR), SSR garbling, loss of
safety nets (as these are only tools to support controller).
Mapping with severity ratings from available research: Impact on workstation/suite (UK NATS); Major (Country C); 2 and 3 (Country D)

Severity rating in this research: Minimal
Definition: Initial disruption to ATC operations is not immediately
critical, but could have greater impact over time (if not recovered
within a reasonable time frame, disruptions to ATC operations may
be prolonged/sustained). This escalation with time can restrict traffic
flow into sector(s).
Examples: loss of processor, loss of link, loss of system control and
monitoring unit, loss of headset, ILS failure during approach in normal
visibility conditions because the opportunity for go-around always
exists, failure of runway/taxiway lights (in normal visibility conditions)
as this system is only a visual aid to the instrument landing, failure in
communication link to adjacent ATC Centre, loss of auxiliary display,
temporary failure of strip printer or paper jam, inadequate strength of
RT frequency, failure of left hand headset connector while right hand is
functioning, disturbance/interference on a ground frequency, loss of
sequencing tool, and loss of pointing/input devices.
Mapping with severity ratings from available research: ATC and minimal impact (UK NATS); Minor (Country C); 4 and 5 (Country D)

Having defined the three-level severity rating to be used in this research, appropriate
mapping is established with the existing severity ratings (as defined by UK NATS, the
CAA of Country C, and the ANSP of Country D). The comparison of specific categories
from each of the available sources reveals the matching with major, moderate, and
minimal ratings as defined in this research (Table 4-5). Note however that the major
category, as defined by Country C, had to be split between major and moderate
categories, as defined in this research. The rationale behind this split is based on two
criteria of equal importance. The first criterion is the definition of major and moderate
categories as presented in Table 4-5. In other words, the severity rating has to
distinguish between failures that affect the entire ATC Centre and those that affect only
workstations reliant on the failed item. The second criterion is based on the impact of a
failure on ATC operations. For example, loss of a VOR or NDB is rated as moderate
because navigation may still be provided using radar surveillance or other navigational
aids (e.g. Global Positioning System - GPS, Automatic Dependent Surveillance - ADS).
However, loss of an ILS during the approach phase and in reduced visibility conditions is
rated as major. During this phase of flight the aircraft is in the landing configuration
(i.e. at reduced speed and in close proximity to the ground). If visual contact with the
ground is not achieved at the moment of the failure, an immediate go-around procedure is
necessary. Because of this, the failure of an approach navigation aid (such as ILS) is
considered more severe.
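The mapping in Table 4-5, including the split of Country C's major category, can be written as a small lookup. This is a sketch for illustration only: the function name and the `meets_major_criteria` flag are hypothetical, and the flag stands in for the analyst's judgement against the two criteria described above (exposure of the ATC Centre and impact on ATC operations).

```python
MAJOR, MODERATE, MINIMAL = "major", "moderate", "minimal"

# Direct mappings read from Table 4-5. Country C's "Critical" category does
# not appear in that table and is therefore not mapped here.
DIRECT_MAPPING = {
    ("UK NATS", "Major impact to Ops room"): MAJOR,
    ("UK NATS", "Impact to workstation/suite"): MODERATE,
    ("UK NATS", "ATC impact"): MINIMAL,
    ("UK NATS", "Minimal impact"): MINIMAL,
    ("Country C", "Minor"): MINIMAL,
    ("Country D", 1): MAJOR,
    ("Country D", 2): MODERATE,
    ("Country D", 3): MODERATE,
    ("Country D", 4): MINIMAL,
    ("Country D", 5): MINIMAL,
}


def unified_severity(source, rating, meets_major_criteria=None):
    """Map a source severity rating onto the three-level rating.

    Country C's "Major" is split between major and moderate, so the caller
    must state whether the failure meets the criteria for a major rating.
    """
    if (source, rating) == ("Country C", "Major"):
        if meets_major_criteria is None:
            raise ValueError("Country C 'Major' needs an explicit judgement")
        return MAJOR if meets_major_criteria else MODERATE
    return DIRECT_MAPPING[(source, rating)]


# Loss of a VOR reported as "Major" by Country C: rated moderate in this scheme.
print(unified_severity("Country C", "Major", meets_major_criteria=False))
```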
4.2.4 Impact on ATM operations

As noted earlier, it is highly beneficial to analyse the impact of the failures on
operations both inside the control room and outside over a given airspace. At the same
time, it is also important to recognise that failures could have an impact not only on
ATC but also on the wider ATM system. The following examples show how severe the
impact of an equipment failure on ATM operations can be.
According to Aviation Week (reported in RISKS, 2000; NATS, 2004), the UK ATC
service suffered a flight data processing software failure at West Drayton ATC Centre
in June 2000. As a result of the failure, flight progress strips had to be hand written,
which forced the ANSP to restrict the amount of traffic in UK airspace. While the ATC
system recovered after four hours, the effects of this failure were felt for several days
with knock-on effects as far as France and Germany. This is understandable due to the
centralised flow control of traffic in Europe (provided by the EUROCONTROL Central
Flow Management Unit). As a result of the failure's severity and subsequent flow
control, its impact spread over a sub-continental region.
Another example of a failure with a severe impact on a wide region is the brief power
failure which affected the US Federal Aviation Administration (FAA) Southern California
Terminal Radar Approach Control (TRACON) facility at Miramar on April 19, 2006. The
facility switched immediately to backup power. The outage lasted only 6 or 7 seconds,
but had an impact on airports from the Mexican border and half way through the state
of California, due to imposed traffic flow control (10News, 2006).
Another example of the severe impact that one single failure can induce is the outage
that occurred in the Chicago ATC Centre in 1995 when the en-route automation
component failed for two hours. This single occurrence cost the airlines an estimated
$12 million in delays (National Transportation Library, 1997). The National
Transportation Library (NTL) report mentions this example to make a case for the
replacement of the outdated main and backup Flight Data Processing Systems
(FDPS), involved in the reported incident. In short, these examples show how severe
the impact of an equipment failure on global ATM operations can be. This issue will
become especially important in a future gate-to-gate ATM system where the roles for
planning and control will have to be re-organised and distributed between controllers
and pilots.
Similar to ATC operations, the impact of failure on ATM can be analysed from several
different perspectives. From operational and safety perspectives, a higher degree of
workload will be experienced both on the ground by controllers, technicians, and
engineers and in the air by flight crew. From a financial perspective, in addition to costs
identified in ATC, it is necessary to add the cost of delays in a wider region. A small
exercise has been conducted on the cost of delays induced by ATC equipment failures
to indicate the financial impact of delays in the European Civil Aviation Conference
(ECAC) and US airspace. This is presented in Appendix I.
Having discussed the consequences of equipment failures, it is important to discuss
how such consequences could be prevented or mitigated. This involves the process of
recovery from equipment failure and a distinction can be made between technical and
human recovery. The following section focuses on technical recovery and the principles
used to prevent and in some cases to mitigate the impact of equipment failures. The
human recovery aspects are addressed in Chapter 5 and throughout the rest of the
thesis.
4.3 Definition of technical defences (technical recovery)

The aim of any design is to identify the functions of a system in advance and to
develop a method which assures the delivery of the intended functions. It is always
necessary to predict what may happen if something fails or if an operator handles a
system incorrectly. Experience shows that even the best designed systems fail
occasionally. Therefore, it is crucial that every design concept includes a solution to re-establish system operation and provide continuous service. These solutions are
grouped under the term technical built-in defences. They represent defences against
any unplanned or unwanted interruption of service. They are complex socio-technical
systems which combine technical, human, and organisational measures that prevent or
protect against an adverse effect (Smith et al., 2004). Verification of the existence and
appropriateness of existing defences provides confidence in the safety of a system and
is a requirement for system certification.
Safety is recognised as the ultimate imperative in ATC and therefore, should be
addressed as early as possible in the design process. Having sound safety principles
built into each phase of the design (i.e. conceptual, preliminary, and detailed design
phase) is a useful way to avoid, prevent, and mitigate failures and their impact. Safety
through design is planned through five different principles (Figure 4-1) for hazard¹
avoidance, elimination, or control, which are as follows (Christensen and Manuele,
1999; National Aeronautics and Space Administration, 2002; The European New
Machinery Directives cited in Piantek, 1999):
Eliminate hazards;
Design for minimum risk;
Incorporate safety devices (i.e. devices designed to prevent any unwanted event);
Provide warning devices (i.e. alert that signals the occurrence of some unwanted
event); and
Develop operating procedures and training schemes.
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
¹ Within system safety, a hazard is usually defined as a condition which can lead to an accident.
In this research, a hazard is defined as the ATC system state resulting from an equipment
failure that penetrates all existing technical defences and affects the ability of the controller to
perform his/her tasks.
The suggested principles follow the logical order of precedence. The first two
approaches focus on the elimination of the hazard from the system. However, if the
identified hazards cannot be eliminated (due to difficulties or cost), risk should be
reduced by using fixed, automatic, or other protective safety devices (i.e. defences for
seamless recovery from failure). When neither design nor safety devices can effectively
eliminate identified risks or adequately reduce them, devices should be used that
detect the unwanted condition and produce adequate warning signals to alert the
controller (i.e. defences for transmitting information regarding a failure). These warning
signals should be designed to minimise the probability of inappropriate human reaction
and response. Note that regardless of how a warning device performs (Figure 4-2), the
triggering failure represents a hazard (according to the definition in this thesis) as it
affects controller performance.
As explained before, the human operator remains the last line of defence (i.e. human
recovery). For this reason, when warning devices are not sufficient, special procedures
and training schemes should be designed. These must be periodically tested, verified,
and updated to assure their effectiveness.
Similarly, when dealing with equipment failures in ATC, it is important to distinguish
between technical and human (i.e. controller) recovery (Figure 4-2). Both processes
start with the detection of failure (either by a technical system or controller) and
conclude with an outcome. The outcome can be nominal (pre-failure), non-nominal but
stable (i.e. degraded), or inadequate system state (leading to incident or accident). The
outcome of the equipment failure and recovery process is discussed in detail in the
following Chapter. The following paragraphs focus on technical recovery, while human
recovery is addressed in subsequent Chapters.
Figure 4-2 Technical and human recovery
As already highlighted, technical built-in defences can be divided into two
categories according to the function they provide. These are defences for recovering
from failures (safety devices) and defences for transmitting relevant information on
failure (warning devices). Both categories are examined further in the following
sections.
4.3.1 Defences for recovering from failures (safety devices)

This group of technical built-in defences includes safety devices, i.e. mechanisms
designed to prevent an unwanted event (e.g. radiotelephony anti-blocking device,
availability of primary and secondary frequency, automatic switching from normal to
fallback operational mode, automatic switching from primary to secondary glide slope
transmitter), and the creation of fault-tolerant systems through redundancy/diversity. The
main objective of built-in defences is to prevent adverse events from happening (i.e.
preventive defences) or to lessen the impact of the consequences on operations (i.e.
mitigative or protective defences). If a failure is covered only by preventive defences,
the system has no fault tolerance, since fault tolerance is achieved through protective
defences. For example, the feasibility study of the EUROCONTROL eight states free route
airspace concept was established to ensure that free route airspace operations are as
safe as the current fixed route operations (EUROCONTROL, 2001c). The analysis identified
128 preventive defences but no protective defences. Therefore, this concept, in its
current state, fails to establish fault tolerance in the ATM system.
Fault-tolerant systems are designed to preserve the minimum required service in spite
of failure occurrence. This is achieved through the employment of redundancy.
Redundancy is the ability of a system to keep functioning normally in the event of an
equipment failure, by having backup components that perform duplicate functions
(Mauri, 2000). The goal is to mask failure events from the controller, while still
capturing and reporting them for the necessary maintenance. However, redundancy
alone is not always a solution because of common cause failures, i.e. failures of
several components that result from a single shared cause (e.g. fire or power outage).
To prevent these types of failures, emphasis is placed on diversity of the systems (i.e.
different manufacturers), equipment diversity in manufacturing (e.g. different software
packages), and/or functional diversity (e.g. physically independent components: the
redundant hydraulic system lines of commercial aircraft are physically separated so
that fire in a certain compartment does not affect all the lines simultaneously).
4.3.2 Defences for transmitting information on failure (warning devices)

Alerts should be provided to the controller in the event of a critical change in the ATC
system or equipment status and to remind him/her of critical actions that must be taken. An
alert or a warning should enhance the probability of appropriate human reaction and
response (i.e. controller recovery performance). According to the FAA's Human Factors
Design Standard (Federal Aviation Administration, 2003), warning devices should:
Alert the operator to the fact that a problem exists;
Inform the operator of the nature of the problem;
Guide the operator's initial responses (based on priority); and
Confirm in a timely manner whether the operator's response corrected the problem.
Alerts are usually generated immediately after the system detects any deviation
from predefined system performance. There are several ways in which ATC controllers
are informed of equipment failures or non-availability of certain functions. The most
usual ones are colour-coding (e.g. a change in the workstation's border colour)
and textual messages, all presented on the Human Machine Interface (HMI). In
addition to the content and location of the alert message, it is equally important to
display an alert in a timely manner. Alert onset is defined as the time between a system's
detection of a failure and the moment an alert is presented on the HMI, either by colour
change or text message (i.e. time-to-alert or TTA). This timing is usually system-driven
(based on the system threshold) but there are novel initiatives toward human-driven or
cognitively-driven alert onset. In general, there are three different types of alert onset:
Immediate onset (an alert is presented on the HMI with the least possible time delay
after the system detects the failure). This is the normal case for severe events.
Delayed onset (an alert is presented on the HMI with a time-based or threshold-based
onset). For example, system requirements could be set up to inject an
alert with a specific time delay following the occurrence of a failure or to inject an
alert once a system-defined threshold has been reached (i.e. TTA). In the nuclear
industry this is known as alert sequencing or alert hierarchies indicating the
urgency of actions needed. In this way, a hierarchy makes use of safety criticality,
injecting safety-relevant alerts first, followed by operational alerts. In satellite
navigation, the TTA value is one of the measures of the integrity of a satellite
navigation system (Feng et al., 2005).
Cognitively convenient onset (an alert is presented on the HMI based on
cognitive convenience, which can be defined through the levels of controller
workload). This futuristic concept is mostly used in the nuclear and automobile
industries, where cognitive convenience is determined by measuring workload
using physiological measures (e.g. heart rate, breathing rate, galvanic skin
response, eye tracking device). This concept has been tested on a US naval ship
as described in Daniels, Regli, and Franke (2002). This study proposes a method
to control the cognitive effects of task interruption by influencing the timing of an
alert and helping a user to regain their situational awareness within the
interrupted task.
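The three types of alert onset described above can be illustrated with a minimal Python sketch. It is not taken from any operational ATC system: the function names, the workload scale (0 to 1), the workload limit of 0.7, the polling step, and the maximum waiting time are all assumptions introduced purely for illustration.

```python
from enum import Enum


class Onset(Enum):
    IMMEDIATE = "immediate"
    DELAYED = "delayed"
    COGNITIVE = "cognitively convenient"


def alert_time(detected_at, onset, delay=0.0, workload_at=None,
               workload_limit=0.7, step=1.0, max_wait=30.0):
    """Return the time (in seconds) at which the alert is shown on the HMI.

    detected_at  -- time at which the system detects the failure
    delay        -- fixed delay used by the delayed onset (assumed parameter)
    workload_at  -- hypothetical function mapping time to workload in [0, 1]
    """
    if onset is Onset.IMMEDIATE:
        return detected_at
    if onset is Onset.DELAYED:
        return detected_at + delay
    if workload_at is None:
        raise ValueError("cognitively convenient onset needs a workload function")
    # Wait until workload falls to the limit, but never longer than max_wait.
    waited = 0.0
    while waited < max_wait and workload_at(detected_at + waited) > workload_limit:
        waited += step
    return detected_at + min(waited, max_wait)


def time_to_alert(detected_at, presented_at):
    """Time-to-alert (TTA): delay between detection and presentation."""
    return presented_at - detected_at


if __name__ == "__main__":
    # Hypothetical workload profile: high until t = 110 s, lower afterwards.
    profile = lambda t: 0.9 if t < 110.0 else 0.4
    shown = alert_time(100.0, Onset.COGNITIVE, workload_at=profile)
    print("TTA (s):", time_to_alert(100.0, shown))
```

The cap on waiting time reflects the point made above that severe events normally call for immediate onset; in this sketch, even a cognitively convenient alert is never postponed indefinitely.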
After a detailed overview of the equipment failure characteristics as well as technical
recovery, the next section analyses the nature of equipment failures that manage to
penetrate the existing built-in defences and affect controller performance. For this
purpose, findings from existing literature have been augmented by results of the
analysis of more than ten thousand operational failure reports originating from four
different countries. This sample of equipment failure reports has already been
introduced in Chapter 3, and the following section analyses it further.
4.4 Analyses of operational failure reports

Existing literature on equipment failure characteristics has been reviewed in the
previous sections of this Chapter. This has been further augmented and informed by
the analyses of operational data from four countries (i.e. Countries A, B, C, and D), as
presented in detail in Chapter 3.
4.4.1 Data analysis methodology

Since the four countries differ in airspace size, equipage, traffic demand, and
traffic density, a simple analysis of equipment failure rates would be of limited
value. Therefore, to obtain a common metric for assessing the distribution of equipment
failures per year and per data source, it is necessary to normalise the rates of equipment
failures by an appropriate unit of measurement. For example, rates per ATC Centre
enable comparison of ATC Centres of similar traffic demand and thus equipage, but
otherwise fail to provide a meaningful performance measure. Similarly, the rate of radio
frequency failure per sector or per total number of available frequencies in a sector
(usually there are primary and secondary frequencies available in a sector) provides a
metric for the availability of voice communication in each sector. However, this unit is
not of practical use as the number of sectors changes hourly based upon changes in
air traffic demand. As a result, the rate of equipment failures per flight hours is used in
this research². This approach avoids difficulties and differences associated with the
geographical coverage of the datasets available and the availability of ATC systems
and equipment (e.g. number of radars, navaids, communication systems).
² Hours flown data are collected for commercial airlines, including domestic, regional, and
international air traffic for each country.
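The normalisation by flight hours can be sketched in a few lines of Python. The failure count and flight hours in the example are hypothetical and are not taken from the datasets analysed here. The function names are likewise illustrative.

```python
def failure_rate(failures, flight_hours, per=100_000):
    """Equipment failures per `per` flight hours (default: per 100,000)."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return failures * per / flight_hours


def rescale(rate, from_per, to_per):
    """Convert a rate expressed per `from_per` flight hours to per `to_per`."""
    return rate * to_per / from_per


if __name__ == "__main__":
    # Hypothetical year: 175 reported failures over 1,000,000 flight hours.
    print(failure_rate(175, 1_000_000))
    # A rate quoted per 10,000 flight hours, expressed per 100,000 flight hours.
    print(rescale(45, 10_000, 100_000))
```

Expressing all rates against the same base makes datasets that are quoted per 10,000 and per 100,000 flight hours directly comparable.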
The information on flight hours for each country has been extracted from the CAA
websites, annual incident summaries, and personal correspondence with the staff from
the engineering unit. After establishing a common unit of measurement, further analyses
are structured around four equipment failure characteristics, chosen because they could
be extracted consistently from the available datasets. These four equipment failure
characteristics are: type of ATC functionality and equipment affected, complexity,
severity, and duration³ of equipment failures. The type of equipment/ATC functionality
affected and the complexity of failure type are extracted from the short summary
available for each report. The severity of equipment failure is extracted using the
available severity rating (if it existed) or by assessing the available information on the
operational and safety impact of the equipment failure and applying the severity rating
derived in this research (see Table 4-5). The duration variable was available only in the
Country D database. Finally, additional statistical tests have been performed to identify
any relationship between the four equipment failure characteristics. The structure of the
data analyses is presented in Figure 4-3.
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, most variables are categorical (type of equipment/ATC functionality affected,
complexity of failure type, and severity). Additionally, the complexity of failure type and
severity variables have an ordinal character (assuming a ranking between possible
categories). Only duration represents a continuous or ratio scale variable⁴. This
variable is first investigated for its overall distribution and then split into categories
to extract information regarding failures of short duration (discussed in sections 4.1.4
and 4.4.6).
³ The duration characteristic is analysed last as it is available only in one database.
⁴ Variables can be either continuous or categorical. Continuous variables are numeric values on
an interval or ratio scale (e.g. age, income). Categorical variables can be either nominal or
ordinal. Nominal variables differentiate between categories but do not assume any ranking
between them (e.g. gender). On the other hand, ordinal variables differentiate between
categories that can be rank-ordered (e.g. from lowest to highest).
[Figure 4-3 (flowchart): operational failure reports from 4 countries (22,808 available reports) pass through data preprocessing. The available data are then analysed, with the supporting references shown in brackets, for: rate of equipment failures, Countries A, B, C, and D (traffic figures from respective CAAs); type of ATC function and equipment affected (ATC functional classification, Chapter 2); complexity of failure type, Countries A, B, and C (Chapter 4, section 4.1.2); severity (severity rating, Chapter 4, Table 4-5); duration, Country D (Country D database); and additional statistical tests.]
Figure 4-3 Operational failure reports analyses
Using the SPSS statistical package, frequencies of related categories are identified and
the most frequent categories are reported for each variable. To establish relationships
between these variables, additional statistical tests are also performed. In this regard,
chi-square tests are used to test the relationships between two categorical variables.
The most important assumptions of the chi-square test are random sample
data, a large sample size, adequate cell sizes (no fewer than five observations per cell),
independent observations, and normal distribution of deviations between observed and
expected values. The size and characteristics of the available datasets imply
conformance with all listed assumptions. Furthermore, the Cramér's V test is used to
measure the association for nominal data (i.e. ATC functionality variable) whilst the
Kendall's tau test is used for ordinal data (i.e. severity and duration variables). These
tests are briefly discussed in the following paragraphs.
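The cell-size assumption can be checked before a chi-square test is run. The Python sketch below computes the expected frequency of each cell under independence and verifies that none falls below five. Applying the threshold to expected rather than observed frequencies is an assumption made here, and the contingency table is hypothetical.

```python
def expected_counts(table):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


def cells_large_enough(table, minimum=5):
    """True if every expected frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


if __name__ == "__main__":
    # Hypothetical table: rows = two ATC functionalities,
    # columns = three severity ratings.
    observed = [[30, 10, 5],
                [20, 25, 10]]
    print(expected_counts(observed))
    print(degrees_of_freedom(observed), cells_large_enough(observed))
```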
Cramér's V is a chi-square-based measure of the strength of the relationship
between nominal variables and is applicable across contingency tables of size greater
than 2×2 (Berenson et al., 2006). The Cramér's V coefficient is interpreted as a measure of
the relative strength of an association between two variables and it ranges from 0 to 1
(i.e. 1 representing a strong association). Suppose that the null hypothesis is that two
variables are independent random variables. Based on the frequency table and the null
hypothesis, the chi-squared statistic X^2 can be computed as the squared difference
between the observed (O) and expected frequency (E) in each cell, divided by the
expected frequency and summed over all cells. Then, the Cramér's V coefficient is defined in equation 4-1 below:
\[ V = \sqrt{\frac{X^2}{n m}} = \sqrt{\frac{\sum \frac{(O - E)^2}{E}}{n m}} \qquad \text{(4-1)} \]
where n represents the sample size and m represents the smaller of the number
of rows minus one and the number of columns minus one.
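Equation 4-1 can be evaluated directly from a contingency table, as in the following Python sketch. The tables used are hypothetical, and the function assumes that every row and column total is positive so that no expected frequency is zero.

```python
import math


def chi_square(table):
    """Chi-squared statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


def cramers_v(table):
    """Cramér's V as in equation 4-1: sqrt(X^2 / (n * m))."""
    n = sum(sum(row) for row in table)
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_square(table) / (n * m))


if __name__ == "__main__":
    print(cramers_v([[10, 20], [20, 10]]))   # moderate association
    print(cramers_v([[10, 0], [0, 10]]))     # perfect association
    print(cramers_v([[10, 10], [10, 10]]))   # no association
```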
Kendall's tau is a rank-based measure of the strength of the relationship
between ordinal variables, applicable across contingency tables of all sizes (Berenson
et al., 2006). The Kendall's tau coefficient has the following properties:
If the agreement between the two rankings is perfect (i.e. the two rankings are the
same) the coefficient takes the value of 1.
If the disagreement between the two rankings is perfect (i.e., one ranking is the
reverse of the other) the coefficient takes the value of -1.
For all other associations the value lies between -1 and 1, and increasing values
imply increasing agreement between the rankings. If the rankings are completely
independent, the coefficient takes the value of 0.
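The Kendall's tau coefficient is defined in equation 4-2 below:
\[ \tau = \frac{2P}{\frac{1}{2} n (n - 1)} - 1 = \frac{4P}{n (n - 1)} - 1 \qquad \text{(4-2)} \]
where n represents the number of observations (so that n(n - 1)/2 is the total number of pairs) and P represents the number of concordant pairs.
In statistics, a concordant pair is a pair of observations {X1,Y1} and {X2,Y2} from a two-variable dataset, where (equation 4-3):
\[ \operatorname{sgn}(X_2 - X_1) = \operatorname{sgn}(Y_2 - Y_1) \qquad \text{(4-3)} \]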
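Correspondingly, a discordant pair is a pair where (equation 4-4):
\[ \operatorname{sgn}(X_2 - X_1) = -\operatorname{sgn}(Y_2 - Y_1) \qquad \text{(4-4)} \]
Sgn represents the sign function defined as (equation 4-5):
\[ \operatorname{sgn} x = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases} \qquad \text{(4-5)} \]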
Therefore, a high value of P indicates that most pairs are concordant, i.e. the rankings
are consistent. A tied pair (sgn x = 0) is not regarded as concordant or discordant. If
there is a large number of ties, the total number of pairs (in the denominator of the
equation 4-2) should be adjusted accordingly (Berenson et al., 2006).
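The definitions above translate into a short Python sketch that counts concordant pairs and applies the unadjusted form of equation 4-2. It deliberately omits the tie adjustment mentioned above, so it is only appropriate for data with few or no ties. The rankings in the example are hypothetical.

```python
from itertools import combinations


def sign(x):
    """Sign function of equation 4-5: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def concordant_pairs(xs, ys):
    """Number of pairs ranked in the same order on both variables."""
    count = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sx, sy = sign(x2 - x1), sign(y2 - y1)
        # Tied pairs (a zero sign) count as neither concordant nor discordant.
        if sx != 0 and sx == sy:
            count += 1
    return count


def kendall_tau(xs, ys):
    """Kendall's tau without tie adjustment: 4P / (n(n - 1)) - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    return 4 * concordant_pairs(xs, ys) / (n * (n - 1)) - 1


if __name__ == "__main__":
    print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))   # identical rankings
    print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))   # reversed rankings
    print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))   # one swapped pair
```

With many ties, as is typical of ordinal categories, a tie-adjusted variant should be used instead, since the unadjusted denominator overstates the number of comparable pairs.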
After presenting the overall methodology used for data analyses, the following sections
present some of the key findings and results.
4.4.2 Rate of equipment failures

From Figure 4-4, the rate of equipment failures for Country A initially increases greatly
before peaking in 2002, followed by a sharp drop in 2003. This corresponds to a large
number of early failures experienced with the opening of the new ATC Centre, which
accounted for 63.4 percent of all reported equipment failures in that year. Country B's
rate rises from 17.5 failures per 100,000 flight hours in 2001 to 25 failures per 100,000
flight hours in 2002. This is followed by a drop to 17.8 failures per 100,000 flight hours
in 2003 before increasing sharply in 2005. The reason for the high rates in 2004/2005 is
that the air navigation service provider directed controllers to be more diligent about
filling out incident reports to improve the quality of the incident database and the overall
safety management system. Country C's rate exhibits a steady trend for the entire
period of 13 years, averaging nine failures per 100,000 flight hours.
[Figure 4-4: rate of equipment failures per 100,000 flight hours (vertical axis, 0 to 50) by year (horizontal axis, 1992 to 2005) for Country A, Country B, and Country C]
Figure 4-4 Total number of equipment failures per flight hours flown in each year for countries
A, B, and C
The data available on the rate of equipment failures for Country D reveal a sharp rise
in the number of equipment failures from 30 failures per 10,000 flight hours captured in the
last half of the year 2000 to 45 failures per 10,000 flight hours in 2001 (Figure 4-5)⁵.
The reason for this is that only five months of data were available for the year 2000.
Therefore, we can conclude that the rate of reported equipment failures in this ATC
Centre decreases in absolute numbers.
[Figure 4-5: rate of equipment failures per 10,000 flight hours (vertical axis, 0 to 50) by year (horizontal axis, 1992 to 2005) for Country D]
Figure 4-5 Total number of equipment failures per flight hours flown in each year for country D
(year 2000 incomplete)
⁵ Although the rates of equipment failure of Country D are tenfold higher compared to Countries
A, B, and C, Country D data are retained for subsequent analyses as they represent the most
detailed and reliable source of operational failure reports.
The next section builds on this trend analysis and assesses affected ATC
functionalities. The classification of all ATC functionalities, as defined in Chapter 2, has
been used for this purpose and the findings are presented for each Country separately.
4.4.3 Type of ATC functionality and equipment affected

This section provides the analysis of ATC functionalities and their sub-functions
affected by equipment failure occurrences as reported for Countries A, B, C, and D.
Country A data shows that the two ATC functionalities most affected are the
communication and surveillance functions (Figure 4-6).
Figure 4-6 Most affected ATC functionality (Country A)
Further analysis of sub-functions and equipment most affected by failures identified the
following five types: air ground communication, secondary surveillance radar (SSR),
flight data processing system (FDPS), primary surveillance radar (PSR), and other
communication systems, ranging from pagers, headsets, microphones, cables, to
footswitches (Table 4-6).
Table 4-6 Most affected ATC equipment (Country A)

ATC equipment affected | Percentage
air ground communication | 33.1
secondary surveillance radar (SSR) | 17.7
flight data processing system (FDPS) | 10.1
primary surveillance radar (PSR) | 5.2
other communication systems | 4
Similar to the previous case, two ATC functionalities for Country B most affected by
equipment failures are the communication and surveillance functions (Figure 4-7).
Figure 4-7 Most affected ATC functionality (Country B)
Table 4-7 presents five types of equipment most affected by failures. These are: PSR,
air situational display or radar display, air ground communication, voice switching
communication system (VSCS), data exchange network, and runway/taxiway lighting.
Table 4-7 Most affected ATC equipment (Country B)

ATC equipment affected | Percentage
primary surveillance radar (PSR) | 17.2
air situational display | 15.1
air ground communication | 11.6
voice switching communication system (VSCS) | 8.8
data exchange network | 7.6
runway/taxiway lighting | 7.6
Country C shows a slightly different trend in the distribution of equipment failures per
ATC functionality. The two most affected categories are the navigation and
communication functions (Figure 4-8).
Figure 4-8 Most affected ATC functionality (Country C)
Furthermore, the five most affected equipment types are: air ground communication,
instrument landing system (ILS), very high frequency omnidirectional radio range
(VOR), non-directional beacon (NDB), and air situational display (Table 4-8).
Table 4-8 Most affected ATC equipment (Country C)

ATC equipment affected | Percentage
air ground communication | 23.7
instrument landing system (ILS) | 19.6
very high frequency omnidirectional radio range (VOR) | 7.6
non-directional beacon (NDB) | 6.5
air situational display | 5.8
Country D shows a similar trend to Countries A and B, as the two most affected ATC
functionalities are communication and surveillance (Figure 4-9). Although the
navigation function seems not to be represented at all in Figure 4-9, there were in fact
two failures affecting this functionality, both due to testing of Global Positioning
System (GPS) clock alarms. The reason for the under-representation of this ATC
functionality is that the data originated from one particular ATC Centre that
provides area control service and as such is not responsible for the ground-based
navigational aids and airport-based equipment (e.g. meteorological equipment,
runway/taxiway lighting, ILS, Surface Movement Radar (SMR)).
[Figure 4-9: frequency of equipment failures (vertical axis, 0 to 3500) by ATC functionality (horizontal axis categories: communication, navigation, surveillance, data processing, supporting, safety nets, pointing/input, power supply, system monitoring)]
Figure 4-9 Most affected ATC functionality (Country D)
Further analysis of data for Country D shows that the following five equipment types
are most affected by equipment failures: air situational display (radar display), data
exchange network, air ground communication, other surveillance systems (mostly
radar links), and other communication systems, such as pagers, headsets,
microphones, cables, and footswitches (Table 4-9).
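Table 4-9 Most affected ATC equipment (Country D)

ATC equipment affected | Percentage
air situational display | 21.9
data exchange network | 15.7
air ground communication | 11.6
other surveillance systems | 8.7
other communication systems | 4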
Table 4-10 collates the five ATC equipment types most affected by failures, from each
available dataset. Findings are structured according to the ATC functionality they
support (in rows) and sources (in columns). Overall it can be concluded that Countries
A, B, and D are quite similar in relation to the most affected ATC functionalities. Results
of data analyses from these three countries indicate that failures mostly affect the
communication and surveillance functionalities. On the other hand, results of data
analysis from Country C differ as failures mostly affect the navigation functionality.
These are mostly failures of ILS, followed by failures of VOR, NDB, DME, as well as
airport lighting facilities (runway and taxiway lighting). Furthermore, the only equipment
type frequently affected by failures in all four countries is air-ground communication.
Other equipment types common in available datasets are air situational display, radar,
data exchange network, and supporting communication system (e.g. pagers, headsets,
microphones, cables, and footswitches).
Table 4-10 Summary of the five ATC equipment types most affected by failures
ATC functionalities | Country A | Country B | Country C | Country D
Communication | A/G communication; other communication systems | A/G communication; VSCS; data exchange network | A/G communication | A/G communication; other communication systems; data exchange network
Surveillance | PSR; SSR | PSR; air situational display | air situational display | other surveillance systems; air situational display
Data processing and distribution | FDPS | | |
Navigation | | runway/taxiway lighting | ILS; VOR; NDB |

As discussed previously in section 4.1.2, failures can affect single or multiple
components at the same time. The analysis of complexity of failure type was based on
extracting the number of failures reported in each occurrence report, i.e. single or
multiple failures. It is assumed that failures that affect multiple components, regardless
of whether they are dependent or independent, were reported in the same operational
failure report. Personal correspondence with CAA staff in charge of the occurrence
databases from Countries A and B confirmed this assumption. According to them, if
two different items of equipment fail, but the time between failures is such that the
failure of one does not contribute to the failure of the other, then two 'single' failures are
reported separately. However, if the failures occur close together such that the failure
of one could have contributed to the failure of the other or, if unrelated, the fact that two
items failed close together significantly increased controller workload,
then multiple failures are reported in the same occurrence report. Based on these
findings, it was necessary to capture the frequency of reports that mentioned more than
one equipment failure. This was consistently done for the Countries A, B, C, and D datasets.
The Country C dataset had to be assessed separately due to the specifics of its reporting
system: in Country C, each occurrence has multiple database records, as each finding and
cause is reported separately. As a result, the assessment
of the multiple failure occurrences had to be performed by assessing each individual
case and excluding all non-equipment failure reports. Similarly, the Country D
dataset had to be excluded as the reporting system of the system control and
monitoring unit accounts for each failure independently. Table 4-11 presents the
percentage of multiple failures amongst the available operational failure reports.
Table 4-11 Percentage of the multiple failure occurrences reported in the available datasets
Country | Number of reports with multiple failure occurrences | Total number of reports | Comment
A | 42 | 1378 |
B | 206 | 1393 |
C | 24 | 448 | separate assessment due to the specific reporting system
D | N/A | N/A | not applicable due to the specific reporting system
Aggregated data | 272 (8.4%) | 3219 |
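The aggregated share of multiple failure occurrences in Table 4-11 can be reproduced from the per-country counts, as in the following Python sketch. The assignment of counts to Countries A, B, and C is as read from the table. Country D is left out because no figures are applicable to it.

```python
# (reports with multiple failures, total reports), as read from Table 4-11.
reports = {
    "A": (42, 1378),
    "B": (206, 1393),
    "C": (24, 448),
}


def percentage(part, whole):
    """Share of `part` in `whole`, expressed in percent."""
    return 100 * part / whole


multiple_total = sum(multiple for multiple, _ in reports.values())
report_total = sum(total for _, total in reports.values())
aggregated = percentage(multiple_total, report_total)
per_country = {c: percentage(m, t) for c, (m, t) in reports.items()}

if __name__ == "__main__":
    print(multiple_total, report_total, round(aggregated, 1))
    for country, share in per_country.items():
        print(country, round(share, 1))
```

The aggregated percentage is weighted by the number of reports in each dataset, so it is not the simple mean of the per-country percentages.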
Using the severity categorisation defined in section 4.2.3, it is possible to categorise all
available equipment failure reports from operational and safety perspectives. The
following section assesses the ATC functionalities affected by equipment failure with
respect to their severity or impact on ATC operations.
4.4.5 Severity of equipment failures

Figure 4-10 presents the distribution of equipment failures according to the severity of
their impact on ATC operations. As discussed previously, three severity ratings are
recognised, namely major, moderate, and minimal (Table 4-5). Although major failures
are the least frequent, their impacts on ATC operations and controller recovery
performance are the most severe. For this reason, the rest of the analysis focuses on
major equipment failures. The distribution of the ATC functionalities most affected by
major failures may be skewed due to the Country D dataset which does not incorporate
failures of the navigation functionality (see section 4.4.3). Future research should
address moderate and minimal severity categories as these are prone to errors of
controller recovery in the absence of written and practiced procedures.
Figure 4-10 Distribution of equipment failures according to their severity
The major category accounts for 7 percent, 14.4 percent, 12.7 percent and 6.5
percent of the equipment failures within Countries A, B, C, and D respectively. These
results show the importance of assessing the degree of severity for each of the
equipment failure occurrences. For example, the majority of failures reported in the
Country D dataset tend to have minimal impact on ATC operations and controller
performance (Figure 4-13). However, if we observe only major equipment failures, or
failures that affect an entire ATC Centre or a major part of it, it is notable that the most
affected ATC functionalities are: communication accounting for 45.3 percent of all
aggregated equipment failure reports, surveillance accounting for 29 percent, followed
by data processing and distribution accounting for 15 percent (Figure 4-11).
[Figure 4-11: bar chart of failure frequency (0 to 250) per ATC functionality (Comm, Nav, Surv, Data proc, Power, Pointing/input, System mon), with one series per country (Countries A to D)]
Figure 4-11 Distribution of major equipment failures according to ATC functionality
Further, the major failures of the communication functionality are mostly due to the loss
of air-ground communication or available frequencies and problems with the data
exchange network (when used as a coordination channel). This is determined by
observing the frequency of equipment types that support the communication
functionality affected by a major failure. Using a similar approach, the frequency of
equipment types that support the surveillance functionality affected by a major failure is
determined. These are: air situational display and radar. Within the data processing
and distribution function, more than half of the major failures are due to one particular
piece of equipment, namely the Flight Data Processing System (FDPS). This particular
system handles flight plans, making them live through automatic events, manual
inputs, and transitions from one state to the other. This information is provided via the
air situational display or radar display (Table 4-12).
Table 4-12 Summary of the five most affected equipment types from four datasets
ATC functionalities
Communication
Surveillance
Data processing and
distribution
Major failures
primary and secondary surveillance radar =
loss of radar coverage
flight data processing system (FDPS)
4.4.6 Duration of equipment failures

This section provides the distribution of equipment failures according to their duration.
As discussed previously in section 4.1.4, three categories are distinguished, namely
short period of time (order of magnitude in minutes), moderate period of time (order of
magnitude in minutes up to one hour), and substantial period of time (order of
magnitude in hours or days). This categorisation is informed by the characteristics of
the failure duration extracted from the Country D dataset as it is the only dataset which
has this information available. In general, the data shows that equipment failures could
last for a significant amount of time, i.e. the average duration being more than ten
hours (M=10.25h, SD=77.6h). This variable is measured from the first log of the event
until its final closure, which may have occurred some days later. This is the reason for
the significant spread of the duration variable around its mean. Data analysis revealed
that more than 600 failures lasted more than 24h. One particular failure of radar
telephone lines was particularly extreme in its duration as it was logged initially on
November 20, 2003 and closed on June 09, 2004, lasting more than six months.
Figure 4-12 shows the distribution of the failure duration according to the four
categories. It can be seen that the majority of failures last for less than one day, while
34.5 percent of equipment failures last up to 15 minutes (corresponding to short
durations). This particular category of equipment failures (short period of time) is
relevant to controller recovery. Equipment failures lasting up to 15 minutes require ad hoc thinking, use of past experience, training, and existing recovery procedures to
select and implement an optimal recovery strategy for the relevant contextual
conditions. Moreover, short duration failures lend themselves to experimental study of
controller recovery, as presented in Chapter 9. Equipment failures lasting from 15
minutes to one hour belong to the moderate duration category. Available data shows that
approximately 26 percent of equipment failures belong to the moderate period of time
category. The final duration category, substantial period of time, is further divided into
two additional sub-categories, failures that last up to one day and those that last longer
than a day. This is done to extract more information as about 40 percent of the
equipment failures belong to the substantial period of time category. The results of the
analysis suggest that eight percent of reported equipment failures in Country D lasted
more than one day. Further investigation of equipment types affected by failures lasting
more than one day revealed that the majority of these are data exchange network
problems, air situational display, flight data processing system, links with radar sites,
and air-ground communication.
[Figure 4-12: bar chart of failure frequency (0 to 3,000) per duration category (h): [0.00-0.25] 34.51%, [0.26-1] 25.85%, [1.01-24] 31.6%, [>24.01] 8.04%]
Figure 4-12 Distribution of the failure duration according to four distinct categories
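The duration categories described above can be sketched as a simple classifier. The following is a minimal illustration in Python, assuming the category boundaries shown in Figure 4-12 (0.25 h, 1 h, and 24 h). The function names, category labels, and sample durations are hypothetical and are not taken from the Country D dataset.

```python
from collections import Counter

# Upper bounds (hours) for each duration category, as read from Figure 4-12.
# The labels are illustrative names, not identifiers used in the thesis.
DURATION_BOUNDS = [
    (0.25, "short"),
    (1.0, "moderate"),
    (24.0, "substantial (up to one day)"),
]


def duration_category(hours: float) -> str:
    """Map a failure duration in hours to its duration category."""
    if hours < 0:
        raise ValueError("duration cannot be negative")
    for upper_bound, label in DURATION_BOUNDS:
        if hours <= upper_bound:
            return label
    return "substantial (more than one day)"


def category_shares(durations_h):
    """Return the percentage of failures falling in each category."""
    counts = Counter(duration_category(h) for h in durations_h)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical durations (hours), for illustration only.
sample = [0.1, 0.2, 0.5, 3.0, 30.0]
shares = category_shares(sample)
```

Applied to a real failure log, the same two functions would reproduce the kind of percentage breakdown reported for the four duration categories.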
Since this research addresses controller recovery from ATC equipment failures, the
focus is on major failures within the short period of time category. Table 4-13
presents the distribution of the major failures lasting up to 15 minutes, according to the
ATC equipment affected. It can be seen that the equipment most affected is the data
exchange network, followed by the other surveillance systems (mostly radar
links), flight data processing system, air situational display, and air-ground
communication.
Table 4-13 Distribution of major failures lasting up to 15 minutes per ATC equipment affected
ATC equipment | Percentage
data exchange network | 28
other surveillance systems | 16
flight data processing system | 13.7
air situational display | 12
air-ground communication | 7.4
4.4.7 Additional statistical tests

After the summary statistics presented for each of the datasets available and for four
relevant variables (ATC functionality, complexity of failure type, severity, and duration),
the final step is to test any interactions that may exist between these variables. The
ATC functionality variable is used because it has only nine categories, compared to the
ATC equipment variable which has more than 60 different categories. The rationale
behind the choice of statistical tests performed is explained in section 4.4.1. The results
are presented in Table 4-14.
Table 4-14 Statistical tests and results obtained

Country
Variable 1
Variable 2
Test
Country A
Country B
p<0.001
ATC
functionality
Severity
Non-parametric test
(Cramer's V)
Country C
p<0.001
p<0.001
ATC
functionality
Country D
Statistical significance at 95
percent confidence level
Severity
ATC
functionality
as above
p<0.001
as above
p<0.001
Non-parametric test
(Kendall's tau)
p=0.021
Duration
Severity
All statistical tests revealed significant relationships. For all available datasets there is a
significant relationship between the type of ATC functionality affected and the
equipment failure severity rating. The main findings from these tests indicate the
dominance of equipment failures affecting the communication and surveillance
functionalities with both minimal and major impact (see Table 4-15). The last test,
namely the relationship between failure severity and duration for Country D's dataset,
indicates a significant negative relationship. In other words, the data indicates that the
longer the failure, the less severe it tends to be. This finding is expected as more
severe failures tend to be attended to immediately and thus the time between the first
log and closure of these failures may be shorter.
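To illustrate how the strength of association between two categorical variables, such as ATC functionality and severity, can be quantified, the following minimal Python sketch computes Cramér's V from a contingency table. It is not the analysis code used in this research. The function name and the example counts are hypothetical, and the significance test (p-value) is omitted. The sketch assumes that every row and column of the table contains at least one observation.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by the sample size and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical counts: rows are two ATC functionalities,
# columns are the severity ratings (major, moderate, minimal).
example = [
    [12, 30, 58],
    [25, 40, 35],
]
v = cramers_v(example)
```

The statistic ranges from 0 (no association) to 1 (perfect association), which makes it comparable across datasets of different sizes.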
Table 4-15 Main findings regarding interactions between ATC functionality and severity
Severity rating
Country
Country A
Country B
Country C
Country D
Major
surveillance
communication
communication and surveillance
Minimal
communication
communication and navigation
navigation
communication and surveillance
After qualitative and quantitative assessment of the equipment failures in ATC, the next
section derives a framework of the equipment failure impact assessment tool. This tool
is designed to assess equipment failures and provide an indication of their severity or
overall impact on ATC operations.
4.5 Qualitative equipment failure impact assessment tool

The ATC functionality classification defined in Chapter 2 is used as a basis for the
framework of the qualitative equipment failure impact assessment tool, as designed in
this research. This tool takes into account the proposed classification as well as the
failure characteristics relevant to controller performance. Thus, all previously defined
equipment failure characteristics must be examined for their relevance to ATC
operations. Table 4-16 provides the list of equipment failure characteristics relevant to
this tool. These are the type of ATC functionality provided by the failing system,
complexity of failure type, time course of failure development, and duration of failure.
Table 4-16 Review of equipment failure characteristics with regard to their impact on ATC operations
Equipment failure characteristics | Impact on ATC operations / Comment
ATC functionality affected | To be considered
Complexity of failure type | To be considered
Time course of failure development | To be considered
Duration of failure | To be considered
Impact on operational room | Output
Impact on ATC operations (severity) | Output
Impact on ATM operations (capacity, delays) | Not relevant within the scope of this research
The inclusion of all failure characteristics in this tool except ATC functionality affected
is relatively straightforward. When including the characteristic time course of failure
development, out of three possible categories (i.e. sudden, gradual, and latent) the
category latent was omitted. The reason for this lies in the fact that latent failures tend
to be overlooked in the overall ATC system for long periods of time until triggered by
some other failure. As such, they have a profound effect on the controller, but only
once they are triggered by another failure.
The ATC functionality affected represents the key failure characteristic in terms of
effect on controller performance. It is significantly different if the controller is left to
operate without some key functionality (e.g. radar picture, communication, power
supply) as opposed to some auxiliary tools or equipment (e.g. monitoring tool, headset,
mouse). Therefore, it is necessary to separate ATC functionalities according to their
importance for the radar control of air traffic in a dedicated airspace. The separation is
intended to simply differentiate between primary and secondary ATC functionalities.
Their precise definitions informed by various examples are given in the following
paragraphs and Table 4-17.
Primary ATC functionalities are considered primary tools for achieving safe and
efficient flow of air traffic in any dedicated airspace. This group consists of the key
components, equipment, or tools of the communication, navigation, surveillance, data
processing, and power supply functionalities. These ATC functionalities are
categorised as primary ATC functions because they provide the critical information to
the controller. This critical information consists of: voice (and data) communication with
the aircraft in a dedicated airspace, aircraft horizontal and vertical position relative to
other traffic, and navigational directions or vectors to comply with the requirements of
the flight plan. These data are presented to the controller via an operational display
used for tracking the progress of multiple aircraft at any given moment. In modern ATC
Centres, the communication function is provided via the Voice Switching
Communication System (VSCS) touch panel (see Chapter 2 for more details). In
addition, it is necessary to highlight that the power functionality also represents a
primary function. This is a direct consequence of the computer driven ATC environment
where electrical power supplies all of the above mentioned systems. Therefore, in case
of any disruption (either from public utilities or an ATC Centre's own installation), the
controller may lose some or all primary functionalities. Table 4-17 captures the primary
ATC functionalities.
Secondary ATC functionalities (Table 4-17) represent supporting tools to achieve the
primary objective of the ATC service. Their function is important but not irreplaceable
by other, primary ATC functionalities. This group consists of: input/pointing devices,
system monitoring, safety nets, supporting ATC tools, as well as various components
of the communication, navigation, surveillance, and data processing functionalities. For
example, STCA, as a safety net, gained popularity over the past few years because of
its increased safety application and as a last ground-based technical defence against
mid-air collisions. Its sole purpose is to alert the controller to unsafe projected proximity
of two or more aircraft. Therefore, this system cannot be considered a primary function
in ATC but more of a supportive one. Furthermore, ATC tools, such as arrival and
departure managers, help sequence takeoff and landing of aircraft to provide the most
efficient utilisation of available resources (i.e. runway and airspace capacity). Overall,
without these tools, the controller may still provide the same functionality with
potentially less efficiency and increased workload.
Table 4-17 Detailed overview of the primary and the secondary group of ATC functionalities
ATC functionality group | ATC functionality | Sub-functionalities (equipment, sub-systems, tools)
Primary | Communication | Air-ground; Ground-ground; Voice Switching Communication System
Primary | Navigation | Instrument Landing System (ILS) (during approach phase and in the case of reduced visibility)
Primary | Surveillance | Primary Surveillance Radar; Secondary Surveillance Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Air Situational Display
Primary | Data processing | Flight Data Processing System; Radar Data Processing System
Primary | Power supply | Main power system; Uninterruptible power supply (generator, battery)
Secondary | Communication | Data exchange network; Back-up system; Aeronautical Information Service; Other
Secondary | Navigation | Navigational aids (e.g. Very high frequency Omnidirectional Range - VOR, Distance Measuring Equipment - DME); Airport facilities control and monitor (navigation aids monitoring, aeronautical ground lighting)
Secondary | Surveillance | Surface Movement radar; Automatic Dependent Surveillance; Aerodrome Traffic Monitor; Other (radar link, radar console); Auxiliary Display
Secondary | Data processing | Flow control supporting equipment; Fallback facility; Other (e.g. strip printer)
Secondary | Supporting function (ATC tools) | Monitoring aids; Sequencing manager; Other
Secondary | Safety nets | Short Term Conflict Alert; Minimum Safe Altitude Warning; Area Proximity Warning; Runway Incursion Monitoring and Conflict Alert System
Secondary | Pointing and input devices | Pointing devices; Input devices
Secondary | System monitoring | Data recording and playback facility; Control and monitoring; Degraded modes; Time management
Based on the selected characteristics of ATC equipment failures, it is possible to rate
the severity of each possible combination of characteristics. The three-level severity
rating defined previously, based on the impact of equipment failure on ATC operations,
has been used. This severity rating differentiates between major, moderate, and
minimal impact, as defined in section 4.2.3. In general, Figure 4-13 presents the
equipment failure impact assessment tool as a four-step methodology to assess the
severity of an equipment failure. After determining the exact characteristics of
equipment failure in each step, it is possible to follow the link to the final outcome, i.e.
severity rating.
Figure 4-13 Qualitative equipment failure impact assessment tool
The output of this tool is an assessment of the overall impact of an equipment failure
on ATC operations and consequently controller performance. The rationale behind the
severity ratings presented in Figure 4-13 is as follows:
- Loss of primary functionality tends to have moderate to major severity, depending
  on other equipment failure characteristics (e.g. complexity of failure type) and
  relevant contextual conditions (e.g. traffic). The moderate to major severity rating is due
  to the fact that the primary ATC functionalities represent the critical tools for
  achieving a safe and efficient flow of air traffic in any airspace.
- Loss of secondary functions tends to have minimal to moderate severity, depending
  on the additional variables such as complexity of failure type, time course of failure
  development, and duration. The minimal to moderate severity rating is due to the fact that
  the secondary ATC functionalities only provide assistance for more efficient air
  traffic control, but do not represent the systems without which the control of the air
  traffic flow becomes unfeasible.
- Multiple failure occurrences may have a more severe impact on ATC operations
  than a single failure occurrence simply because controllers have to cope with more
  than one failure simultaneously.
- Gradual failures (e.g. gradual loss of data integrity) may have a more severe impact
  on ATC operations than sudden failures (e.g. sudden loss of data).
- Duration of failure and severity rating tend to be inversely related. Data
  analysis indicates that the longer the failure duration, the less severely it tends to
  affect ATC operations and controller performance. The rationale is that more
  severe failures tend to be attended to immediately and repaired in a shorter time.
  Moreover, if it is known that a certain primary functionality will not be available for a
  considerable amount of time, an ATC Centre may impose strict flow restrictions. For
  example, strict flow restrictions may be imposed in the event of total failure of the
  surveillance function (loss of primary and secondary radar). Partial failure would
  allow traffic but at a restrictive flow rate (loss of secondary radar). Even if a
  prolonged failure affects secondary ATC functionality (e.g. strip printer), the
  controller working position will have to be closed. This is due to the disruption
  caused by replacement of a previously automated task with manual input of flight
  information for each flight entering a dedicated airspace. As a result, it seems that
  the most severe impact can be expected mainly from short to medium duration
  failures.
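The rationale above can be sketched in code to show how the four failure characteristics might combine into an indicative severity rating. The following Python sketch is an illustration only and does not reproduce the mapping in Figure 4-13. The class name, the function name, and in particular the rule that two or more aggravating characteristics select the upper end of the severity range are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class FailureProfile:
    """Hypothetical record of the four characteristics used by the tool."""
    functionality_group: str  # "primary" or "secondary"
    complexity: str           # "single" or "multiple"
    time_course: str          # "sudden" or "gradual"
    duration: str             # "short", "moderate", or "substantial"


def indicative_severity(failure: FailureProfile) -> str:
    """Return an indicative rating: "minimal", "moderate", or "major"."""
    # Step 1: the functionality group sets the possible severity range.
    if failure.functionality_group == "primary":
        low, high = "moderate", "major"
    elif failure.functionality_group == "secondary":
        low, high = "minimal", "moderate"
    else:
        raise ValueError("functionality_group must be 'primary' or 'secondary'")

    # Steps 2 to 4: count the characteristics that the rationale describes
    # as tending towards a more severe impact.
    aggravating = 0
    aggravating += failure.complexity == "multiple"
    aggravating += failure.time_course == "gradual"
    aggravating += failure.duration in ("short", "moderate")

    # ASSUMPTION (not from Figure 4-13): two or more aggravating
    # characteristics select the upper end of the range.
    return high if aggravating >= 2 else low


worst_case = FailureProfile("primary", "multiple", "gradual", "short")
rating = indicative_severity(worst_case)
```

A rule of this kind can be read in both directions: given a past failure it yields a rating, and given a target rating it indicates which combinations of characteristics to select.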
The emphasis of this research is on equipment failures which may have a major impact
on ATC operations, including air traffic controller performance. Therefore, the output
of the qualitative equipment failure impact assessment tool in Figure 4-13 is useful for
selecting potential equipment failures of relevance to the research on controller
recovery (used to inform the experimental design in Chapter 9).
Considering this, the qualitative tool could be used in an operational environment in two
ways. Firstly, the left-to-right approach allows investigation of past equipment failure
occurrences and their impact on ATC operations. Secondly, using the right-to-left
approach this qualitative tool can be used as a method for design of the most severe
training scenarios. The training instructors could easily adjust the set of primary ATC
functionalities to the taxonomy of their systems/equipment and the characteristics of
the ATC system architecture. The qualitative equipment failure impact assessment tool
may be used as a design tool for the regular refresher unusual/emergency situation
training, as recommended by the EUROCONTROL ASSIST scheme (EUROCONTROL,
2003f).
The main disadvantage of the qualitative equipment failure impact assessment tool is
its inability to simultaneously assess the impact of several independent failures on
controller performance; rather it assesses one failure at a time as well as common
cause and common mode failures through the complexity of failure category. However,
previous research has already highlighted that multiple failure occurrences create
the highest workload (Wickens et al., 1997). As such, the current version of the
qualitative equipment failure impact assessment tool is sufficient for selection of the
most severe failure types, independent of each other. Future research should look into
the enhancement of this tool to enable the assessment of the impact of several
independent failures on controller performance. The output of this more advanced
approach would be to indicate the most severe independent multiple failure
combinations. However, to achieve this, the tool would have to be designed for a
specific ATC Centre to integrate the complexity of its ATC architecture and flow of data
between the various components of the ATC system.
4.6 Summary
In line with the objective of the research presented in this thesis, this Chapter has
identified potential equipment failure types and their key characteristics. Special
attention has been paid to the consequences of equipment failures and their impact on
ATC operations. A severity rating has been defined and applied to available operational
failure reports. The Chapter has further discussed technical recovery designed to
prevent or mitigate the impact of equipment failures on ATC operations and controller
performance.
Stepping away from theoretical findings from past literature, this Chapter has provided
operational input through the analyses of operational failure reports from four countries.
These analyses focused on four variables: the type of ATC functionality and equipment
affected by the failure, complexity of failure type, severity of its impact, and the overall
duration of the failure. Using the available reports it has been possible to identify
distributions of equipment failures in relation to these four variables. Although these
countries are different in terms of the volume and characteristics of airspace they
control, traffic levels, and equipment types, the analyses have shown that
communication and surveillance functionalities are affected most by equipment failures.
When observing only major failures, the most affected are the communication,
surveillance, data processing, and power supply functionalities. Further investigation of
major failures lasting a short period of time has revealed the most affected ATC
equipment. These are the data exchange network (as part of the communication
functionality), the flight data processing system (as part of the data processing
functionality), and air situational display (as part of the surveillance functionality).
The Chapter has concluded with development of a framework for the assessment of
the impact that every single equipment failure has on ATC operations. In general, the
knowledge acquired from equipment failure literature, informed by the analyses of
operational failure reports has been incorporated into the qualitative equipment failure
impact assessment tool and its severity output. These will inform the choice of
equipment failure and its characteristics for the experiment designed to assess
controller recovery.
The safety-critical industry is aware of the fact that hazardous equipment failures
cannot be avoided and that absolute safety is not achievable. Thus, the same attention
given to their analysis should be given to the overall human recovery process. Kanse
(2004) points out that what we really want to prevent is not so much the failures
themselves, but the negative consequences of these failures. As a result, the following
Chapter gives appropriate attention to the controller recovery process.
Chapter 5
Air Traffic Controller Recovery
The previous Chapter explained the characteristics of equipment failures and the
notion of technical recovery. This Chapter reviews the associated issues of the process
of controller recovery. In Air Traffic Control (ATC), the human recovery process
involves two groups of individuals. One group consists of controllers and the other
consists of system control and monitoring engineers¹. The Chapter starts with a brief
discussion of the roles controllers and engineers have in the recovery process. As the
focus of this thesis is on controller recovery from equipment failures, the Chapter
continues with a review of past research of relevance to this subject. In this respect, the
Chapter reviews in detail the phases of controller recovery and the corresponding
models developed for the Air Traffic Management (ATM) and non-ATM industries. This
is followed by a discussion of the major factors that influence the quality of controller
recovery. The Chapter concludes by proposing a set of variables used for a detailed
assessment of controller recovery performance later in this thesis. This set of recovery
variables is also used as a guide to the design of the experiment to capture real data
on controller recovery in Chapter 9.
5.1 Human recovery in air traffic control

The human recovery process in the ATC environment involves two distinct groups of
individuals. One group is represented by air traffic controllers and can consist of a
single controller or a team of controllers depending on the configuration of the ATC
Centre and the traffic levels at any given moment. Engineers from the system control
and monitoring unit belong to the second group. This section gives a brief description
of the role of each group and the specific tasks to be executed to recover from
equipment failures in ATC.
¹ Referred to as engineers throughout the thesis.
5.1.1 Recovery by air traffic controllers

In the case of any equipment failure that affects controller performance (referred to as
a hazard in this thesis), controllers are responsible for recovering the system and
achieving a safe but not necessarily efficient level of operation. There are many human
factors issues that affect controller performance under normal conditions, and it is
reasonable to assume that the same factors are even more critical under abnormal
conditions, such as equipment failures. In other words, the context in which controller
performance takes place is important in understanding controller reliability. A detailed
review of contextual factors that may influence controller recovery and a methodology
for assessing their potential influence on controller performance are presented in Chapters 7 and 8,
respectively.
While a recovery procedure may or may not exist in the event of an equipment failure, most
ATC Centres have developed procedures for reporting and resolving such failures. Any
equipment failure should be reported to the supervisor, whilst those with operational
and safety impact must be reported under the mandatory occurrence reporting scheme
(for details see Chapter 3). Details of the failure are also forwarded to the system
control and monitoring unit.
When a failure has been rectified, the system control and monitoring unit notifies the
supervisor that the equipment has been restored to service. Then it is the duty of the
supervisor to inform the relevant sector staff and ensure that the restored equipment is
functioning correctly before updating the status of the failure in the database. In the
event that the system control and monitoring unit identifies a failure occurring in the
operations room, it is the duty of this unit to inform the supervisor, who will subsequently
inform the controllers.
5.1.2 Recovery by system control and monitoring engineers

Failures are not necessarily detected only by controllers. Due to the layers of built-in
defences that exist in modern ATC Centres, the majority of equipment failures do not
affect the controller (NATS, 2002). These failures are detected by the technical system
and resolved by engineers from the system control and monitoring unit (e.g. by
receiving a system-generated alert and using redundant equipment, respectively).
EUROCONTROL (2004e) refers to an ATC system control and monitoring unit as a
critical partner in maintaining ATC systems. Engineers monitor and control equipment
that supports controllers. They reconfigure and maintain degraded or failed equipment
with minimum disruption to controller tasks and regularly upgrade the software as
operational requirements deem necessary. System control personnel have rapid and
reliable communication links with the ATC operations room via the supervisor. They
utilise this communication channel to inform ATC staff of the status and performance of
equipment and systems or to receive reports of technical problems and equipment
failures from the operations room. Therefore, EUROCONTROL (2004e) concludes that
recovering the ATC system from failure is a result of close coordination and
cooperation between controllers, technicians, and management.
Following this brief discussion of the roles and responsibilities of controllers and
engineers in the recovery process, the next section reviews the past research on the
human recovery process and its phases, developed for the Air Traffic Management
(ATM) and non-ATM industries. The main findings are then applied to a particular
process of controller recovery.
5.2 Phases of the controller recovery process

Existing literature on the human recovery process (either from human error or technical
failure) is largely based on the concept of a sequence of phases that constitute the
process of recovery. The human recovery process has become an important topic in
many areas of applied psychology, particularly in safety research in the chemical
industry (e.g. van der Schaaf, 1992; Kanse and van der Schaaf, 2000; and Kanse,
2004), the nuclear industry (Kaarstad and Ludvigsen, 2002), and the ATM industry
(Bove, 2002). Other examples include research on errors in the use of human-computer interfaces (e.g. Kontogiannis, 1999; Rizzo, Ferrante, and Bagnara, 1995; Zapf
and Reason, 1994), in the office environment (e.g. Frese, Broadbeck, Zapf, and
Prumper, 1990), in software design (Frese, 1991), and in the assessment of everyday
slips and mistakes (e.g. Sellen, 1994).
As can be seen from Table 5-1, there is consensus amongst researchers in various
domains on the existence of at least three phases of the human recovery process. A
few researchers, focusing on errors in the design of human-computer
interfaces, include a phase before the actual detection: the occurrence of an error
(Zapf and Reason, 1994) or the emergence of a mismatch (Rizzo et al., 1995), with the
latter being a precursor of the detection phase. The emergence of a mismatch involves
the discrepancy between feedback and active knowledge (active expectations or
implicit assumptions). Rizzo et al. (1995) discuss and explain the difference between
mismatch and detection processes through several examples of human error.
Mismatch is considered as a breakdown of the action-perception loop. However, only
after actual detection of mismatch will it be understood as an error or a failure.
From the detection phase onwards, some phases, including diagnosis and correction,
are recognised by most researchers even though sometimes different terminology is
used (Table 5-1). For example, the diagnosis phase is often referred to as the
explanation, localisation, or identification phase. Similarly, the correction phase is often
referred to as the handling, planning and execution, recovery, or countermeasure
phase.
Table 5-1 Phases of the recovery process identified in past research

Author(s) | Context of research | Phases of the recovery process
Frese (1991) | Software design | Error detection; Error explanation; Error handling
Kontogiannis (1999) | Human Machine Interface | Error detection; Error explanation or localisation; Error correction
Zapf and Reason (1994) | Human Machine Interface | Error occurrence; Error diagnosis (detection + explanation); Error recovery (planning + execution)
Rizzo, Ferrante, and Bagnara (1995) | Human Machine Interface | Mismatch emergence; Detection; Recovery
Sellen (1994) | Assessment of everyday slips and mistakes | Error detection; Error identification; Error recovery
van der Schaaf (1992) | Nuclear industry | Detection; Localisation; Correction
Kanse (2004) | Chemical industry | Detection; Explanation; Countermeasures
Kaarstad and Ludvigsen (2002) | Nuclear industry | Detection; Explanation; Correction
Bove (2002) | ATM industry | Detection; Correction
Therefore, in the research on recovery from equipment failures presented in this thesis,
past research is used to inform the phases of the controller recovery process.
Bove (2002) does not identify the diagnosis phase in the human error management process.
This may be due to the fact that this phase represents a covert human activity, difficult to
observe, measure, and capture in incident reports.
112
Chapter 5
Detection of equipment failure is taken as the first phase, triggered by the mismatch
between ATC system feedback and active knowledge of the controller (expectation or
assumption). This phase is followed by the diagnosis and correction, leading toward
the outcome of the recovery process (as a result of both technical and controller
recovery).
Controller recovery is defined in this thesis as the ability of the controller to detect³,
diagnose, and correct any non-nominal system state resulting from ATC equipment
failure (adapted from van der Schaaf, 1995). The objective of the recovery process (i.e.
its outcome) is to restore the system to its nominal (pre-failure) state or at least to limit
the consequences of failure in the most efficient and effective way (by achieving a stable
non-nominal system state). The following sections discuss the phases of controller
recovery.
5.2.1 Detection
Human recovery is a sequential process whose first step is the detection of failure.
Without this detection there is no recovery process. Therefore, the first task of the
controller is to detect the failure. As previously explained, failures can first be
detected either by a technical system or by a controller. Hallbert and Meyer (1995) note
that to accomplish detection by the human operator, the stimulus must be
recognisable. In other words, the stimulus must be something that a controller has
already experienced, is trained to observe, or is of sufficient intensity to interrupt the
monitoring process (e.g. visual or auditory alert positioned within the field of view but
different from the background noise already present on the radar screen or other
operational support system).
Thus, detection is triggered by any mismatch between the expected effects and
observed outcomes. The mismatch can be explained on the basis of the information
that is matched against the frame of reference or range of the expected system
responses. For example, after issuing an instruction for a flight level change to an
aircraft, the controller expects to see the old flight level gradually changing toward the
new one. However, if the controller observes a flight level change outside the expected
values, then the violated expectation will trigger the identification of some sort of fault. This
fault can be caused by an erroneous flight level change by the pilot or by an erroneous system readout
of the aircraft altitude (e.g. due to radar garbling).

³ Failures can first be detected either by a technical system or by a controller. Failures detected by a technical system may trigger the generation of an alert (via a warning device) transmitting information on the failure to the controller. However, failures can also go unnoticed by the technical system and be detected by a controller working with fallible equipment.

In the case of a total failure of a particular function, it is easier to detect and diagnose
the significance of the change, since the failure is obvious. However, in the case of a
partial failure of a particular ATC function (e.g. corruption of tracks and squawks),
detection may be more challenging. In these circumstances, detection is based on the
controller's memory of aircraft's past positions and future trajectories, aided by
available tools (e.g. flight strips). An example of potential difficulties encountered by
controllers in detecting partial equipment failure is reported by Sampaio and Guerra
(2004). In this example, a sudden failure of the Radar Data Processing System (RDPS)
affected only one radar track and went unnoticed by the controller for 21 minutes (see
Chapter 4, section 4.2.1).
Detection is also closely connected to the time course of equipment failure
development, namely sudden, gradual, or latent failures (see Chapter 4, section 4.1.3).
Sudden failures do not allow any time to prepare, but are usually detected immediately.
On the other hand, detection of gradual failures may be extremely difficult and delayed.
Persistent (latent) failures are almost impossible to detect. They might exist in the ATC
system for a long period of time before they are detected. This is confirmed by
interviews conducted during this research with the aim of augmenting the theoretical
sources of information. Engineers from three European ATC Centres confirmed that
latent failures (mostly software failures) tend to go unnoticed until some other event or
failure reveals their existence (for evidence see Appendix II).
There are various other factors that can hinder failure detection, such as difficulties in
observing system feedback or remembering expectations about effects. Detection can
also be made difficult by inappropriate system design (e.g. poor human machine
interface, poor quality or position of alert), workplace layout, or controller working
strategy. As an example, an alert that is barely visible or audible may remain
undetected even by a highly alert controller.
Often, successful detection occurs as a consequence of a combination of design
qualities and mental resources. An example is taken from one of the European ATC
Centres where the label of the ATC function positioned in the general information
window changes its colour from white to yellow in the case of a failure. However, in the
training facility of the same ATC Centre, within the same window, one specific label is
designed to be colour-coded yellow regardless of its status (i.e. the label "Lines" refers to
the status of the communication lines between a number of ATC Centres). Such a
training platform design feature has the potential to result in the missed detection of a
failure by a controller as a result of a continuous and consistent presence of the yellow
colour in the general information window.
Besides the quality of an alert, its onset also plays an important role. As previously
discussed in Chapter 4, alert onset (i.e. Time-To-Alert or TTA) is defined as the time
between a system's detection of a failure and the moment an alert is presented on the
Human Machine Interface (HMI), either by colour change or text message. More
importantly, the future concept of "cognitively convenient" alarm onset aims to
circumvent these human limitations by providing an alert, for the system-detected
failure occurrence, at the moment when levels of controller workload allow its detection
(see Chapter 4, section 4.3.2).
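To make the definition of alert onset concrete, the following minimal Python sketch computes TTA from two timestamps and illustrates one possible reading of a cognitively convenient onset, in which the alert is deferred until a workload measure falls to an acceptable level. The function names, the workload scale (0 to 1), and the threshold are illustrative assumptions; they are not taken from Chapter 4 or from any operational system.

```python
from typing import Optional, Sequence, Tuple


def time_to_alert(system_detection_s: float, alert_presented_s: float) -> float:
    """TTA: seconds between the system's detection of a failure and the alert on the HMI."""
    if alert_presented_s < system_detection_s:
        raise ValueError("an alert cannot be presented before the system detects the failure")
    return alert_presented_s - system_detection_s


def convenient_onset(
    system_detection_s: float,
    workload_samples: Sequence[Tuple[float, float]],
    max_workload: float,
) -> Optional[float]:
    """Return the first sample time at or after system detection at which the
    (hypothetical) workload measure is at or below max_workload, or None if
    no such moment exists in the samples. Samples are (time_s, workload)."""
    for time_s, workload in workload_samples:
        if time_s >= system_detection_s and workload <= max_workload:
            return time_s
    return None


# Illustrative values only: workload on an assumed 0-1 scale.
samples = [(0.0, 0.9), (10.0, 0.8), (15.0, 0.4), (20.0, 0.3)]
onset = convenient_onset(10.0, samples, max_workload=0.5)
if onset is not None:
    print(f"alert presented at t={onset}s, TTA={time_to_alert(10.0, onset)}s")
```

In this sketch a deferred onset lengthens TTA, which makes explicit the trade-off between presenting an alert promptly and presenting it when the controller is able to detect it.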
The above discussions have highlighted that detection can be either enhanced or
hindered by a combination of technical and human related factors. External stimulus,
past experience, appropriate design solutions, and sudden development of equipment
failures tend to enhance detection. However, inappropriate system design, high levels
of workload and fatigue may hinder failure detection. Similar conclusions are drawn
from the study on human recovery performance in nuclear power plants by Kaarstad
and Ludvigsen (2002). Based on a literature review, an experimental investigation, and
field studies, they identify the three most significant factors that affect the detection
phase. These are:
• communication - interaction with colleagues can provide information to detect a failure;
• system feedback - cues directly found in the operational environment (e.g. alerts, other unusual system events); and
• internal feedback - mismatch between the operator's expectations of the system/environment and the existing system status.

All of the above-mentioned factors are relevant within the ATC environment. For example,
communication represents an important factor as the information on an equipment
failure can come from the supervisor or the system control and monitoring unit.
Similarly, in the ATC environment internal feedback is referred to as the mental model.
Once the controller is aware of an information mismatch, his or her task is to rapidly
determine the significance of that mismatch. Generally, the existing system output is
compared with the previously observed one, to determine whether the change is within
tolerance. For example, if an aircraft is in level flight no flight level change should occur
and any deviation from the cleared flight level should trigger the detection of an
unusual event (e.g. pilot error, radar garbling).
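The comparison of observed system output against the range of expected values can be sketched as a simple tolerance check. The following Python fragment is an illustration only: the function names and the tolerance of two flight levels are assumptions made for the example, not values drawn from operational practice or from this research.

```python
def within_expected(observed_fl: int, old_fl: int, new_fl: int, tolerance_fl: int = 2) -> bool:
    """True if the observed flight level lies between the previous and the
    cleared flight level, allowing a (hypothetical) tolerance on both sides."""
    low, high = sorted((old_fl, new_fl))
    return low - tolerance_fl <= observed_fl <= high + tolerance_fl


def mismatch(observed_fl: int, old_fl: int, new_fl: int, tolerance_fl: int = 2) -> bool:
    """A mismatch exists when the observation falls outside the expected range."""
    return not within_expected(observed_fl, old_fl, new_fl, tolerance_fl)


# Level flight at FL350: old and new levels coincide, so any readout
# beyond the tolerance is a mismatch (e.g. pilot error, radar garbling).
print(mismatch(352, 350, 350))  # False: within tolerance
print(mismatch(360, 350, 350))  # True: outside tolerance

# Climb from FL330 to FL370: intermediate levels are expected.
print(mismatch(350, 330, 370))  # False: between old and new level
print(mismatch(320, 330, 370))  # True: below the expected range
```

The check only flags that a mismatch exists. Determining whether it stems from the aircraft or from the ATC system is left to the diagnosis phase.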
The detection phase is investigated further using data from a questionnaire survey and
an experiment in Chapters 6 and 10 respectively.
5.2.2 Diagnosis
Once detection occurs, the diagnosis phase (also known as explanation, localisation,
or identification phase) determines what the failure is, its cause, and what should be
done to correct it. A controller needs a good knowledge of a failure to determine what is
occurring and its effects (e.g. what to expect in the near future, whether the function is
still partially available or totally lost, any problem with data integrity and possible impact
on other tools). This is especially important in the ATC environment where the overall
system consists of highly integrated components and different failures may present
themselves to the controller in a similar manner. For example, a radio frequency failure
manifests itself in the same manner regardless of its cause (i.e. ground- vs. airborne-based
failure). Therefore, it is up to the controller to identify the true failure by ruling out
alternatives. In this particular example, the controller will first try to establish radio
contact with other aircraft. If communication is established with the other aircraft it is
reasonable to assume that the failure is on the aircraft side. The controller will then try
to identify if it is a receiver or a transmitter failure by asking the aircraft to squawk
identification. If the aircraft squawks identification then the pilot clearly heard the
transmission. The controller then knows that the aircraft has experienced a transmitter
failure. By employing this procedure, the controller determines the precise element of
the equipment that failed, and thus implements the most appropriate recovery
procedure.
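The ruling-out procedure described above can be written as a small decision function. This is a sketch under stated assumptions: the text specifies only the branch leading to an aircraft transmitter failure, so the other two outcomes (a suspected ground-side failure when no other aircraft responds, and a suspected aircraft receiver failure when the aircraft does not squawk identification) are inferences made for illustration, and all names are hypothetical.

```python
from enum import Enum


class RadioDiagnosis(Enum):
    GROUND_SIDE_SUSPECTED = "ground-side failure suspected"            # assumed branch
    AIRCRAFT_TRANSMITTER = "aircraft transmitter failure"              # stated in the text
    AIRCRAFT_RECEIVER_SUSPECTED = "aircraft receiver failure suspected"  # assumed branch


def diagnose_radio_failure(other_aircraft_respond: bool, squawked_ident: bool) -> RadioDiagnosis:
    """Diagnose a loss of radio contact with one aircraft by ruling out alternatives."""
    # Step 1: try to establish contact with other aircraft on the frequency.
    if not other_aircraft_respond:
        return RadioDiagnosis.GROUND_SIDE_SUSPECTED
    # Step 2: the failure is on the aircraft side; ask the aircraft to squawk identification.
    if squawked_ident:
        # The pilot heard the instruction, so the receiver works.
        return RadioDiagnosis.AIRCRAFT_TRANSMITTER
    return RadioDiagnosis.AIRCRAFT_RECEIVER_SUSPECTED


print(diagnose_radio_failure(other_aircraft_respond=True, squawked_ident=True).value)
```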
Past research in non-ATM industries has shown that in some cases, after the detection
of a failure, the corrective actions are immediately known and implemented. In these
cases, the diagnosis phase is omitted (e.g. in the nuclear industry - Kaarstad and
Ludvigsen, 2002). Similarly, a study from the chemical process industry has shown
that the order of the phases is not always the same. More precisely, the diagnosis
phase does not necessarily follow the detection phase, especially in time-critical
operations. Often a "quick fix" might be necessary or an initial correction might occur
even before the cause of a failure has been identified (Kanse, 2004).
The findings from non-ATM industries are not entirely applicable to the ATC/ATM
environment. It is difficult to see how the diagnosis phase could be omitted simply
because proper ATC equipment failure recovery is not possible without knowing the
true nature of a failure. However, the duration and the attention dedicated to the
diagnosis phase relates directly to the level of workload experienced by the controller
at the moment of failure occurrence and during the recovery process. Through
interviews, a EUROCONTROL study determined that controllers on most occasions do
not seek an explanation for the cause of a failure (EUROCONTROL, 2004e). They focus
only on identifying the system that failed, which is essential to implement an adequate
recovery strategy. An example could be the code-callsign conversion failure, where,
having detected a problem, the controller has to identify the pair of aircraft affected.
This tends to be a very time-consuming process leaving no time for the controller to
consider the cause of the failure. Another example is corruption of radar data. If the
controller doubts the quality of a particular radar source in the multi-radar coverage
airspace, it is possible to use information from other radar sources. If the same failure
occurs in the single-radar coverage airspace, the controller has to disregard radar data,
initiate procedural (non-radar) control, and pass the problem to the system control and
monitoring unit. In both cases, the controller has to determine what failed and what the
impact of that failure is, in order to implement an adequate recovery strategy. The
cause of the failure is left to the system control and monitoring unit to investigate.
From the discussion above, it is clear that the diagnosis phase is important to identify
the equipment that has failed. However, if the failure is identified and corrective actions
are immediately known, further diagnosis of its cause is omitted in favour of the subsequent correction phase. The
diagnosis phase and the factors that may influence it are addressed further in Chapter
10, which presents an experimental investigation. Once the controller diagnoses the failure type and
its impact on the ATC system, the tasks shift to more action-based activities. In short,
the controller initiates the correction phase which is described below.
5.2.3 Correction
Failure recovery involves knowing how to undo or minimise the effect of failure and
achieve the desired system state (nominal or stable non-nominal system state,
respectively). The first priority is to minimise the effect on the air navigation service and
the exposure of the problem in terms of aircraft and time. Depending upon the
equipment failure type, recovery should follow available procedures (for details see
section 5.5). Some of them could be fairly simple like switching to another radar source
in multi-radar processing areas, changing to the secondary radio frequency (if the
primary one is blocked), changing unserviceable input devices (mouse or keyboard),
and switching to another console (if the current one is not operational). Other recovery
strategies could be very complex and both physically and mentally demanding. For
example, if an automated conflict detection tool fails to work properly (e.g. Short-Term
Conflict Alert - STCA and Medium Term Conflict Detection - MTCD), an alert might
appear when there is no conflict, or conversely the controller might detect a conflict that
was not alerted automatically. In both instances, the controller will diagnose that the
conflict detection tool itself is not functioning properly. Immediate action would be
required to ensure the safety of all traffic. In other words, the controller will have to
detect all existing conflicts and resolve them in a timely and efficient manner without
the assistance of automated safety nets (e.g. STCA). The second priority would be to
test and restore the automated function, which would be the responsibility of the
system control and monitoring unit.
Past research in the nuclear industry has identified different types of decision events
that constitute the correction phase of recovery (Orasanu and Fischer, 1997; Kaarstad
and Ludvigsen, 2002). These are assessed for the ATC environment below:
• ignoring the failure - the error/failure has been detected, but ignored by the operator for two possible reasons: the error/failure is considered irrelevant (i.e. no impact on operations) or the operator assumes that his/her intervention may make the situation worse. In any case the failure would have to be reported;
• applying procedures - this seems to be the most common correction type. Therefore, it is necessary to ensure that procedures exist and that they are appropriate to a particular failure;
• choosing a solution - in theory this is applicable when procedures are not available and the human operator has to apply more conscious resources to comprehend the situation. In many situations it may seem that only one solution is possible to resolve the failure. However, in retrospect, more than one solution may be available, while only one was considered at the time; and
• creating a solution - in this case the operator has no experience with the failure type. No procedures, training, or past experience are available for the human operator to draw upon. A completely new solution or strategy has to be created. This represents the most resource-demanding option of all. This process corresponds to human heuristic competence⁴ (Rigas and Elg, 1997).
In the context of ATC, if the failure penetrates all existing built-in defences and affects
controller performance, it cannot be ignored. Thus, the recovery from ATC equipment
failures can be accomplished by applying a predefined procedure, modifying an
existing plan, or developing a new one. However, application of an existing procedure
would be the preferred option as it puts the least strain upon the controller. Compared
to the nuclear environment, the execution of the chosen procedure has to be done in a
very short time frame (EUROCONTROL, 2004e). An important aspect of the correction
phase and recovery is coping with stress induced by unexpected failure. Interviews
with controllers conducted for the EUROCONTROL study confirmed that unexpected
failures tend to significantly increase workload and stress (EUROCONTROL, 2004e).
Controllers are unable to perform their tasks effectively, and their
ability to cope with other adverse operational and environmental conditions is greatly reduced.
Furthermore, the controllers interviewed highlighted that critical incident stress
management is essential in managing the stress associated with equipment failures
(EUROCONTROL, 2004e).
The correction phase and the factors that may influence it are investigated further in
Chapters 6 and 10. From the discussions above, it is clear that existing recovery
procedures, recovery training, and past experience with equipment failures play an
important role in the overall recovery process. These three drivers build a knowledge
base for the choice or creation of the most appropriate solution for recovery from an
equipment failure. The discussion above, of the phases that constitute the process of
recovery, is followed in the next section by looking at the outcome of the recovery
process.
5.3 Outcome of the recovery process

Although the main recovery process consists of several phases, as explained
previously, these activities do not conclude the process itself (Figure 5-1). Prior to the
outcome phase, the human operator attempts to resolve the problem by implementing
a recovery strategy. This is followed in the outcome phase by post-correction
monitoring or post-recovery analysis to determine the actual outcome of the
implemented strategy. Therefore, the first task in this phase is the monitoring itself,
both by controllers and engineers. Proper design solutions could aid this phase by
providing post-recovery system status indicators.

⁴ There are two types of human competence: epistemic and heuristic. Epistemic competence refers to domain knowledge about the system which one seeks to control. It is a context-dependent component of the actual competence. Heuristic competence refers to a general competence for handling complex dynamic tasks. It is context-independent, but it is developed over many years through both training and experience. As a result, actions and decisions become fast and automatic, without apparent conscious awareness.

[Figure 5-1 is a flow diagram with the following boxes: Equipment failure; Hazard; Recovery; Outcome; Recovery successful; Recovery not successful; Recovery continues; Incident with further consequences.]
Figure 5-1 Analysis of the outcome phase (adapted from EUROCONTROL, 2004e)
It might be expected that at this stage human performance requirements are similar to
those of the detection phase. However, as observed by EUROCONTROL (2004e)
there is a crucial difference. Guided by implemented corrections (recovery strategies),
monitoring by both engineers and controllers is driven more by top-down processes,
primarily expectation. Since at this stage in the recovery process the operators have
knowledge of the failure and its cause, they also have expectations on how the system
might behave after a correction is implemented. For instance, if the system remains
unstable, operators may expect a reoccurrence of the same problem, other related
problems (common-mode or common-cause failures), or have a general suspicion that
the assessment of the problem was wrong or misleading.
Following the period of monitoring or active checks, the controller must decide whether
recovery is successful. Recovery is considered successful if the system returns to the
nominal (pre-failure) or intermediate, stable state (EUROCONTROL, 2004e).
The intermediate state represents a degraded operational state (e.g. loss of any function,
item of equipment, or a significant overload condition causing increased system
response time) which is detected and stabilised either by controllers or engineers. In
essence, the system is in the intermediate state if the consequences of failure are still
observable in the system performance while controllers are aware of the quality of
information they are receiving from the system and thus the quality of service they can
provide to traffic.
If recovery is unsuccessful, the controller will return either to the diagnosis phase (to determine
the real cause of the problem) or to the correction phase, to retry the previous strategy or
attempt a new one (Kanse and van der Schaaf, 2000; EUROCONTROL, 2004e). This
cycle of reapplied efforts continues as long as there is time available for recovery.
Otherwise, if no time is available, the final outcome may be an incident with further
consequences (e.g. loss of separation).
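The cycle of reapplied efforts bounded by the time available can be illustrated with a short loop. The sketch below is a simplification under stated assumptions: each attempt (diagnosis and/or correction followed by monitoring) is given a fixed, hypothetical duration, the success check is supplied by the caller, and running out of time is mapped directly to the incident outcome of Figure 5-1, although the text notes only that an incident may result.

```python
from typing import Callable


def run_recovery(
    attempt_succeeds: Callable[[int], bool],
    time_available_s: float,
    attempt_duration_s: float,
) -> str:
    """Repeat recovery attempts while there is time for another one.

    attempt_succeeds(n) reports whether the n-th attempt returned the system
    to a nominal or stable intermediate state (a hypothetical callback).
    """
    elapsed_s = 0.0
    attempt_no = 0
    while elapsed_s + attempt_duration_s <= time_available_s:
        attempt_no += 1
        elapsed_s += attempt_duration_s
        if attempt_succeeds(attempt_no):
            return "recovery successful"
        # Otherwise recovery continues: return to diagnosis or correction.
    return "incident with further consequences"


# Illustrative values: the third attempt would succeed; each attempt takes 15 s.
print(run_recovery(lambda n: n >= 3, time_available_s=60.0, attempt_duration_s=15.0))
print(run_recovery(lambda n: n >= 3, time_available_s=30.0, attempt_duration_s=15.0))
```

With 60 seconds available the third attempt is reached and recovery succeeds; with 30 seconds only two attempts fit, so the same strategy ends in the unsuccessful outcome.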
The next section reviews the existing models of failure and recovery process
developed to support the research on human recovery in ATM and non-ATM industries.
5.4 Models of human recovery

Throughout the reviewed literature, only a few models cover both equipment failure and
its recovery process. On the other hand, an extensive volume of research is dedicated
to models of recovery from human error. These models are the result of work in the
field of human reliability and can be transferred to recovery from equipment failure. In
chronological order, the review begins with the work of Frese et al. (1990) and Frese
(1991), which was based on office workers' errors and error handling in using
computers. In 1992, as part of a PhD thesis on near miss reporting in the chemical
process industry, van der Schaaf (1992) developed the Eindhoven classification model
of system failures. This model was based on Rasmussen's Skill-Rule-Knowledge
(SRK) model of human behaviour (Rasmussen, 1982), since human behaviour is one of the most dominant
factors causing system failures in chemical process plants. The SRK model of human
behaviour was extended to system failures, incorporating additional root causes of
incidents, namely technical and organisational factors. The incorporation of all relevant
failure factors has created a comprehensive approach to safety management.
However, the approach has suffered from the limitations of the SRK model as
discussed below.
Bainbridge (1984) reports problems using Rasmussen's taxonomy of three main types
of cognitive behaviour, namely SRK. For example, the word "rule" could be used for a
specific procedure, instructions, standard method based on previous experience, or
precise heuristic method. Another criticism is of the associated model for organisation
of cognitive behaviour, the so-called Rasmussen's pyramid model. The model places
skilled behaviour at the base and knowledge-based behaviour at the top of the
pyramid. This model, although representing the general organisation of cognitive
behaviour, does not contain mechanisms for complex behaviour (see Bainbridge,
1984).
While the previous discussions focus mainly on models for recovering from human
error, this section further presents three models that focus on recovery from technical
failures. These are: the model by Kanse (2004) developed and tested in the chemical
process industry; the Recovery from Automation Failure Tool (RAFT), developed
specifically for the Air Traffic Management (ATM) industry within EUROCONTROL's
project on Solutions for Human Automation Partnership in European ATM (SHAPE)
(EUROCONTROL, 2004e); and the model of failure recovery in air traffic control by
Wickens et al. (1998). The model by Kanse originates in a non-ATM industry but focuses
not only on the human as a system component, but on equipment and procedures as well.
This model lays down the ideas for the RAFT. The RAFT and the Wickens models
were chosen because of their relevance to research in this thesis as both assess the
impact of future automation on recovery from potential failures.
5.4.1 Model by Kanse

The basic principle behind the model by Kanse (2004) is a sequence of phases that
constitute the process of human recovery: detection, explanation (i.e. diagnosis), and
countermeasures (i.e. correction). The model is based on past research and
operational data from three studies of near misses in chemical process plants. Near
misses are incidents that have the potential to, but do not, result in a loss (e.g. an
accident, injury, failure).
According to this qualitative phase model (Figure 5-2) the recovery process starts by
detection of a failure. This is followed by any combination of explanation (referred to as
diagnosis in this thesis) and countermeasures (referred to as correction in this thesis),
including omitting one or both of these phases but also their recurrences. For example,
the assessment of the order of the recovery steps performed by plant operators in each
incident revealed that the intermediate phase (i.e. diagnosis) was omitted in more than
35 percent of incidents (see Table 3 in Kanse, 2004).
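The structure of the phase model, detection followed by any combination of explanation and countermeasures, can be expressed compactly as a pattern over phase codes. The sketch below encodes each incident as a string over the letters D, E, and C used in Figure 5-2; the encoding, the function names, and the example incidents are assumptions made for illustration and do not reproduce the data of Kanse (2004).

```python
import re
from typing import Iterable

# Detection first, then any (possibly empty, possibly repeating) mix of E and C.
PHASE_PATTERN = re.compile(r"D[EC]*")


def is_valid_sequence(phases: str) -> bool:
    """True if the phase sequence is permitted by the phase model."""
    return PHASE_PATTERN.fullmatch(phases) is not None


def share_without_explanation(sequences: Iterable[str]) -> float:
    """Fraction of valid sequences in which the explanation phase is omitted."""
    valid = [s for s in sequences if is_valid_sequence(s)]
    if not valid:
        return 0.0
    return sum("E" not in s for s in valid) / len(valid)


# Hypothetical incidents, not taken from the original study.
incidents = ["DEC", "DC", "D", "DCE", "EC"]
print([is_valid_sequence(s) for s in incidents])
print(share_without_explanation(incidents))
```

A sequence such as "EC" is rejected because, in this model, no recovery process exists without prior detection.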
The model does not focus on the factors that influence the recovery process but
highlights that factors influencing recovery might be different in different domains.
Additionally, the model does not make any attempts toward the prediction of human
performance, future errors, or failures.
[Figure 5-2 is a flow diagram with the following boxes: BEGIN - Problem situation arises as a result of one or more failures; D - Detection of deviation; E - Explanation of deviation and causes; C - Countermeasures; END of recovery process.]
Figure 5-2 Recovery process phase model (Kanse, 2004)
5.4.2 The RAFT Tool

EUROCONTROL's SHAPE project addressed the effects of automation on human
performance and future ATM concepts. A part of this project focused on technical
failures and the controllers' ability to manage them and resulted in the Recovery from
Automation Failure Tool (RAFT), a method for analysing technical failures.
The basic principle behind RAFT is a sequence of phases that constitute the process of
failure and recovery (Figure 5-3). Following a number of important factors that influence
the consequences of an equipment failure, the RAFT tool starts by assessing the
recovery context that has the potential to influence the human recovery process (Figure 5-3).
This is followed by an assessment of the failure cause, problem definition
(according to the RAFT framework an equipment failure leads to a functional
disturbance), and the failure effects. Then, the RAFT tool moves toward the
investigation of the human recovery process. This is done separately for the controllers
and engineers involved. The final step in the failure analysis is the "outcome" phase,
which includes an assessment of the effectiveness of the implemented recovery strategy
(Figure 5-3).
The RAFT is based on past research and operational experience. It builds on a
qualitative model developed by Kanse and van der Schaaf (2000) for the chemical
process industry (further adapted by Kanse, 2004 as explained in the previous section).
The model by Kanse and van der Schaaf is further augmented with operational
experience, extracted from interviews with 31 ATM staff in four European ATC Centres.
The practical use of the RAFT is based on the existence of expert group-based
evaluation of each failure and prediction of how controllers are likely to respond to
equipment failures. This tool is intended to be used together with other SHAPE project
outputs for predicting controller performance in the future highly automated
environment (e.g. a prediction of changes in controller skill requirements, workload,
trust). The approach has not been verified through recovery performance in either
simulated or operational environments and still lacks a set of recovery-relevant
principles to guide designers of current and future ATM systems. Second generation
prospective Human Reliability Assessment (HRA) methods could be used to develop a
predictive capability of the RAFT tool and to inform safety-adequate design principles
related to controller recovery from equipment failures.
Figure 5-3 The Recovery from Automation Failure Tool Framework (EUROCONTROL, 2004e)
5.4.3 Model by Wickens et al.

In 1998, the Panel on Human Factors in Air Traffic Control Automation established by
the Federal Aviation Administration (FAA) studied various aspects of human factors
and the role of the human in proposed future automated systems. Amongst several
different issues, research by this Panel recognised the importance of equipment
failures and recovery. The Panel proposes a model of ATC failure recovery and places
an emphasis on the consequences of degradation of automated ATC functionalities
(Wickens et al., 1998). It is assumed that the model is based entirely on available
research as the Panel focused on concepts that will characterise the future ATC
system. The basic principle behind this qualitative model is the impact of ATC
automation functionalities (left-hand side on Figure 5-4) on capacity, traffic density,
complexity, workload, situational awareness, manual skills, and recovery response
time. Each of these variables is associated with a sign (or a set of signs) indicating
whether automation is likely to increase or decrease the variable in question. However,
this model does not consider in detail how recovery is accomplished.
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an
arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node,
caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
The model also reflects the hypothetical function which relates recovery response time
to the level of automation (Figure 5-4). It is expected that recovery response time will
increase as the level of automation increases (shown as a dashed upward line on the
right side of Figure 5-4), due to increased complexity, skill degradation, and the overall
"out-of-the-loop" phenomenon. The solid downward line reflects the decrease of the
reaction time available to controllers as a result of the introduction of higher levels of
automation. Controllers will have far less time to safely respond to any loss of
separation and fewer opportunities for effective solutions. As a result, this model
represents Bainbridge's (1983) "ironies of automation" by overlaying two critical time
variables against each other, as a function of automation-related changes. These
variables are: the time required to establish safe separation, given a degraded ATC
service, and the time available to a controller (or a team) to react and safely recover
from a failure.
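The overlay of the two time variables can be illustrated numerically. The sketch below assumes, purely for illustration, that both variables are linear in a normalised automation level between 0 and 1; the coefficients are hypothetical and are not taken from Wickens et al. (1998), whose model is qualitative. It shows only the general idea that beyond some level of automation the time required for recovery may exceed the time available.

```python
# Hypothetical linear coefficients (seconds); chosen only to illustrate the shape.
REQUIRED_BASE_S = 20.0
REQUIRED_SLOPE_S = 40.0    # required time rises with automation level
AVAILABLE_BASE_S = 90.0
AVAILABLE_SLOPE_S = 50.0   # available time falls with automation level


def time_required(level: float) -> float:
    """Time needed to re-establish safe separation under a degraded service."""
    return REQUIRED_BASE_S + REQUIRED_SLOPE_S * level


def time_available(level: float) -> float:
    """Time available to the controller (or team) to react and recover."""
    return AVAILABLE_BASE_S - AVAILABLE_SLOPE_S * level


def crossover_level() -> float:
    """Automation level at which required and available time are equal."""
    return (AVAILABLE_BASE_S - REQUIRED_BASE_S) / (REQUIRED_SLOPE_S + AVAILABLE_SLOPE_S)


for level in (0.5, crossover_level(), 0.9):
    margin = time_available(level) - time_required(level)
    print(f"level={level:.2f} margin={margin:+.1f}s")
```

A positive margin means recovery is feasible within the time available; under these assumed coefficients the margin becomes negative above the crossover level.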
After describing the three models relevant to controller recovery from equipment failure
in ATC, Table 5-2 summarises their characteristics and identifies their limitations
addressed later in the thesis. In general, all three models are qualitative and based on
a principle of a sequence of phases that constitute the process of human recovery.
They are based on past research, whilst only one model is based on operational data.
The limitations identified in the last column of Table 5-2 guided the research presented
in this thesis and the main principles behind the framework for the assessment of
controller recovery. In short, the research in this thesis is verified in a simulated
environment (experimental investigation, Chapter 10), based on operational
experience (from interviews with relevant ATM staff, operational data in Chapter 4, and
the questionnaire survey in Chapter 6), and based upon a detailed assessment of the
recovery context (Chapters 7 and 8).
Table 5-2 Summary of relevant models of the human recovery process

Model | Context | Operational input | Assessment of recovery | Prediction of recovery | Limitations
Kanse (2004) | Chemical industry | Yes (interviews and data) | Qualitative and quantitative | No | No assessments of the recovery context; no prediction of the recovery process
SHAPE's RAFT tool | ATM | Yes (interviews) | Qualitative (expert-based) | Qualitative (expert-based) | Not verified in simulated/operational environment; based only on interviews and no operational reports
Wickens et al. (1998) | ATM | No | No | Qualitative and potentially quantitative (based on the recovery reaction time) | Theoretical approach

As stated previously, there are three major factors that influence the quality of
controller recovery, i.e. past experience, procedures, and training. Whilst procedures
and training are regulated within the aviation community, operational experience is
accumulated over time and controllers may or may not experience equipment failures
during their career. For this reason, the next sections describe and discuss existing
regulations regarding recovery procedures and training. Operational experience,
extracted from the questionnaire survey, is investigated in the following Chapter.
5.5 Procedures for handling ATC equipment failures

In both the literature and operational practice, procedures are recognised as the critical
factor for effective recovery. The following section provides an overview of the existing
international and national regulations on procedures for recovery from equipment
failures in ATC. This is followed by a discussion of the key principles of recovery
procedures in ATC identified in this research.

Regulation on procedures for handling ATC equipment failures, i.e. recovery
procedures, exists at three levels. These are: international (i.e. by the International Civil
Aviation Organisation - ICAO), regional or national (e.g. by the European Organisation
for the Safety of Air Navigation - EUROCONTROL at the regional level and Civil Aviation
Authorities - CAAs at the national level), and air navigation service provider (ANSP)
level.
The main activity of ICAO is the establishment of International Standards,
Recommended Practices and Procedures covering all technical fields of aviation. The
Recommended Practices are desirable objectives to which ICAO member states
should aim (but are not required) to conform, whilst Standards are considered
mandatory or required in the interest of the safety of international air navigation (FAA,
2005). ICAO Standards and Recommended Practices are passed to the respective
regional organisation (e.g. EUROCONTROL) or directly to the national CAAs for
assessment and implementation. The national CAA is then responsible for assurance
and monitoring that these standards are properly implemented by ANSPs at the level of
ATC Centres. The current status of regulations on recovery procedures is discussed in
the following sections.
Since 1945 ICAO has specified the standards, practices, and procedures for ATC. The
most recent edition of ICAO Annex 11, which covers air traffic services (ICAO, 2001c),
advises that air traffic services authorities should develop and promulgate contingency
plans for implementation in the event of disruption or potential disruption of air traffic
services and related supporting services in the airspace for which they are responsible
for the provision of such services. This ICAO recommendation represents a summary
of the key system safety principles that need to be considered within each air traffic
service unit. Moreover, several particular equipment failures are covered separately in
the ICAO document dealing with procedures for air navigation service (ICAO, 2001a).
These are radar equipment failure, ground radio failure (blocked frequency), ground
Automatic Dependent Surveillance (ADS) failure, and failure of Controller Pilot Data Link
Communication (CPDLC). Judged against the findings from the analysis of operational
failure reports presented in Chapter 4, ICAO has concentrated upon the appropriate
components in terms of the communication and surveillance ATC functionalities, whilst
disregarding the data processing functionality.
In its guidance for recovery from these four failure types, ICAO recommends necessary
steps to be taken by controllers and pilots, as well as ATC Centre watch managers or
supervisors. When necessary, ICAO also recommends collaboration with adjacent ATC
units. Therefore, the recovery process is seen as the responsibility not only of
controllers but of all parties involved within the affected ATC Centre and region (including
the adjacent ATC unit which can provide valuable assistance in restricting or rerouting
the flow of traffic). All other failure types are left to national service providers to include
and define in their Manuals of Air Traffic Services (MATS).
At the European level, EUROCONTROL published guidance and recommendations for
controller training in the handling of unusual/emergency situations, known as the
ASSIST scheme (EUROCONTROL, 2003f). This scheme covers all procedures for
aircraft emergencies but paradoxically does not cover any type of ATC equipment
failure. The ASSIST programme, captured in a publicly available document, is intended
to represent only a framework to be further customised and adapted to the specific
requirements of each ATC Centre utilising local expertise. Thus, each ATC Centre is
required to assemble a team of experts, implement the current ASSIST programme,
and discuss other safety-critical events (e.g. ATC equipment failures) to be included in
emergency procedures, training, and/or aide-memoire.
National air traffic service providers may publish their own procedures for
emergency/unusual situations in the MATS. The MATS contains procedures,
instructions, and information which form the basis of air traffic services within a country.
It is published for the guidance of civil air traffic controllers, but may also be of general
interest to other associated parties within civil aviation. For example, the UK MATS is
arranged in two parts. Part 1 is published by the UK CAA (as CAP 493; UK CAA, 2006)
and consists of instructions which apply to all UK ATC units. Part 2 is published by the
UK National Air Traffic Service Provider (NATS) and consists of instructions which
apply to a particular air traffic control unit (e.g. the London Area Control Centre).
NATS publishes specific recovery or fallback procedures in their internal MATS Part 2
document. This document defines 33 failure types and relevant strategies for their
recovery (NATS, 2002) and thus reflects the particular ATC system characteristics of
the UK ATC Centres. No information regarding the methodology to compile these
recovery strategies is available. It can only be assumed that these recovery procedures
are a direct result of expert discussions, operational experience, and experience with
ATC system performance.
The manual advises that the planning controller should be the focal point in the sector
team for the duration of the failure, with the main objective of ensuring that the
tactical/executive controller is supported at all times. The recovery procedure for each
of the 33 defined failures consists of the following:
- a short description of the failure (i.e. what a controller should expect, what are the
  potential effects on the ATC system);
- a description of the system-generated alert (e.g. brown border, text message); and
- a list of required recovery steps (these steps are separately defined for planner,
  tactical/executive, assistant controllers, and watch supervisor).
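The three-part structure of each recovery procedure listed above lends itself to a simple record. The following is a minimal sketch, assuming one record per failure type with steps keyed by controller role. The class, field, and role identifiers are hypothetical, and the example entry contains placeholder text only; none of it reproduces the content of the NATS manual.

```python
from dataclasses import dataclass, field

# Roles named in the text; the identifiers themselves are illustrative.
ROLES = ("planner", "tactical/executive", "assistant", "watch supervisor")


@dataclass
class RecoveryProcedure:
    """One checklist entry: failure description, alert, and steps per role."""
    failure_description: str
    system_alert: str
    steps_by_role: dict = field(default_factory=dict)

    def steps_for(self, role):
        """Return the ordered recovery steps for one role (empty if none defined)."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        return list(self.steps_by_role.get(role, []))

    def roles_without_steps(self):
        """List the roles for which this entry defines no recovery steps."""
        return [r for r in ROLES if not self.steps_by_role.get(r)]


# Hypothetical entry: the wording below is placeholder text, not NATS content.
example = RecoveryProcedure(
    failure_description="Placeholder: what to expect and effects on the ATC system",
    system_alert="Placeholder: e.g. brown border, text message",
    steps_by_role={
        "planner": ["placeholder step P1", "placeholder step P2"],
        "tactical/executive": ["placeholder step T1"],
    },
)
```

A check such as `example.roles_without_steps()` could be used to flag entries in which a role has been left without defined recovery steps.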
The New Zealand air navigation service provider (i.e. Airways New Zealand) publishes
MATS as required by the Civil Aviation Authority of New Zealand. This document
recommends the use of the recovery procedures for failures of significant components
(e.g. radar data processing, flight data processing, the overall communication system),
as these have the most severe effect on ATC operations. The recovery procedures are
published as a separate document designed to be readily available at each position
(Failure Modes Quick Reference Guide, FMQRG; Airways New Zealand, 2006a). The
main objective of this document is to provide ready and quick assistance to operational
staff for handling equipment failures (i.e. aide-memoire).
The German air traffic service provider (DFS) defines emergency checklists for various
aircraft-related as well as military-specific emergencies. This document created a basis
for the development of EUROCONTROL's guidance for controller training in the
handling of unusual/emergency situations and the ASSIST scheme (EUROCONTROL,
2003f). However, emergency checklists developed by DFS (same as the
EUROCONTROL ASSIST scheme) do not cover any ATC equipment failures.

While ICAO provides generic recommendations for recovery, ANSPs tend to publish
recovery procedures in the form of a checklist of recovery steps that controllers need to
perform upon detection of any of the pre-defined unusual situations. This form is
practical and easy to follow, especially in the case of unexpected and emergency
situations, such as equipment failures. Similar to other types of emergency situations, it
is possible to define a set of equipment failure recovery steps whose implementation
leads to system protection and assurance of accurate situational awareness. The
selection of relevant recovery steps as well as the timely manner in which they are
implemented lead to effective or successful recovery. It is important to highlight that in
general all emergency/unusual situation procedures are intended as a general guide,
and controllers are expected to use their best judgment in any given situation.
As stated above, air navigation service providers that recognise the importance of the
existence of procedures for equipment failures publish them in their relevant manuals.
These unusual situations are slowly being included in the list of regular emergency
procedures. However, MATS manuals are not available in the public domain. For this
reason, it was necessary to set up a questionnaire survey to investigate the current
status and quality of procedures and training worldwide. The results of this survey are
presented in Chapter 6. The review of recovery procedures in ATC is concluded in the
following section by a discussion on identified areas of concern.
5.5.2 Main principles behind recovery procedures in ATC

Following the discussion of available recovery procedures in the aviation community,
this section summarises the key principles of recovery procedures in ATC. These
are availability, design, and contents, as presented below.
The EUROCONTROL report on managing technical disturbances (EUROCONTROL,
2004e) concludes that procedures represent a critical factor for effective recovery. If no
procedures are available to the controllers, they may use their own mental models of
the ATC system and operational environment to decide on the most effective recovery
strategy. Such ad hoc performance can vary significantly depending on the quality of
the controller's diagnosis of the failure occurrence, experience, available information,
and the failure complexity. Therefore, to assure minimal required safety performance, it
is essential to provide recovery procedures to controllers.
Recovery procedure design should focus on phases of the recovery process and steps
that the controller must perform to recover effectively and ensure a safe ATC service.
Furthermore, the procedure should also contain the key effects of the failure on the
operational system, so that there is no potential that the controller may implement the
wrong procedure. Appendix III presents a framework for a check-list type of controller
recovery procedure or aide-memoire that should be available at each Controller
Working Position (CWP). This aide-memoire, designed in this research, is based upon
the characteristics of the ATC Centre that participated in the experimental investigation
(presented in Chapters 9 and 10).
Finally, assuming that recovery procedures are available, their contents must be
accurate and kept up to date (i.e. reflecting all modifications/updates in the ATC system
architecture). They must be realistic, comprehensive, clear and easy to use, easily
accessible, and linked to regular emergency training.
Following this discussion of recovery procedures and their key principles in ATC, the
next section discusses training for handling ATC equipment failures in a similar
manner.
5.6 Training for handling ATC equipment failures

In line with the recovery procedures, training is also recognised as a critical enabler for
effective recovery. This section reviews the existing regulations on training for recovery
from equipment failures in Air Traffic Control (ATC) at three levels: international,
regional/national, and air navigation service provider. This is followed by a discussion
of several areas of concern regarding training for unusual/emergency situations in ATC,
as identified in this research.

Regulation on training for handling ATC equipment failures, i.e. recovery training, exists
at three levels. These are: international (i.e. by ICAO), regional or national (e.g. by
EUROCONTROL at the regional level and CAAs at the national level), and ANSP
level.
ICAO guidance on human factors can be found in the Human Factors Training Manual
(ICAO document 9683; ICAO, 1998). According to ICAO, human factors principles
account for design, certification, training, operations, and maintenance, as well as safe
interfaces between humans and systems. The module of the Human Factors Training
Manual highlights the necessity to train controllers on skills such as the
controller-equipment relationship and operational aspects of automation (e.g. staying in the loop,
situational awareness, and the appropriate use of automated ATC equipment).
However, there is no specific guidance on training for emergency/unusual situations.

A number of countries have realised the benefits of regular emergency training for
controllers and consequently have initiated training programmes. In addition, on a
European scale, the EUROCONTROL European Manual of Personnel Licensing - Air
Traffic Controllers (EUROCONTROL, 2001d) now contains a requirement that ATC
units must include training for emergency/unusual situations in their training
procedures. It should consist of two segments: the first is to prepare trainees, prior to
validation, in the procedures used in the event of an emergency situation and the
second is for routine refresher training to enable qualified controllers to respond to
unusual or emergency situations in a competent and professional manner. The
importance of practising unusual situations that have occurred elsewhere is recognised
and recommended as best practice. In general, the EUROCONTROL European
Manual of Personnel Licensing document details minimum standards for professional
qualification of controllers and has the aim of harmonising licensing schemes in
Europe. The following section describes how a particular incident made a significant
impact on the regulations related to emergency training within one Civil Aviation
Authority (CAA).
5.6.1.2.1 UK Civil Aviation Authority regulation
An emergency situation that occurred in the UK airspace highlighted both the
importance of the existence of training in unusual situations and the necessity for
refresher training. In short, a particular aircraft reported dangerously low oil pressures
in both engines and consequently declared an emergency situation. In this incident the
controller on duty handled the situation with a textbook performance. The controller
informed the crew of the closest diversion airport, minimised radio frequency
transmissions while still passing all relevant information, and arranged direct routeing and
descent towards the chosen airport. During the course of the subsequent investigation,
the controller, a young trainee, pointed out that his actions were timely and efficient as
a direct result of the training in handling emergencies received on the day before the
incident occurred (Baker and Weston, 2001).
As a result of the recommendations made in the report on this incident, in 1994 the UK
CAA's Safety Regulatory Group (SRG) decided to mandate such training for all UK
controllers (Baker and Weston, 2001). In 1999, the initial set of guidelines was
broadened to include team-related aspects and to place additional focus on unusual
events rather than just emergencies. This change was reflected in the TRaining for
Unusual Circumstances and Emergencies (TRUCE) scheme. TRUCE was designed to
ensure that staff involved in the provision of an air traffic control service are trained to
recognise and handle emergency occurrences and unusual circumstances in a
competent manner. Some of the emergency/unusual situations that severely affect the
ATC operations, including equipment failures, are mandatory in the TRUCE scheme
(UK CAA, 2003).
5.6.1.3 Air navigation service provider regulation
As noted above, aviation authorities recognise the importance of regular training for
equipment failures. According to the regulations issued by Civil Aviation Authorities
(CAAs), training for emergency/unusual situations is usually set up by air navigation
service providers within their respective ATC Centres. However, the type and
frequency of emergency training can deviate from existing regulations (due to shortage
of staff and infrastructure and the high costs involved). To further augment the
regulations on recovery training available from CAAs, it was necessary to set up a
questionnaire survey to investigate the current provision of recovery training worldwide.
The results of this survey are presented in Chapter 6.
5.6.2 Areas of concern related to recovery training

Currently in ATC there are several issues of concern related to training. Firstly, the
recovery training should follow the phases of the recovery process, where adequate
time and guidelines should be given for failure diagnosis. Secondly, established
controllers have been trained in non-radar or procedural control, which is not the case
with newly qualified controllers. This means that established controllers possess the
skills to handle any degree of radar failure (as one of the most severe equipment failure
types).
Thirdly, the frequency, comprehensiveness and range of unusual situations for training
in the simulated ATC environment vary from Centre to Centre. While some ATC
Centres offer comprehensive initial training, supported by annual refresher training,
other Centres offer little or no opportunity for staff to practise coping with unusual
occurrences in a simulated environment (EUROCONTROL, 2004e). This lack of
regular training and the infrequent occurrence of serious equipment failures may lead
to a serious lack of experience with recovery performance. In addition, as the new,
more automated ATC systems tend to be reliable, controllers are deprived of the
opportunity to experience equipment failure and recovery in the operational
environment, and therefore need to gain these experiences through regular training.
Fourthly, in spite of the clear need for regular training, the lack of resources
(infrastructure and staff) makes it impossible to train controllers for all different types of
emergency/unusual situations and all equipment failure types. For this reason an
organised exchange of experience at the level of ATC Centres, countries, or regions
(e.g. ECAC states, EUROCONTROL member states) may provide valuable knowledge
and insight into various unusual situations and strategies to resolve them. As an
example, in 2003 an A300 was struck on the left wing by a surface-to-air missile, resulting
in a complete loss of hydraulics and therefore loss of all flight controls. Reacting
rapidly, the captain recalled a television documentary he had seen about a DC-10
crash at Sioux City, Iowa, and the thrust change technique employed by the captain
and crew of the DC-10 to control their aircraft. Although the A300 crew had never
practised this technique before, they quickly gained control despite the extreme stress
of the situation (IFALPA, 2005). This example shows the importance of exchanging
information on knowledge, performance, and strategy between human operators.
Similar experience could be achieved in the area of ATM by supporting workshops,
newsletters, and other forms of information exchange on best practices and handling of
unusual events.
Finally, the EUROCONTROL (2004e) report on managing technical failures in ATC
points out potential future problems identified through controller interviews. Firstly, it
suggests that the mental picture of the traffic situation will be more difficult to form in
the future ATC environment. Secondly, it suggests that in the future, the controller may
require more knowledge of the ATC system architecture when compared to today.
Finally, the report suggests that newly qualified controllers and fully established
controllers have different perceptions of one another: newly qualified controllers are
perceived by some fully established controllers to be more trusting of the reliability of
new equipment, having rarely experienced failures in the past, while established
controllers are perceived by some newly qualified controllers as less computer literate
and more suspicious of technology.
The previous sections of this Chapter revealed the complexity of controller recovery by
discussing its relevant phases, from failure detection to the outcome of the recovery
process. In addition, past research has identified factors that influence the quality of
controller recovery. The next section defines a set of variables that capture the
important characteristics of controller recovery. These are the context that surrounds
the controller recovery process, the recovery effectiveness, as well as the recovery
duration. These variables guide the design of the experiment to capture real data on
controller recovery later on in the thesis.
5.7 Definition of controller recovery in this thesis

This thesis investigates the process of controller recovery from equipment failures in
Air Traffic Control (ATC). From discussions in the preceding sections of this Chapter, it
is clear that controller recovery (as a human recovery in a particular context of the ATC
system) is a complex process that involves a number of steps that can be assessed
using different methods and variables. In summary, a credible assessment of controller
recovery should answer the following questions:
- What are the factors that influence controller recovery performance and choice of
  recovery strategy (i.e. characteristics of the recovery context)?
- What is the effectiveness of the selected and implemented recovery strategy (i.e.
  the required recovery steps and the outcome or effectiveness of recovery)?
- How efficiently does a controller respond to an equipment failure (i.e. the recovery
  duration)?
These questions are discussed in the following sections.

Human reliability assessment research over the years has shown the important role of
the context in which human performance takes place. Recent techniques now place
more emphasis on the definition of key contextual factors and their impact on the
reliability of human performance. Context affects every part of the process of
recovering from equipment failures and thus includes past experience and the status of
recovery procedures and training relevant to the particular equipment failure under
investigation. As stated by EUROCONTROL (2004e), "context is everything". Chapter 7
of this thesis presents a detailed review of the current understanding of contextual
factors in various ATM and non-ATM industries. The research presented in this thesis
uses these findings together with results from controller interviews to identify the
contextual factors relevant to controller recovery from equipment failures in ATC.
Furthermore, these factors are used in conjunction with an appropriate methodology to
further analyse controller performance during the process of recovery from failures and
to quantitatively define the recovery context indicator (Chapter 8). In addition, the
importance of the recovery context is further explored in the experiment (Chapters 9
and 10).

The recovery effectiveness of each controller responding to an unusual, emergency, or
non-nominal situation can be characterised by a set of required recovery steps.
Sections 5.5 and 5.6 reviewed the existing schemes for handling emergency
occurrences, achieved through defined recovery procedures and training. Existing
procedures and schemes were reviewed, including the UK CAA's TRUCE scheme, UK
NATS fallback procedures, the Airways New Zealand recovery procedures, and the German
air traffic service provider (DFS) emergency checklists, all designed to ensure that staff
involved in the provision of ATC service are trained to recognise and resolve any
emergency situation in a competent manner. In addition, the review included an overview
of EUROCONTROL's and ICAO's guidance for recovery procedures and recovery
training (EUROCONTROL, 2003f; ICAO, 2001a).
In general, these safety schemes create a checklist of recovery steps that follow the
phases of the recovery process (i.e. detection, diagnosis, correction). These checklists
are written procedures and controllers are expected to know and follow them. In a
similar way, an ATC equipment failure is considered as one type of unusual/emergency
situation. Although equipment failure related procedures or checklists are not always
available, it is possible to define a set of required recovery steps, whose
implementation can assist in the protection of the system and preservation of accurate
situational awareness. The selection of relevant recovery steps and the time frame in
which they are implemented contribute to an effective or successful outcome of the
recovery process. This is explained further in the following section.

The duration of the controller's recovery process is the time measured from the first overt
controller action to the end of the recovery process. The end of the recovery process is
influenced by the restoration of the failed component or by the reversion to the backup
facilities (i.e. fallback systems). The analysis of operational failure reports (Chapter 4)
indicates that the longer the failure, the less severe it tends to be. As a result, the
research presented in this thesis focuses on failures of short duration. Furthermore,
past research has focused on the reaction time, while putting more emphasis on its
extreme values (see Wickens, 2001). However, extracting the controller reaction time
can be an extremely difficult task as this first reaction usually represents covert (i.e. not
directly observable) behaviour. For this reason, the research presented in this thesis
focuses on the controller's first action that is observed on the ATC system (e.g.
communication regarding identified failure, interaction with HMI).
Apart from the moment of actual detection, the recovery duration variable may also
lack some aspects of the diagnosis phase. In other words, the cognitive processes
behind understanding the new situation and prioritisation of the recovery tasks to be
performed may also occur covertly. For example, the real cause of the communication
failure is not immediately obvious as the controller needs to investigate if the failure
affects ground ATC equipment or airborne radio equipment. Both of these features of
controller recovery are considered in the design of the experimental investigation
presented in Chapter 9.
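The definition of recovery duration given above can be expressed as a calculation over a time-stamped event log. The following is a minimal sketch, assuming that the failure onset, overt controller actions, and the end of recovery (restoration or reversion to fallback) are each recorded with a time in seconds. The event names, the data structure, and the example values are hypothetical and do not represent the measurement procedure used in the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time_s: float  # seconds on a common clock
    kind: str      # "failure_onset", "controller_action" or "recovery_end"


def _first_time(events, kind, not_before):
    """Earliest time of an event of the given kind at or after not_before."""
    times = [e.time_s for e in events if e.kind == kind and e.time_s >= not_before]
    return min(times) if times else None


def time_to_first_overt_action(events):
    """Seconds from failure onset to the first observable controller action."""
    onset = _first_time(events, "failure_onset", float("-inf"))
    if onset is None:
        raise ValueError("no failure onset in event log")
    action = _first_time(events, "controller_action", onset)
    return None if action is None else action - onset


def recovery_duration(events):
    """Seconds from the first overt controller action to the end of recovery."""
    onset = _first_time(events, "failure_onset", float("-inf"))
    if onset is None:
        raise ValueError("no failure onset in event log")
    action = _first_time(events, "controller_action", onset)
    if action is None:
        return None  # no overt action was observed
    end = _first_time(events, "recovery_end", action)
    return None if end is None else end - action


# Hypothetical log: times are invented for illustration only.
log = [
    Event(0.0, "failure_onset"),
    Event(12.5, "controller_action"),
    Event(30.0, "controller_action"),
    Event(95.0, "recovery_end"),
]
```

Because covert detection and diagnosis leave no trace in such a log, `time_to_first_overt_action` is only an observable upper bound on the controller's reaction time, which is consistent with the limitation noted in the text.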
5.8 Summary
As pointed out at the beginning of this Chapter, a good understanding of recovery
requires a detailed assessment of the recovery process from both the technical and
human perspectives. Whilst the previous Chapter discussed the technical recovery, this
Chapter focuses on controller recovery. The Chapter starts by distinguishing the
objectives of two separate groups of operators involved in recovery from equipment
failures, namely controllers and engineers. While this thesis focuses solely on controller
recovery from equipment failures, the reviewed theoretical background to human
recovery is applied to the controller recovery by identifying its major phases. As a
result, the main phases of controller recovery together with the outcome of the overall
recovery process have been described. Finally, various models of human recovery,
developed for both ATM and non-ATM industries, have been discussed with emphasis
on three of the most relevant ones to controller recovery. These are: the model by
Kanse derived for recovery performance in the chemical process industry, the RAFT
tool derived specifically for the ATC operational environment, and the model by
Wickens generally focusing on the impact of different levels of automation on the
recovery process.
Apart from identifying the main phases of the controller recovery process, the review of
the theoretical background has also highlighted the factors that influence the quality of
controller recovery, namely past experience, recovery procedures and training. While
past experience is accumulated throughout the controller's operational experience, the
current status and quality of recovery procedures and training are regulated by
international and national aviation authorities. Thus, the Chapter reviews and discusses
the current status of regulation regarding recovery procedures and training, whilst the
feedback regarding controllers' past experience is gained from the
questionnaire survey presented in the following Chapter. After reviewing theoretical
findings extracted from ATM and non-ATM research relevant to controller recovery, the
Chapter concludes by proposing a set of variables for an in-depth assessment of
controller recovery. This is achieved by assessing the context, quality, and temporal
characteristics of the controller recovery process. These variables also guide the
experimental design to collect real data on controller recovery (Chapter 9).
Chapter 6
Questionnaire Survey
Chapter 5 showed that limited research has been carried out globally on human
reliability in relation to controller recovery. Hence, this Chapter presents the details of a
questionnaire survey scheme with the aim of overcoming the lack of knowledge and
further supporting the research in this thesis. The specific objectives of the questionnaire
survey are to investigate controller experience with equipment failures and to identify
factors that affect their recovery, to extract more operational experience, to investigate
the status and quality of recovery procedures and training, and to contribute to the
wider human reliability research by assessing the specific controller recovery. The
Chapter starts with the definition of the target population and sampling. It proceeds by
discussing the survey methodology identified for the collection of questionnaire
responses, design of the questionnaire, and the refinements identified by a pilot survey.
This is followed by the description of the full survey scheme (Figure 6-1). The Chapter
concludes with the methodology for the questionnaire survey data analyses structured
in three segments. These are: assessment of the sample characteristics, high-level
frequency analyses, and in depth assessment of interactions between recovery factors.
Figure 6-1 The flow diagram of organising a survey
6.1 Objectives of the questionnaire survey

One of the objectives of the research presented in this thesis is to address the general
lack of knowledge in the area of controller recovery from equipment failure. This is vital
in order to enhance safety and operational efficiency in the current and future ATC
environment. As described in Chapter 5, although significant human reliability research
has been undertaken in other industries, such as nuclear and chemical processing, it is
not directly transferable to the highly dynamic ATC environment. In order to address
the issues above, the questionnaire survey presented in this Chapter focuses on four
objectives. Firstly, the survey is designed to investigate controller experience with
equipment failures and to identify factors that affect controller recovery. This is to be
achieved by extracting the operational experience from the sample of air traffic
controllers. Secondly, the survey is to be used to augment the information obtained
from the operational failure reports (as presented in Chapter 4) which lack any input on
controller recovery. This is achieved by questioning the participating controllers as to
the most severe failures they have experienced. Thirdly, the survey contributes to the
determination of the status and quality of recovery procedures and training in ATC
Centres (and thus augments the findings from Chapter 5). Finally, the survey is
designed to contribute to the wider human reliability research by assessing the specific
controller recovery performance.
Six key questions were formulated in order to achieve the four objectives. The
questions (below) address ATC equipment, controller recovery performance, and
status of recovery procedures and training:
How often do controllers experience equipment failures (Q1)?
What factors influence their recovery performance (Q2)?
What is the most unreliable ATC equipment (Q3)?
Is there any organised exchange of information on equipment failures and/or other
types of unusual/emergency situations (Q4)?
Do recovery procedures exist (Q5)?
What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Given the objectives of the questionnaire survey above, the next section defines the
target population and sample size.
6.2 Sampling
The population for this questionnaire survey should consist of controllers from various
ATC Centres worldwide. The population characteristics to be sampled in this survey
are ATC Centres with different levels of traffic and airspace complexity, and ATC
system automation, and controllers with a range of operational experience (i.e. years in
service, rating).
Using the United Nations (UN) statistic that there are 191 independent countries
worldwide (United Nations, 2006), it would be possible to estimate the total number of ATC
Centres. However, data on the number of ATC Centres for each country were not
available to this research¹. Therefore, another approach based on the distribution of
global air traffic (Airbus, 2004) has been used. In other words, the ideal sample should
consist of regional distributions of sampled controllers that correspond to the air traffic
distribution as presented in Figure 6-2. Moreover, it is also important to obtain a sample
which represents the current distribution of air traffic but also accounts for its future
predicted growth. The predicted growth in air traffic to the year 2023 indicates the
importance of Asia/Pacific and Middle East regions, while other markets remain steady
(Figure 6-2). Airbus (2004) predicts that Asian airlines will experience the fastest
growth rates. This prediction is in line with observed changes in the aviation market
and the shift towards Asian operations (Airbus, 2004; Air Transport Action Group,
2005). Moreover, it is predicted that by 2023 the already mature North American
domestic market will lose its historical dominance to both Europe and the dynamic
Asia/Pacific region. Based on all these findings, the target of the questionnaire survey
should be to collect responses from Asia/Pacific, Europe, and North America
corresponding to characteristics of the population surveyed (i.e. different levels of traffic
and airspace complexity, ATC system automation, and controller experience).
¹ Personal correspondence with the International Federation of Air Traffic Controllers'
Associations (IFATCA) revealed that these data are not available.
[Figure 6-2: bar chart of the percentage of world air traffic (vertical axis) per region (horizontal axis: Africa, Latin America and Caribbean, Asia and Pacific, Europe, North America, Middle East) for the years 2003 and 2023]
Figure 6-2 Distribution of world air traffic per region for the years 2003 and 2023 (adapted from
Airbus, 2004)
Having defined a target population and its characteristics to be sampled, it is important
to define the size of the sample. Collecting a large sample of data would pose a
significant challenge as it would be a logistically huge task and very time consuming for
one single researcher. Therefore, the sample size needed to be contained within
manageable proportions. However, the sample still needed to be representative of the
population of controllers. As guidance, the modelling of controller operational
experience with the normal distribution requires approximately 20 data points (Shier,
2004). Increasing this minimal sample size by a factor of five, the target sample size was
initially set at 100 controller responses. This sample size is in line with the sample
used to support a Federal Aviation Administration (FAA) study of similar scope (i.e. 128
responses from aviation experts; Funk, Lyall, and Riley, 1996). However, the target sample
size (in terms of number of controllers and ATC Centres sampled) would vary
according to the choice of data collection method and available resources.
6.3 Survey methodology

Surveys have long been recognised as a valid method for measuring attitudes (or
preferences), beliefs, or facts (including past behavioural experiences). Actually, one of
the most common uses of surveys is to measure individuals' past behavioural
experiences (Weisberg, Krosnick, and Bowen, 1996). The aim of the questionnaire
survey presented in this thesis is to collect facts regarding equipment failures and
controller recovery, in particular the operational experience and status of procedures
and training for equipment failures. Therefore, using a survey to collect these types of
data is justified.
Due to the nature of this survey, the methods available were either to gather the
information directly from face-to-face interviews with controllers in various ATC Centres
or remotely by self-completion via the internet and professional networks. Although less
reliable, the use of the internet and professional networks is useful in presenting a
wider picture of controller experience and recovery from equipment failures. The
advantages and disadvantages of both methods are presented below.
Data gathering through face-to-face interviews requires visits to ATC Centres and
direct access to controllers. This approach is comparatively more reliable since it
presents the opportunity to clarify any issues either prior to or during the interview.
Moreover, it facilitates representative sampling for example within an ATC Centre as
more than one controller can be asked to participate. The drawbacks of this approach
are the practical and financial issues related to the cost of travel and access to enough
ATC Centres to generate a representative sample depending on the characteristics of
the population.
In a self-completion survey, the questionnaires are distributed using a professional
network or popular aviation related internet forums. Compared to face-to-face
interviews, this method saves time and enables more questionnaires to be distributed.
However, research has shown that the response rate is inferior to face-to-face
interviews. A response rate of 10 to 50 percent is usually achieved with self-completion
questionnaires compared to 100 percent in the case of face-to-face interviews. This
means that in order to collect 100 samples, between 200 and 1000 questionnaires
should be distributed. The questionnaires may be distributed via personal/professional
network and corresponding emails. However, accessing the email addresses of 200 to 1000
controllers worldwide presents a significant obstacle to the distribution of
questionnaires.
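The arithmetic behind the 200 to 1000 figure can be sketched as follows. This is a minimal illustration only; the function name and the use of whole-number percentages are assumptions made for the example and are not part of the survey design.

```python
def questionnaires_needed(target_responses: int, response_rate_percent: int) -> int:
    """Questionnaires to distribute so that the expected number of replies
    reaches target_responses, given an expected response rate in percent."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response_rate_percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = target_responses * 100
    return (numerator + response_rate_percent - 1) // response_rate_percent


# Target of 100 responses at the response rates quoted in the text:
# 10 and 50 percent (self-completion) and 100 percent (face-to-face).
for rate in (10, 50, 100):
    print(f"{rate} percent response rate: {questionnaires_needed(100, rate)} questionnaires")
```

At a 10 percent response rate the sketch gives 1000 questionnaires and at 50 percent it gives 200, which matches the range stated above.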
Additional problems with the self-completion method are the number of responses and
the quality of survey sample obtained. The self-completion method depends entirely on
the intention and willingness of the controller to participate in the survey. Thus it is
harder to control the number of responses obtained. Apart from the high likelihood of
low response rate of a self-completion survey, another drawback is that the quality of
the answers cannot be controlled. Even in the case of straightforward questions,
respondents may misinterpret some of the questions or may need more information on
the subject under investigation. The presence of the researcher, while the respondent
is answering the questions, provides the advantage of ensuring that the respondent
understands what is required from the survey.
After careful consideration of both the advantages and disadvantages of the two survey
methods (face-to-face and self-completion), both were adopted in this thesis. This
decision was based on the need to exploit the strong points of both methods
particularly given the timing and response rate constraints. In order to maximise the
benefit of the combined approach, the design of the questionnaire must account for
their unique characteristics.
6.4 Design of the questionnaire

It is very important when designing a questionnaire to focus on information needed for
the study and to present questions in an unbiased fashion to enable responses with a
high degree of fidelity. The length of the questionnaire should also be considered.
Given the decision to use both face-to-face interviews and self-completion surveys, it
was necessary to focus on a questionnaire design that meets the requirements for both
methods. While face-to-face interviews allow a more complicated structure for the
questionnaire, additional attention has to be paid to the length of the interview. A self-completion survey allows detailed questions to be designed using a less complex
structure. This survey method requires a written introduction to explain the objectives of
the study, its added value, and the key features of the survey itself (e.g. format, type of
questions, approximate time required for the survey completion).
One of the possible solutions was to design two sets of questionnaires; one for the
face-to-face interview and the other for the self-completion survey. However, to assure
the highest reliability and completeness of responses, it was decided to use one
questionnaire design in both survey methods. The aim was to design the questionnaire
survey to extract the maximum information whilst ensuring convenience for both face-to-face and self-completion respondents. This was achieved following several design
principles. Firstly, special attention was given to clarity of questions to avoid any
ambiguity in the self-completion survey. Secondly, emphasis was placed on closed
questions, where the respondents' answers did not require the presence of the
researcher. Closed-ended questions can be answered finitely by one of the given
answers; the simplest form being the yes/no answer. In general, these questions are
restrictive and can be answered in a few words. Thirdly, all key terms were defined.
Finally, for open questions, a list of potential answers was provided to guide the
respondents (e.g. for the question on the most unreliable ATC equipment, a
comprehensive list of ATC equipment was provided). Open-ended questions
allow respondents to answer in their own words providing a narrative. In general these
questions solicit additional information, as they require more than one or two word
responses. Furthermore, the questionnaire was designed in a way that ensured that
any inconsistencies in responses can be identified. This was achieved through the
careful choice of questions and by having multiple questions assessing a particular
issue (e.g. recovery procedures).
The questionnaire has been structured around the main objective of the research
presented in this thesis. In other words, all the questions have been designed to
support the research on controller recovery from equipment failures in Air Traffic
Control (ATC). Based on the type of information obtained, the questionnaire is
structured in four distinct groups totalling 29 questions. The first group consists of
general and specific questions. The former covers the overall operational experience,
ratings, and the country/ATC Centre where the respondent works. The latter inquires
specifically about experience with equipment failure, asking the respondent to list
several examples in greater detail. This first group consists of five questions.
The second group of questions inquires about the factors that affect controller recovery
by asking the respondent to rate the importance of three factors. This is followed by the
question on the most unreliable ATC systems/components, as well as the
organisational issues relevant for recovery. In total, this second group consists of four
questions.
The third group of questions focuses on the existence and quality of recovery
procedures at the ATC Centre where the respondent works. This group consists of 11
questions.
The fourth group of questions focuses on the existence and quality of training for
recovery at the ATC Centre where the respondent works. This group has nine
questions. The final question provides an opportunity to the respondent to add
comments and suggestions related to the entire questionnaire.
The following is a one-page example of the questionnaire which was used during the
survey (Figure 6-3). It is the second page of the questionnaire. A complete
questionnaire is included in Appendix IV, while an example of a response to the
questionnaire is provided in Appendix V.
Figure 6-3 One-page example of the questionnaire
6.5 Pilot survey

Before conducting the full survey, a small-scale pilot survey was performed to verify the
clarity of questions and the time necessary to complete the questionnaire. It surveyed
two EUROCONTROL in-house controllers, two ATM specialists, and three
psychologists with backgrounds in ATC and the design of questionnaire surveys. No
conflicting issues were identified between them. Their input included only minor
amendments in the design of the questionnaire, such as additional emphasis on the
added value of the survey and how the results would be used. This information was
included in the introductory page of the questionnaire (i.e. the first page). Additionally,
the pilot survey revealed the need for some examples of ATC equipment/tools which
were added as a note after question 5. These changes were incorporated in the final
design of the questionnaire.
The following sections discuss how the survey methodology has been exploited to
achieve the target sample size.
6.6 Full survey

As discussed previously, responses have been gathered using face-to-face interviews
and self-completion methods. The results are briefly presented below.
6.6.1 Face-to-face interviews

Professional visits to various ATC Centres and relevant organisations were used to
distribute questionnaires to available controllers and capture their responses through
face-to-face interviews. Using this approach, responses were received firstly from the
visited ATC Centres (involving controllers from India, Serbia, and Ireland), their training
facilities and various controllers in training (involving Irish and Maltese controllers).
Secondly, responses were received from the controllers involved in
EUROCONTROL's Gate to Gate project on real-time simulations (involving controllers
from the Netherlands, Germany, Italy, France, Sweden, Spain, and Slovenia). Finally,
responses were received from controllers on various courses run by
EUROCONTROL's Institute for Air Navigation Services (IANS)² (involving controllers
from Belgium, Ireland, Switzerland, the Netherlands, Romania, and Sweden). In spite of the
high costs involved, approximately 40 percent of the data were collected using face-to-face
interviews, where controllers had an opportunity to clarify any doubt before
answering questions.
² IANS provides regular courses to ATC staff from all EUROCONTROL Member States (i.e. 37
European countries).
6.6.2 Self-completion survey

The self-completion survey involved electronic distribution of questionnaires by Imperial
College colleagues visiting various ATC Centres and via professional networks and
popular aviation related internet forums. Countries visited included Tahiti, South Africa,
Tanzania, a number of European countries, Macau, New Zealand, Singapore, and
China. In addition, the Imperial College colleagues exploited professional networks of
controllers to gain more responses. These networks have links to EUROCONTROL
and ATM specialists in various air navigational service providers, and hence resulted in
responses from Croatia, Finland, Switzerland, Macedonia, Moldova, India, and
Germany. Additionally, the Professional Pilots Rumour Network (PPRuNe) forum, an
aviation website dedicated to airline pilots and others in aviation business including air
traffic control staff, was also used for obtaining survey data (see PPRuNe, 2006). The
aims and objectives of the survey and the overall research were posted on this
particular internet forum on two separate occasions to attract controllers worldwide. If
interested in participating in this survey, controllers were advised to contact the
researcher and thus obtain an electronic copy of the questionnaire survey. In spite of
an initially high level of interest, only a few responses were collected using this method
(including Australia and United Kingdom). Overall, approximately 60 percent of data
was collected using the self-completion method.
6.6.3 Potential sources of errors

There are two main potential sources of error in the survey. These sources are the
respondent and data pre-processing. In general, respondent errors may occur for a
variety of reasons. It was noted, for example, that controllers from the same ATC
Centre gave contradictory answers to particular questions. Possible causes for this
include imprecision in the formulation of questions, lack of knowledge on the part of
controllers (on existence of recovery procedures, training, organised exchange of
information), and misinterpretation of questions. The imprecision in the formulation of
questions was addressed by the pilot study and thus should not have played a
significant role in generating respondent errors.
Lack of knowledge on the part of controllers was noted for the questions on the status
of recovery procedures, training, and organised exchange of information within their
ATC Centre. For example, while a group of controllers from an ATC Centre was aware
of the recovery procedures, others stated that these procedures do not exist. These
inconsistent responses were further investigated using the related questions. For
example, if the controller responded that no recovery procedures were defined within
his/her ATC Centre (the first question related to recovery procedures), the subsequent
questions related to recovery procedures were investigated (e.g. adequacy,
completeness, currency). The final judgement was based on all answers that were
provided in relation to recovery procedures (and not only the first one).
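A simple way to locate such contradictions before examining the related questions by hand is to group the answers by ATC Centre. The sketch below is illustrative only: the record layout (a Centre name paired with a yes/no answer), the function name, and the example Centres are hypothetical and do not come from the survey data.

```python
from collections import defaultdict


def centres_with_conflicting_answers(responses):
    """Return the ATC Centres whose respondents disagree on a yes/no question.

    responses: iterable of (centre, answer) pairs, where answer is True if the
    controller stated that recovery procedures exist and False otherwise.
    """
    answers_by_centre = defaultdict(set)
    for centre, procedures_exist in responses:
        answers_by_centre[centre].add(procedures_exist)
    # A Centre is flagged when both True and False were recorded for it.
    return sorted(c for c, seen in answers_by_centre.items() if len(seen) > 1)


# Hypothetical records for illustration.
example = [
    ("Centre A", True),
    ("Centre A", False),
    ("Centre B", True),
    ("Centre B", True),
    ("Centre C", False),
]
print(centres_with_conflicting_answers(example))
```

The flagged Centres are only candidates for review. The final judgement described above still relies on all answers given in relation to recovery procedures.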
Misinterpretation was also noted in the question on the number of equipment failures
experienced annually. In this particular case, the data collection reflected the overall
misinterpretation of the term "equipment failure" and the consequent variation in the
answers. While some controllers reported all equipment failures they experienced
within one year regardless of severity, others reported only major failures classified as
infrequent high severity occurrences.
The possibility of errors arising from pre-processing of the responses was mitigated by
extra care at the data input stage (i.e. double checking of each input). In the case of
multiple response questions or questions returning a range instead of a single value, a
consistent approach was taken. For example, in response to question 4 ("What is the
average number of ATC equipment failures during one year that you experience?"), the
respondents tended to provide either a single numerical value, a range, or a textual
answer. In the case of a range, the middle value was taken. This method has been
applied consistently to other questions where necessary. Textual answers have been
transformed into numerical values (e.g. "once in two years" was considered as 0.5 per
year). However, some textual answers could not be transformed into
numerical values and were therefore omitted (e.g. the question 5 segment on
frequency and duration of failure was answered "minutes", "very frequent", "very often",
"rarely", "very rarely", or "once in career").
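The pre-processing rules described above (single value kept, range replaced by its middle value, convertible textual answers turned into a rate per year, all other text omitted) can be sketched as follows. This is a hedged illustration rather than the procedure actually applied in SPSS: the function name, the regular expressions, and the small number-word lookup are assumptions introduced for the example.

```python
import re
from typing import Optional

# Hypothetical lookup of number words; extend as needed.
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "ten": 10}
_NUM = r"\d+(?:\.\d+)?"


def failures_per_year(answer: str) -> Optional[float]:
    """Convert a free-form answer to failures per year, or None if it
    cannot be converted (such answers are omitted from the analysis)."""
    text = answer.strip().lower()
    # Rule 1: a single numerical value is used as given.
    if re.fullmatch(_NUM, text):
        return float(text)
    # Rule 2: a range such as "10-20" or "10 to 20" is replaced by its middle value.
    match = re.fullmatch(rf"({_NUM})\s*(?:-|–|to)\s*({_NUM})", text)
    if match:
        return (float(match.group(1)) + float(match.group(2))) / 2
    # Rule 3: "once in N years" becomes 1/N per year.
    match = re.fullmatch(r"once in (\w+) years", text)
    if match:
        token = match.group(1)
        years = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        if years:
            return 1 / years
    # Anything else ("very often", "once in career", ...) is omitted.
    return None


for raw in ["17", "10-20", "once in two years", "very rarely", "once in career"]:
    print(repr(raw), "->", failures_per_year(raw))
```

Returning `None` rather than a guessed number keeps unconvertible answers out of averages, which mirrors the decision to omit them.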
The next section describes the methodology behind the analysis of questionnaire
survey results.
6.7 Methodology for the questionnaire survey data analysis

This section starts with a discussion on the questionnaire data pre-processing issues. It
then proceeds with the analysis of questionnaire survey data organised in three
segments. The first segment deals with the characteristics of the survey sample in terms of
the number of countries, ATC Centres, and controllers surveyed. This segment also
focuses on the characteristics of controllers by assessing their operational experience
(i.e. number of years in service) and rating³. The second segment of the questionnaire
survey data analysis presents the high-level summaries of responses, i.e. simple
percentage analysis (Figure 6-4). These summaries are organised in seven sub-groups,
corresponding to the six key questions that the questionnaire survey was
designed to answer (see section 6.1) whilst the seventh sub-group presents other
findings captured in the survey (presented in Appendix VI). The final segment of the
questionnaire survey data analysis provides an in-depth investigation of the interaction
between recovery factors previously analysed. The following sections discuss the
results and findings generated using the process in Figure 6-4.
³ Differentiating between Area Control, Approach Control, and Tower rating.
[Figure 6-4: flow chart. Questionnaire survey data feed three analyses: characteristics of the sample (58 ATC Centres, 134 controllers); high-level analyses (experience with equipment failures, factors that influence recovery performance, the most unreliable ATC systems, organised exchange of information on equipment failures, status of recovery procedures, status of training for recovery, other findings reported in Appendix VI); and interaction analyses]
Figure 6-4 The flow chart of questionnaire survey analyses
6.7.1 Data pre-processing

The data collected during the survey was subjected to further statistical analysis using
the SPSS statistics package. Each respondent was given a numerical identifier (serial
number) but no identifying information, such as the person's name, was used. The
choices made in the questionnaire by each respondent were recorded under each
corresponding serial number.
During the process of data pre-processing and analysis, all available responses were
taken into account. A special scoring technique was used for questions that required
the ranking of choices (question 6). In this particular case, the controllers were asked to
score their reliance upon written procedures, situation-specific problem solving, and
other factors during the recovery process. This approach is explained in detail in
section 6.7.3.2.
6.7.2 Characteristics of the sample

A total of 134 questionnaire responses were received from 58 ATC Centres spread
across 34 countries (Table 6-1). According to UN data, this questionnaire survey
covers 17.8 percent of independent countries worldwide.
Table 6-1 Summary of the questionnaire survey sample

Country | ATC Centre | Number of responses per ATC Centre | Number of responses per country
Ireland | Shannon | 7 | 16
 | Dublin | 4 |
 | Cork | 5 |
Finland | Kemi | 1 | 1
Serbia | Belgrade | 3 | 3
Switzerland | Zurich | 8 | 16
 | Geneva | 8 |
United Kingdom | Bristol | 1 | 1
Netherlands | Maastricht | 2 | 4
 | Nieuw Milligen | 1 |
 | Amsterdam | 1 |
Germany | Karlsruhe | 1 | 3
 | Langen | 1 |
 | Frankfurt | 1 |
Spain | Seville | 5 | 5
Norway | Oslo | 5 | 8
 | Kirkenes | 1 |
 | Stavanger | 1 |
 | Bodo | 1 |
Italy | Rome | 2 | 7
 | Bologna | 1 |
 | Naples | 2 |
 | Venice | 1 |
 | Milan | 1 |
France | Paris | 1 | 2
 | Nice | 1 |
Sweden | Stockholm | 3 | 8
 | Malmo | 3 |
 | Gothenburg | 2 |
Slovenia | Ljubljana | 1 | 1
Belgium | Brussels | 3 | 3
Macedonia | Skopje | 1 | 1
Croatia | Split | 1 | 4
 | Zagreb | 1 |
 | Pula | 1 |
 | Zadar | 1 |
Moldova | Chisinau | 1 | 1
Iceland | Reykjavik | 2 | 2
Denmark | Copenhagen | 3 | 3
Portugal | Lisbon | 4 | 4
South Africa | FAJS | 2 | 2
Tanzania | Dar es Salaam | 1 | 1
India | Mumbai | 3 | 7
 | Kolkata | 4 |
Singapore | Singapore | 2 | 2
Tahiti | Papeete | 6 | 6
Australia | Melbourne | 1 | 1
Austria | Vienna | 2 | 2
Romania | Bucharest | 2 | 2
Malta | Malta | 2 | 3
 | Luqa airport | 1 |
Macau SAR | Macau | 3 | 3
Kenya | Nairobi | 4 | 4
New Zealand | Wellington | 1 | 5
 | Auckland | 2 |
 | Christchurch | 2 |
China | Hong Kong | 1 | 1
Malaysia | Subang | 2 | 2
Total: 34 | 58 | 134 | 134
Section 6.2 defined the sampling methodology to correspond to the distribution of
global air traffic per region for the year 2003, taking into account the predicted growth
and estimates to the year 2023 (Airbus, 2004; Air Transport Action Group, 2005).
Assuming a similar distribution of traffic for the period of the survey (2005 and 2006)
and predicted changes in the distribution of future air traffic, the questionnaire sample
lacks input from two key markets, namely North America and the Middle East
(Figure 6-5).
[Figure 6-5: bar chart of the percentage of questionnaire responses (vertical axis) per region (horizontal axis: Africa, Latin America and Caribbean, Asia and Pacific, Europe, North America, Middle East)]
Figure 6-5 Distribution of questionnaire responses per region
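The regional shares and the country coverage quoted in this section can be recomputed from the per-country totals in Table 6-1. The sketch below is illustrative: the assignment of countries to regions is an assumption made for this example (the thesis does not list it), and Italy's total of 7 is taken as the sum of its five ATC Centre rows.

```python
# Responses per country, from Table 6-1.
RESPONSES = {
    "Ireland": 16, "Finland": 1, "Serbia": 3, "Switzerland": 16,
    "United Kingdom": 1, "Netherlands": 4, "Germany": 3, "Spain": 5,
    "Norway": 8, "Italy": 7, "France": 2, "Sweden": 8, "Slovenia": 1,
    "Belgium": 3, "Macedonia": 1, "Croatia": 4, "Moldova": 1, "Iceland": 2,
    "Denmark": 3, "Portugal": 4, "Austria": 2, "Romania": 2, "Malta": 3,
    "South Africa": 2, "Tanzania": 1, "Kenya": 4,
    "India": 7, "Singapore": 2, "Tahiti": 6, "Australia": 1,
    "Macau SAR": 3, "New Zealand": 5, "China": 1, "Malaysia": 2,
}

# Assumed region assignment; every country not listed here is treated as Europe.
NON_EUROPEAN = {
    "South Africa": "Africa", "Tanzania": "Africa", "Kenya": "Africa",
    "India": "Asia and Pacific", "Singapore": "Asia and Pacific",
    "Tahiti": "Asia and Pacific", "Australia": "Asia and Pacific",
    "Macau SAR": "Asia and Pacific", "New Zealand": "Asia and Pacific",
    "China": "Asia and Pacific", "Malaysia": "Asia and Pacific",
}

UN_COUNTRIES = 191  # United Nations (2006), as cited in section 6.2


def regional_shares(responses):
    """Percentage of all responses received from each region."""
    totals = {}
    for country, count in responses.items():
        region = NON_EUROPEAN.get(country, "Europe")
        totals[region] = totals.get(region, 0) + count
    grand_total = sum(totals.values())
    return {region: 100 * n / grand_total for region, n in totals.items()}


shares = regional_shares(RESPONSES)
coverage = 100 * len(RESPONSES) / UN_COUNTRIES
for region, share in sorted(shares.items()):
    print(f"{region}: {share:.1f} percent")
print(f"Countries covered: {coverage:.1f} percent")
```

Under these assumptions the shares come out at roughly 75 percent for Europe, 20 percent for Asia and Pacific, and 5 percent for Africa, with 17.8 percent of countries covered, consistent with the figures reported in this section.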
However, looking back at the characteristics of the population surveyed, the sample
still manages to capture the diverse levels of traffic and airspace complexity, ATC
system automation, and controllers with a range of operational experience (i.e. years in
service, rating). For example, in the European region the responses from Paris,
Frankfurt, Amsterdam, Zurich, Geneva, and Maastricht represent the input from some
of the busiest European ATC Centres. Likewise from Asia, the responses from
Mumbai, Hong Kong, and Singapore represent some of the busiest ATC Centres on
the continent as well as those that have experienced considerable growth in recent
years. Finally, the sample also includes ATC Centres with technically advanced
systems, e.g. Malmo ACC in Sweden, Maastricht ACC in the Netherlands, Shannon ATC
in Ireland, and the Oceanic Control Centre in Auckland, New Zealand.
Although only five percent of responses were received from the African continent, the
ATC Centres sampled were considered carefully. Johannesburg and Nairobi airports
represent the leading airports in Africa for both passengers and cargo (Air Transport
Action Group, 2005). Both regions are experiencing an increase in passenger
movement mostly as a result of growth in tourism. Failure of ATC equipment and the
recovery response of controllers are of considerable importance in such busy ATC
Centres, more so than in other ATC Centres in Africa with considerably less traffic.
Given the difficulties encountered in accessing ATC Centres and controllers worldwide
(e.g. security, logistics, related costs) and the characteristics of the population
surveyed, the obtained sample can be considered as representative of the population.
The next section assesses the adequacy of sampling achieved within each ATC
Centre.
6.7.2.1 Sampling per ATC Centre

Although 27 ATC Centres only had one response per Centre, analysis of these ATC
Centres shows that their characteristics do not differ from the characteristics of the
remaining sample. For example, these ATC Centres include some of the busiest ATC
Centres (e.g. Frankfurt, Paris, Hong Kong) as well as those with low traffic and
airspace complexity (Kemi-Finland, Bristol-UK, Bologna-Italy, Ljubljana-Slovenia,
Zagreb-Croatia). They also include ATC Centres with technically advanced ATC
systems (e.g. Frankfurt, Amsterdam, Karlsruhe, Stavanger, and Melbourne). Finally, the
characteristics of controllers include all levels of operational experience (i.e. ranging
from 3 to 39 years in service) and ratings. In short, these 27 ATC Centres capture the
characteristics of the target population and as such will be included in the further data
analyses.
6.7.2.2 Sampling of air traffic controllers
The questionnaire survey captured interesting information related to the operational
experience of controllers, namely years of experience, country of residence, and ATC
facility location (i.e. city or airport). The survey data show that on average controllers
have more than 13 years of operational experience (i.e. length of service), ranging from
1 to 39 years. More than 77 percent of the controllers surveyed have up to 20 years of
experience. Taking into account the length of service captured in this survey, it is split
into four categories: 1-10, 11-20, 21-30, and 31-40 years (Figure 6-6). The sample is
reasonably representative of the population as all categories are represented. There
seem to be fewer respondents with over 30 years of experience in the sample
collected. However, this is expected as the majority of controllers with more than 30
years in service tend to move to operational support roles, including training,
instructing, and management.
Figure 6-6 Distribution of operational experience
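The grouping of length of service into the four categories used for Figure 6-6 can be sketched as below. The function names are introduced for this example and the years-in-service values are hypothetical; the survey responses themselves are not reproduced here.

```python
from collections import Counter

BANDS = ("1-10", "11-20", "21-30", "31-40")


def experience_band(years: int) -> str:
    """Map years in service (1 to 40) to one of the four categories."""
    if not 1 <= years <= 40:
        raise ValueError("years in service must be between 1 and 40")
    return BANDS[(years - 1) // 10]


def band_distribution(years_in_service):
    """Percentage of respondents in each experience category."""
    counts = Counter(experience_band(y) for y in years_in_service)
    total = sum(counts.values())
    return {band: 100 * counts[band] / total for band in BANDS}


# Hypothetical years-in-service values for illustration.
sample = [1, 3, 8, 12, 13, 20, 25, 39]
print(band_distribution(sample))
```

Listing every band in the result, even when its count is zero, makes it easy to confirm that all categories are represented in a sample.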
Furthermore, Figure 6-7 presents the distribution of the ratings of the controllers who
participated in the survey. In general, most controllers have ACC ratings. As a result,
data analyses may be biased towards the experience within the ACC environment
which tends to be better staffed and with more access to advanced equipment/tools
(e.g. multiple radar sites feed the radar coverage instead of single radar site as in APP
and TWR control, and investment in the more automated systems).
[Figure 6-7: bar chart of the percentage of controllers (vertical axis) per rating (horizontal axis: ACC & APP & TWR, ACC & APP, ACC & TWR, APP & TWR, ACC, APP, TWR)]
Figure 6-7 Distribution of controllers' ratings
6.7.3 High-level analyses

This section presents high-level results from the simple percentage analyses of the
entire dataset. These summaries are organised into seven sub-groups, corresponding
to the six key questions that the survey was designed to answer (defined in section 6.1)
and concluding with other findings on controller recovery (captured in question 5).
Therefore, the relevant sub-groups are: experience with equipment failures in the ATC
Centre, factors that influence the recovery performance, the most unreliable ATC
systems/tools, organised exchange of information on equipment failures, status and
quality of recovery procedures, status and quality of training for recovery, and other
findings. Each of the sub-groups is discussed below.
6.7.3.1 Experience with equipment failures (Q1)
In the sample obtained, 94.8 percent of controllers had experienced some kind of ATC
equipment failure in their career. Additionally, this group of controllers experienced on
average 17 equipment failures annually, ranging from less than 1 per year up to 600,
as reported by one ATC Centre. This dispersion of the results reflects the wide
variation in the interpretation of equipment failures. Some controllers interpreted the
question on equipment failures in terms of only major (more severe) failures. Their
answers ranged from less than one (e.g. "once in two years", "once in five years", "once in
a career") to one failure annually (34.6 percent of responses). Other controllers reported
the total number of failures experienced annually regardless of their level of severity,
with responses ranging from dozens to hundreds. In short, the vast majority of
controllers surveyed have experienced equipment failures.
6.7.3.2 Factors that influence controller recovery performance (Q2)
Controllers were asked to rate how much they relied upon written procedures,
situation-specific strategies (i.e. context), and other factors (e.g. past experience) in
handling equipment failures. The ratings ranged from one to five, where one stands for
'very much', two for 'much', three for 'moderate', four for 'minimal' and five for 'not at
all'.
The results show that more than 45 percent of the controllers surveyed rely on written
procedures in the event of an equipment failure at the levels of either 'much' or 'very
much' (see Figure 6-8). These controllers have on average more than 13 years of
experience and operate in ATC Centres with recovery procedures (96.4 percent of
controllers who rated written procedures 'much' or 'very much') and recovery training
schemes (64.3 percent of controllers who rated written procedures 'much' or 'very much').
[Bar chart: Frequency (vertical axis, 0 to 50) by level of reliance on written procedures
(Very much, Much, Moderately, Minimal, Not at all); bar labels as extracted: 37.4%,
23.58%, 22.76%, 13.01%, 3.25%]
Figure 6-8 Controllers' reliance on written procedures throughout the recovery process
When it comes to situation-specific problem solving, 63.48 percent of controllers rated
this factor at the levels of either 'much' or 'very much' (see Figure 6-9). Similar to the
previous factor, the operational experience of controllers who rated this factor highest
is on average more than 13 years, and they operate in ATC Centres with recovery
procedures (94.5 percent of controllers who rated situation-specific problem solving
'much' or 'very much') and recovery training schemes (63 percent of controllers who
rated situation-specific problem solving 'much' or 'very much'). The only difference
observed with the previous group of controllers is that no controllers from the African
region rated situation-specific problem solving highly. European controllers tend to rely
much more on situation-specific problem solving (69.3 percent of responses captured
from European controllers) than on written procedures (42.7 percent).
[Bar chart: Frequency (vertical axis, 0 to 50) by level of reliance on situation-specific
problem solving (Very much, Much, Moderately, Minimal, Not at all); bar labels as
extracted: 35.65%, 27.83%, 24.35%, 10.43%, 1.74%]
Figure 6-9 Controllers' reliance on situation-specific problem solving throughout the recovery
process
Finally, 64.08 percent of controllers rated other factors (e.g. past experience) at the
level of either 'much' or 'very much' (see Figure 6-10). Similar to the previous factors,
the operational experience of controllers who rated this factor highest is on average
more than 13 years, and they operate in ATC Centres with recovery procedures (90.8
percent of controllers who rated other factors 'much' or 'very much') and recovery
training schemes (58.5 percent of controllers who rated other factors 'much' or 'very
much'). European controllers rely most on other factors (e.g. past experience) when
recovering from equipment failures (69.6 percent of responses captured from European
controllers) compared to Asian controllers (42.1 percent of responses captured from
Asian controllers). The sample of African controllers is too small for any comparison.
[Bar chart: Frequency (vertical axis, 0 to 40) by level of reliance on past experience
(Very much, Much, Moderately, Minimal, Not at all); bar labels as extracted: 33.01%,
31.07%, 29.13%, 3.88%, 2.91%]
Figure 6-10 Controllers' reliance on other factors (e.g. past experience) throughout the recovery
process
Figures 6-8 to 6-10 and frequency analysis show that controllers mostly rely upon other
factors (e.g. past experience) when dealing with equipment failures. This is followed by
situation-specific problem solving and finally written procedures. Having examined the
factors that affect controller recovery, the next section turns to the survey objective
of assessing the most unreliable ATC systems/tools.
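The ranking of the three factors can be checked with simple arithmetic on the bar values in Figures 6-8 to 6-10. The following is a minimal sketch, assuming that the pair of bars listed for each factor is the 'much' and 'very much' pair (for written procedures this is inferred from the reported "more than 45 percent"; for the other two factors the pairs sum to the reported 63.48 and 64.08 percent). The names `top_two` and `rank_factors` are illustrative and not part of the original analysis.

```python
# Illustrative sketch: rank the recovery factors by the combined share of
# controllers answering 'much' or 'very much' (bar values from Figures 6-8 to 6-10).
# Assumption: each tuple holds the two bars for the 'much' and 'very much' levels.
top_two = {
    "written procedures": (23.58, 22.76),                    # text: "more than 45 percent"
    "situation-specific problem solving": (35.65, 27.83),    # text: 63.48 percent
    "other factors (e.g. past experience)": (33.01, 31.07),  # text: 64.08 percent
}


def rank_factors(pairs):
    """Return (factor, combined percentage) pairs, most relied upon first."""
    combined = {name: round(sum(values), 2) for name, values in pairs.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranking = rank_factors(top_two)
for factor, share in ranking:
    print(f"{factor}: {share:.2f} percent")
```

Under these assumptions the order is other factors, then situation-specific problem solving, then written procedures, which agrees with the summary above.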
6.7.3.3 The most unreliable ATC systems/tools (Q3)
The data used for the analysis of the most unreliable ATC equipment are based on two
particular questions, 5 and 9. Question 5 asked for examples of equipment failures
that severely impacted on the controllers' work. Question 9 asked controllers to list the
three most unreliable ATC systems/subsystems they have experienced. The data
obtained from both questions were collated and pre-processed to remove any duplicate
answers. This was necessary as controllers tended to give similar responses to both
questions.
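The collation step described above can be sketched as follows. This is a minimal illustration only: the respondent identifiers and answers are hypothetical, the normalisation rule (trimming and lower-casing) is an assumption, and the function name `collate` does not come from the original analysis.

```python
from collections import Counter

# Hypothetical responses: (respondent id, question number, equipment named).
responses = [
    (1, 5, "Air-ground communication"),
    (1, 9, "air-ground communication "),  # same answer repeated under question 9
    (1, 9, "Flight data processing system"),
    (2, 5, "Primary surveillance radar"),
    (2, 9, "Air-ground communication"),
]


def collate(rows):
    """Merge answers to questions 5 and 9, keeping one entry per respondent and equipment."""
    seen = set()
    unique = []
    for respondent, _question, equipment in rows:
        key = (respondent, equipment.strip().lower())  # assumed normalisation rule
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


examples = collate(responses)
counts = Counter(equipment for _respondent, equipment in examples)
for equipment, n in counts.most_common():
    print(f"{equipment}: {100 * n / len(examples):.1f} percent of examples")
```

Duplicates are removed per respondent rather than across the whole sample, so the same equipment named by two different controllers still counts twice.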
The results of the analysis of questionnaire responses from 34 countries were found to
be similar to those obtained from the analysis of operational failure reports, presented
in Chapter 4. The questionnaire survey shows that the three most affected ATC
functionalities are: communication (37.2 percent of all examples provided), data
processing (24.6 percent), and surveillance (23 percent) (Figure 6-11). More precisely,
the following five equipment types are affected most:
air-ground communication (12.03 percent of all examples provided);
primary surveillance radar (9.1 percent);
flight data processing system (7.75 percent);
communication panel (7.49 percent); and
ground to ground communication (6.68 percent).
Figure 6-11 Distribution of affected ATC functionalities as reported in the questionnaire survey
Table 6-2 establishes the link between the most unreliable ATC functionalities and
existing recovery procedures, as reported by 134 controllers from 34 countries
representing various regions of the world. The link is established based on responses
to questions 5, 9, 10, and 11. In addition, the analysis was conducted at the country
level rather than ATC Centre level to avoid direct reference to sensitive information
specific to ATC Centres. It should be noted that because of this, inaccuracies are
possible only for the cases when the controllers did not have a full awareness of the
availability of recovery procedures in their ATC Centres.
Table 6-2 Mapping between most unreliable ATC functionalities and existing recovery
procedures for the countries sampled
Country
Ireland
Most unreliable
ATC functionalities
Communication
Navigation
Surveillance
Data processing
Pointing/input
devices
Existing recovery procedure

Frequency failure, telephone failure
Failure of navigational aids
Radar failure (procedural/non-radar control)
Strip printer failure (emergency strip printing)
Input device failure
Power outages, procedures for all failure types
Finland
Serbia
Communication
Surveillance
Data processing
Communication
Surveillance
Data processing
Switzerland
Communication
Navigation
Surveillance
Data processing
Pointing/input
devices

Flight data processing system (FDPS) failure, radar
data processing system (RDPS) failure
Radar failure, visualisation system (radar display) failure
FDPS failure
Power supply failure

United
Kingdom
Surveillance
Communication
Surveillance
Netherlands
Data processing
Pointing/input
devices
Procedures for all failure types

Frequency failure
Secondary surveillance radar (SSR) failure, radar
fallback system failure, failure of the working position
(radar display)
FDPS failure, RDPS failure
Total system failure (in various gradations)

Germany
Spain
Communication
Surveillance
Data processing
Communication
Surveillance
Data processing
Communication
Norway
Italy
France
Surveillance
Data processing
Pointing/input
devices
Communication
Navigation
Surveillance
Data processing
Communication
Surveillance
Data processing
Radar failure
Total system failure
Frequency failure
Total radar failure
Fire contingencies
Frequency failure, on-line data interchange (OLDI) link
failure, communication panel failure, telephone failure,
headset failure, intercom failure
Radar failure, failure of the radar display
FDPS failure
Frequency failure
Runway/taxiway lights failure
Radar failure
Radar failure
Power outage, air conditioning failure, fire evacuation,
meteorological equipment failure, failure of navigation
Sweden
Slovenia
Belgium
Macedonia
Croatia
Communication
Surveillance
Data processing
Pointing/input
devices
Safety nets
Communication
Data processing
Communication
Surveillance
Communication
Data processing
Pointing/input
devices
Communication
Surveillance
Data processing
Denmark
Communication
Surveillance
Data processing
Communication
Data processing
Communication
Portugal
South Africa
Tahiti
FDPS failure
Radar failure
Frequency failure, telephone failure, voice switching
and communication system (VSCS) failure
Radar failure, radar display failure
Strip printer failure
Communication
Radar failure
Frequency failure, telephone failure, FDPS failure,
power outage
Telephone failure, intercom failure
Failure of navigation equipment, instrument landing
system (ILS) failure
Radar failure
FDPS failure
Navigation
Singapore
Radar failure
Radar failure
Navigation
Surveillance
Data processing
Communication
Tanzania
India
Procedures for most failure types, runway/taxiway

lighting system failure, instrument landing system (ILS)
failure
Radar failure
Frequency failure
Radar failure, radar fallback failure
Frequency failure
Power outage
Radar failure
Moldova
Iceland
aids
Radar failures, surface movement radar failure
Surveillance
Data processing
Pointing/input
devices
Communication
Surveillance
Communication
Surveillance
Data processing
Safety nets
Frequency failure
Radar failures, failure of radar display
Frequency failure, failure of satellite communication
Navigational aids failure, tsunami alert, aircraft diverting

due to terrorist action
Australia
Austria
Communication
Surveillance
Surveillance
Data processing
FDPS failure , RDPS failure, failure of strip printer
Pointing device failure, failure of touch input display

(TID), frequency failure
Romania
Communication
Surveillance
Procedures for all failure types
Malta
Macau Special
Administrative
Region
Kenya
New Zealand
China
Malaysia
Communication
Surveillance
Data processing
Pointing/input
devices
Power supply
Communication
Navigation
Data processing
Communication
Navigation
Surveillance
Data processing
Communication
Surveillance
Data processing
Safety nets
Surveillance
Communication
Surveillance
Data processing
Safety nets
Radar failure
Frequency failure
Navigation aids failure
Procedures for all failure types, radar failure, SSR
failure
Strip printer failure

Radar failure, radar screen failure
Partial and total failure of all ATC equipment,
evacuation of ATC centre, mouse/keyboard failure,
power outage
Radar failure
FDPS failure, frequency failure
Frequency failure
The instances in which identified failures are not supported by existing recovery
procedures are highlighted in grey. In these cases, controllers experienced ATC
equipment failures for which recovery procedures were not available in their ATC
Centre. On the other hand, the instances in which sampled controllers have not yet
experienced equipment failures, for which procedures exist, are highlighted in yellow
and separated as the last row for each country. As an example, if the communication
function was affected specifically by frequency failure, the mapping is not established
(coloured grey) if the recovery procedure did not exist for this particular failure type. In
several cases controllers reported that their ATC Centre has procedures for all failure
types. Clearly, it is not possible to cover all failure types; what is feasible is to design
generic procedures or guidelines to follow in the case of equipment failure.
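The grey and yellow highlighting described above amounts to two set differences between the failures that controllers experienced and the failures for which recovery procedures exist. The following is a minimal sketch of that logic; the example sets are hypothetical and do not reproduce any row of Table 6-2, and the function name `map_procedures` is illustrative.

```python
def map_procedures(experienced, procedures):
    """Compare experienced failure types with the failure types covered by procedures.

    Returns two sorted lists:
    - unsupported: failures experienced without a matching procedure (grey)
    - unused: procedures for failures not yet experienced (yellow)
    """
    unsupported = sorted(experienced - procedures)
    unused = sorted(procedures - experienced)
    return unsupported, unused


# Hypothetical country-level data, for illustration only.
experienced = {"frequency failure", "FDPS failure", "radar failure"}
procedures = {"radar failure", "power outage"}

unsupported, unused = map_procedures(experienced, procedures)
print("No procedure available (grey):", unsupported)
print("Procedure exists, failure not yet experienced (yellow):", unused)
```

A response such as "procedures for all failure types" cannot be represented as a set of named failure types and would need separate handling.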
It can be concluded that the mapping between recovery procedures and the
equipment failures experienced by controllers is inadequate in many cases. The most
severe cases are those in which countries provide at best only one type of recovery
procedure. This was identified in several European countries (i.e. Finland, Macedonia,
Iceland, and Malta), in two African countries (i.e. South Africa and Kenya), and two
Asian/Pacific countries (i.e. Tahiti and Malaysia). The most neglected ATC functionality
was found to be data processing, followed by surveillance and communication. The
paradox is that the qualitative equipment failure impact assessment tool (Chapter 4)
identified exactly these three ATC functionalities as the most challenging to controller
recovery.
6.7.3.4 Organised exchange of information on equipment failures (Q4)
40.3 percent of the controllers surveyed reported that their ATC Centres have
organised exchange of information on equipment failures between colleagues. 49.3
percent reported a lack of this exchange of experience whilst 10.4 percent did not
answer this question.
Contradictory responses were obtained from 14 ATC Centres and are further
investigated by responses given to the subsequent question, i.e. whether the organised
exchange of experience is supported by management as a good working practice.
From the ATC Centres that have exchange of experience, 76 percent have formal
processes approved by management as opposed to the practice based on word of
mouth that reaches only a small portion of controllers. The question was intended to
capture initiatives by management to provide means to share experience on equipment
failures in an organised manner. This may be achieved using different methods, such
as seminars, company newsletters, safety bulletins, memorandums, and workshops. In
these ways the lessons learnt are disseminated not only between the controllers
directly experiencing the effects of the failure, but within the entire ATC Centre and
often within the same country.
Based on this additional assessment, the following countries or ATC Centres have neither
formal nor informal processes for exchange of experience on equipment failures: Italy,
Ireland, Croatia, India, Slovenia, Maastricht ATC Centre (as opposed to Amsterdam Centre),
Switzerland, Macau SAR, and Kenya.
The data indicate that there is room for improvement. There is a clear need for the
implementation of formal processes for exchange of experience on equipment failures,
including failure modes and recovery processes. This should form part of a wider safety
culture within ATC Centres, which is the responsibility of management. Past experience has
shown this type of indirect training to have a beneficial safety impact in a similar way to
regular recurrent training. The example discussed in Chapter 5 concerns an incident
in which an A300 was struck on the left wing by a surface-to-air missile, resulting in a
loss of all flight controls. Reacting rapidly, the captain recalled a television documentary
on a DC-10 crash at Sioux City (Iowa) and the thrust change technique employed by
the captain and crew of the DC-10 to control their aircraft. Although the A300 crew had
never practised this technique before, they quickly gained control despite the extreme
stress of the situation (IFALPA, 2005).
6.7.3.5 Status and quality of recovery procedures (Q5)
A section of the questionnaire consisting of 11 questions (from 10th to 20th question)
was dedicated to the assessment of recovery procedures within each ATC Centre. The
first question was designed to immediately filter out those ATC Centres without any
written procedures in place. In this case, the controller would skip the rest of this
section and proceed with the rest of the questionnaire. In cases where recovery
procedures exist, the remaining ten questions were designed to assess the quality of
those procedures. These questions focused on the completeness of the recovery
procedure, the level of currency, clarity, realism or feasibility, accessibility, and
compatibility with other procedures. In addition, controllers were given the opportunity
to comment on any event for which there was an inadequate application of recovery
procedures in their working experience.
The analysis of the questionnaire responses highlighted some inconsistencies (marked
with '?' in Table 6-3). In these cases, the controllers from the same ATC Centre gave
opposite responses to the questions on the existence of recovery procedures, recovery
training, and/or recurrent training. These are further investigated using the responses
to the subsequent questions related to recovery procedures (11th to 20th question),
recovery training (25th to 28th question), and recurrent training (23rd and 24th question).
In this section, further investigation regarding the existence of recovery procedures is
conducted for Shannon, Cork, Brussels, and Nairobi ATC Centres (Table 6-3) using the
answers provided to the 11th to 20th questions. Although controllers from these ATC
Centres reported a lack of recovery procedures in the 10th question, their subsequent
answers revealed that these procedures do exist (at least for some failure types).
Table 6-3 Existence of recovery procedures, recovery training, and recurrent training as
reported in the questionnaire survey
Country
Ireland
Finland
Serbia
Switzerland
United
Kingdom
Netherlands
Germany
Spain
Norway
Italy
France
Sweden
Slovenia
Belgium
Macedonia
Croatia
Moldova
Iceland
Denmark
Portugal
South Africa
Tanzania
India
Singapore
Tahiti
Australia
Shannon
Dublin
Cork
Kemi
Belgrade
Zurich
Geneva
Existence of
recovery
procedure
?
Yes
?
No
Yes
Yes
Yes
Bristol
Yes
Yes
No
Maastricht
Nieuw Milligen
Amsterdam
Karlsruhe
Langen
Frankfurt
Seville
Oslo
Kirkenes
Stavanger
Bodo
Rome
Bologna
Naples
Venice
Milan
Paris
Nice
Stockholm
Malmo
Gothenburg
Ljubljana
Brussels
Skopje
Split
Zagreb
Pula
Zadar
Chisinau
Reykjavik
Copenhagen
Lisbon
FAJS
Dar es Salaam
Mumbai
Kolkata
Singapore
Papeete
Melbourne
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
Yes
No
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
Yes
Yes
Yes
?
Yes
Yes
No
Yes
No
No
No
Yes
No
Yes
No
No
Yes
Yes
Yes
No
No
No
No
No
No
Yes
No
Yes
?
Yes
Yes
?
?
Yes
?
No
Yes
No
Yes
No
No
Yes
No
Yes
No
Yes
Yes
?
No
No
No
No
No
No
No
Yes
Yes
Yes
No
No
Yes
No
Missing data
Missing data
Yes
?
Yes
?
Yes
No
Yes
No
Yes
?
No
ATC Centre
Existence of training
for equipment failures
Existence of
recurrent training
Yes
No
?
Yes
No
Yes
Yes
?
?
?
Yes
No
?
?
Austria
Romania
Malta
Macau SAR
Kenya
New
Zealand
China
Malaysia
Vienna
Bucharest
Malta
Luqa airport
Macau
Nairobi
Wellington
Auckland
Christchurch
Hong Kong
Subang
Yes
Yes
No
Yes
Yes
?
Yes
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
?
Yes
Yes
Yes
?
Yes
?
Yes
Yes
No
Yes
?
No
No
Yes
Yes
No
No
Table 6-3 shows that 93.1 percent of sampled ATC Centres do have some form of
recovery procedure in place (i.e. 54 ATC Centres). The types of equipment failures
most commonly covered by recovery procedures in sampled ATC Centres are:
radar failure (reported by 40.2 percent of controllers surveyed);
failure of communication function: radio telephony, ground to ground
communication, voice switching and communication system panel (reported by
43.3 percent of controllers surveyed); and
flight data processing system failure (reported by 12.69 percent of controllers
surveyed)4.
74 percent of controllers reported that these recovery procedures are kept up-to-date
and reflect the changes in hardware and software occurring in the ATC Centre.
Similarly, 72 percent of controllers rated available recovery procedures as
comprehensive, while only 55 percent rated them as complete. The remaining 45
percent of controllers surveyed rated available recovery procedures as incomplete (i.e.
missing recovery steps necessary to re-establish a safe ATC service). When asked
which types of recovery procedures should be added, the controllers mostly
emphasised the requirement for recovery procedures from radar failure,
communication systems failure, the need for back-up systems, and procedures for
handling outages at ATC Centre level. Furthermore, 88 percent of controllers rated
available recovery procedures as clear and understandable, while 72 percent rated
them as realistic and feasible to perform.
69 percent of controllers surveyed reported that recovery procedures documentation is
easily accessible, i.e. it is placed in close proximity to controller working positions.
4 The discussion presented in Chapter 5 showed that ICAO provides recovery procedures for
the communication and surveillance functionalities but not for the data processing functionality.
Finally, 77 percent of controllers reported that available recovery procedures are linked
or harmonised to other procedures specified within the Manual of Air Traffic Services
(MATS), e.g. on suite allocation of tasks (separation of responsibilities between
executive and planner controller), and duties of the staff such as the approach
controller, the ground controller, or the watch manager.
From the survey data and subsequent analyses, it can be concluded that the majority of
sampled ATC Centres have some form of recovery procedures. The majority of
controllers reported that these procedures are up-to-date, comprehensive, easily
accessible, and compatible with other procedures. Moreover, controllers emphasise
the need for procedures on radar and communication failures.
6.7.3.5.1 Other findings regarding the recovery procedures
In addition to the findings in the previous section, the questionnaire's narrative section
highlighted interesting safety-relevant issues regarding recovery procedures. These are
individual comments rather than findings representative of the entire sample. The
reported issues are categorised in three groups, namely equipment-specific,
teamwork-specific, and generic recovery-related issues. These are discussed in the
following paragraphs.
The equipment-related issues highlighted major problems with the flight data
processing system not covered in the operational manuals. In addition, controllers
reported a lack of back-up facilities. One example indicated that during a radio
communication system failure, a particular ATC Centre had only ten emergency radio
devices for an operations room with a 20-seat configuration.
On teamwork-related issues, the controllers mostly reported inadequate familiarisation
with contingency procedures on the part of technical staff and controllers in
neighbouring sectors. In general, the controllers highlighted the important role of
teamwork and the need for an experienced planning controller in the event of
equipment failure. Another example drew attention to the unavailability of technical staff
during night shifts to provide immediate assistance in the case of equipment failure.
In short, controllers feel that teamwork is important in dealing with failures and that
Team Resource Management (TRM) training, aimed at enhancing teamwork efficiency,
should be mandatory for all ATC Centres.
Finally, many individual recovery-related issues, such as context, procedures, and
working practice, are also highlighted in the questionnaire's narrative part. These are
as follows:
Situation-specific problem solving plays a major role as all equipment failures
occur within a specific context (e.g. bad weather, frequency jamming, high/low
traffic levels);
There is a need for an approach to recovery procedures similar to that available to
pilots. In other words, a comprehensive manual with all possible failures and
corresponding recovery steps is needed during controller training. For the
operational environment, it would be necessary to design an abbreviated version
of the contingency manual available at each controller working position (e.g. an
aide-memoire in the form of a check-list, see Appendix III); and
Accurate and efficient strip marking is seen as the most reliable recovery tool in
the case of radar or flight data processing failure.
6.7.3.6 Status and quality of training for recovery (Q6)

A section of the questionnaire consisting of eight questions (from 21st to 28th question)
was dedicated to the assessment of training in recovery from equipment failures within
each ATC Centre. The first question was designed to immediately filter out those
Centres without training schemes. In this case, the controller would skip the remainder
of this section and proceed with the final part of the questionnaire. In the case of the
existence of a recovery training scheme, the remaining seven questions were designed
to assess its quality by extracting information on the existence of recurrent training, its
frequency, content, and compatibility with other types of training. The final section of
the questionnaire provided the opportunity for controllers to comment on other issues
of relevance to training.
The analysis of the collected data firstly revealed inconsistencies in the responses to
questions on training (Table 6-3). The reason for this may be that some controllers
regarded their initial training, e.g. initial radar control training, as training for recovery.
Other controllers may have considered only separate training for emergency situations
and whether it involved some type of equipment failure.
30 ATC Centres (51.7 percent) have training for recovery from equipment failures, 18
ATC Centres (31 percent) do not, while data for 10 ATC Centres (17.3 percent) are
inconsistent (i.e. marked with '?' in Table 6-3). In these cases, the controllers from the
same ATC Centre gave opposite responses to the questions on existence of recovery
training. All these inconsistencies are further investigated using the subsequent
questions related to recovery training (i.e. 25th to 28th question). Although controllers
from these ATC Centres reported contradictory responses on existence of the recovery
training (i.e. 21st question), their answers to subsequent training-related questions did
not reveal any further information. Therefore, a conservative approach has been taken
and these 10 ATC Centres are considered not to have recovery training in place.
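The percentages above follow directly from the counts of ATC Centres. The following is a minimal sketch of that arithmetic, assuming a sample of 58 ATC Centres (the sum of the three reported counts); the variable and function names are illustrative. Note that 10 out of 58 rounds to 17.2 percent, whereas the text reports 17.3 percent, presumably so that the three shares sum to 100.

```python
# Counts of ATC Centres as reported in the text; the total is their sum (assumption).
counts = {"recovery training": 30, "no recovery training": 18, "inconsistent": 10}
TOTAL_CENTRES = sum(counts.values())  # 58


def share(count, total=TOTAL_CENTRES):
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# Conservative approach: centres with inconsistent data are treated as having no training.
conservative_without = counts["no recovery training"] + counts["inconsistent"]
conservative_share = share(conservative_without)

for label, value in shares.items():
    print(f"{label}: {value} percent")
print(f"without recovery training (conservative): {conservative_share} percent")
```

The same function gives 93.1 percent for the 54 ATC Centres reported to have some form of recovery procedure in place.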
In the case of recurrent training, the analysis shows that only 36.2 percent of the whole
sample of ATC Centres have recurrent training, 43 percent do not, while the rest of the
data is either inconsistent or missing. Recurrent training is provided once a year in 25
ATC Centres and bi-annually in three ATC Centres (Oslo-Norway, Bucharest-Romania,
Auckland-New Zealand). In addition, Geneva and Melbourne ATC Centres provide
recurrent training three times per year, while Frankfurt ATC Centre provides recurrent
training 20 times per year. In the latter a contingency system is used every weekend to
train controllers.
Further analysis of the ATC Centres that provide recurrent training more than once
a year shows that all have recovery procedures in place, while the majority (i.e. 64
percent) have an organised exchange of information on equipment failures. The
Auckland ATC Centre emphasised that recovery performance was difficult before the
introduction of clear and easy to follow procedures. Moreover, this ATC Centre
highlighted that operations impact on recovery training as the recent failure types are
included in the recurrent training. Although the Oslo ATC Centre has recovery
procedures, its controllers report the need for more comprehensive and easily available
procedures (e.g. checklist type procedures on each console). These controllers
expressed a need to step away from increased dependency on experience when
handling equipment failures.
From the subset of controllers who have recurrent training once a year, 55 percent
believe that this is adequate, while the rest expressed the need for a higher frequency in
order to build competency in handling unexpected equipment failures. When asked if
the training covers all important equipment failures, the majority of controllers (i.e. 63
percent) answered negatively. The issues most frequently mentioned for addition to the
current training syllabus are:
complete radar failure simulated in a comprehensive and realistic way;
total power failure;
facility evacuation;
team resource management (TRM);
different types of aircraft problems (e.g. communication failure, engine failure,
landing gear problem);
hot standby procedures (system running in the background ready for immediate
use); and
radar bypass (radar information is presented directly at the radar display without
having been processed, resulting in the presentation of uncorrelated tracks only).
61 percent of controllers believe that the training methods utilised in their ATC Centres
are suitable, or more precisely, realistic and varied. Furthermore, according to the
responses from 63 percent of controllers surveyed, the recovery training is compatible
with (i.e. linked to) other training schemes. In general, it is essential to harmonise recovery
training within the overall training syllabus. One option is to include recovery training
within each training course, such as ab-initio training, conversion course, continuity or
recurrent training, training for unusual situations, and TRM training. The other option is
to provide separate recovery training sessions on a regular basis. Regardless of the
approach, ATC management has to ensure an inclusive, regular, and consistent
approach in training for recovery to its entire population of controllers.
From the survey data and subsequent analyses, it can be concluded that the majority
of the ATC Centres surveyed have some form of recovery training, although it is not
necessarily provided consistently throughout the Centre. The situation with recurrent
training is worse, as in the majority of cases this type of training is not provided
regularly. This results in extensive reliance on experience in dealing with equipment
failures, which may pose a significant safety threat in ATC Centres with a large
percentage of newly established and thus less experienced controllers. In general, the
controllers surveyed want to step away from over-reliance on experience and be
regularly trained as much as possible.
6.7.3.6.1 Other findings on training for recovery
In addition to the findings in the previous section, the questionnaire's narrative section
highlighted interesting safety-relevant issues regarding recovery training. These are
individual comments rather than findings representative of the entire sample. The
reported issues focus on the quality and frequency of recovery training.
According to the controllers surveyed, the main problem is the overall lack of training
for supervisors, engineers, and controllers. The controllers believe that a couple of
hours of training per year is far too little practice and some of them feel that recurrent
training is necessary at least twice a year. In the event of more critical equipment
failures (e.g. radar) with high traffic levels, there may be occasions when there is no time
to act upon the recovery procedures. On these occasions the role of training as well as
teamwork has a much greater importance.
The controllers are aware that it is almost impossible to include everything that can go
wrong within the training syllabus, but emphasise that more training and guidance
should be given. They also highlight that training sessions should be as realistic as
possible in the simulated environment (e.g. higher traffic levels and the need to use
radar fallback system regularly). Currently, in some ATC Centres, the training only
focuses on outages (i.e. failure of the entire ATC system) and not on everyday failures.
An example of an ATC Centre where recurrent training takes place only on a night shift
highlighted inconsistent provision of training throughout the ATC Centre, as only those
controllers on a night shift get recovery training.
6.7.3.7 Other findings on recovery performance
This section deals with additional findings extracted specifically from question 5. This
question aimed to give controllers an opportunity to discuss their past experience
with equipment failures which seriously impacted on their work. The findings extracted
from question 5 are presented in Appendix VI.
While section 6.7.3 has provided a high-level analysis and results of the survey, the
following section carries out a more rigorous analysis of the data.
6.7.4 Interaction analyses

The data analyses started with the assessment of the sample characteristics and
proceeded with the high-level summaries of controller responses. In this section, the
final set of data analyses investigates the relationships between the characteristics of
controllers (e.g. operational experience) and various recovery factors using appropriate
statistical tests. The section starts with a qualitative assessment of potential
interactions and the identification of those relevant to controller recovery. This is followed
by the presentation of the appropriate statistical tests and their key findings.
Several reciprocal interactions amongst controller characteristics and recovery factors
(corresponding to the key questions defined in section 6.1) are chosen for further
statistical testing and marked in Table 6-4. This choice is based on relationships known
from operational experience, which are then tested using rigorous statistical
assessment. The focus is placed on controller recovery and the factors that influence it,
which corresponds to a total of eight interactions.
[Interaction matrix. The rows and columns list the same seven variables: operational
experience (length of service); rating; experience with equipment failures (frequency per
year); factors that influence recovery performance; formal (management supported)
exchange of information; existence of recovery procedures; existence of recovery training.
The cell markers are not reproduced.]
Table 6-4 Interaction matrix
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, three variables are categorical (rating, factors that influence recovery
performance, and formal or management supported exchange of information on equipment
failures) whilst two are continuous, ratio scale variables (see footnote 5): operational
experience (length of service) and experience with equipment failures (frequency per year).
5 As mentioned in Chapter 4, variables can be either continuous or categorical. Continuous
variables are numeric values on an interval or ratio scale (e.g. age, income). Categorical
variables can be either nominal or ordinal. Nominal variables differentiate between categories
but do not assume any ranking between them (e.g. gender). On the other hand, ordinal
variables differentiate between categories that can be rank-ordered (e.g. from lowest to
highest).
As the data differ significantly from the normal distribution, several non-parametric tests
have been used at the 5 percent significance level (95 percent confidence level). As
previously explained in Chapter 4 (section 4.4.1), chi-square tests are used to test the
relationships between two categorical variables. Furthermore, Cramér's V is used to measure
the association for nominal data (i.e. interactions of factors that influence recovery
performance with rating and with the existence of formal exchange of information on
equipment failures) whilst Kendall's tau is used for ordinal data (i.e. factors that
influence recovery performance). The relationship between two ratio variables is tested
via non-parametric correlation, i.e. Kendall's tau statistic, which uses the ranks of the
data to calculate the correlation coefficient. The correlation coefficient ranges between -1
and 1, where its sign indicates the direction of the relationship (either positive or negative)
whilst its absolute value indicates the strength of the relationship.
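The two association measures named above can be illustrated with a minimal sketch. It is not the software used for the survey analysis: the function names and the example counts are hypothetical, the Cramér's V calculation uses the usual definition based on the Pearson chi-square statistic, and the Kendall function implements the simple tau-a variant without a correction for ties (statistical packages commonly report tau-b, and the variant used in this thesis is not stated).

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, columns) - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical counts (NOT survey data): rating held (rows) against
# whether a given recovery factor was selected (columns).
example_table = [[12, 8], [7, 13]]
print(round(cramers_v(example_table), 3))

# Hypothetical paired observations (NOT survey data).
years_in_service = [2, 5, 9, 14, 20]
failures_per_year = [1, 3, 2, 4, 5]
print(kendall_tau_a(years_in_service, failures_per_year))
```

Both measures are bounded: Cramér's V lies between 0 and 1, and Kendall's tau between -1 and 1, which is why only the latter conveys the direction of a relationship.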
Finally, the relationship between a ratio variable and a categorical variable is tested using
the non-parametric Mann-Whitney test. The test is used to assess whether two samples of
observations come from the same distribution (Shier, 2004). The test involves the
calculation of a statistic, referred to as U (see equation 6-1).
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \qquad \text{(6-1)}$$
where $n_1$ and $n_2$ are the two sample sizes, and $R_1$ is the sum of the ranks of all the
observations in sample 1. For samples larger than 20, the distribution of U is assumed to be
approximately normal, thus the U statistic is converted to a Z score using the formula in
equation 6-2 (Shier, 2004):
$$Z = \frac{\text{largest } U \text{ value} - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \qquad \text{(6-2)}$$
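Equations 6-1 and 6-2 can be traced step by step with a short sketch. It is illustrative only: the function names and the sample values are hypothetical, tied observations are assumed to receive the mean of the ranks they share, and the variance term follows equation 6-2 as written, without a correction for ties.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (U from equation 6-1, largest U value, Z from equation 6-2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                    # rank sum of sample 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1   # equation 6-1
    u2 = n1 * n2 - u1                       # U for the other sample
    u_largest = max(u1, u2)
    z = (u_largest - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u_largest, z


# Hypothetical years of service (NOT survey data) for controllers with
# and without a given rating.
with_rating = [12, 15, 9, 20, 18]
without_rating = [3, 7, 10, 5, 8]
u, u_largest, z = mann_whitney(with_rating, without_rating)
print(u, u_largest, round(z, 2))
```

Because equation 6-2 takes the largest U value, the Z score in this sketch is never negative; software that works from the smaller U value reports the same magnitude with a negative sign.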
The results of all tests are presented in Table 6-5.
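Table 6-5 Statistical tests and results obtained
Variable 1 | Variable 2 | Test | Statistical significance at 95 percent confidence level
Rating: ACC | Operational experience (length of service) | Mann-Whitney non-parametric test | p>0.05
Rating: APP | Operational experience (length of service) | Mann-Whitney non-parametric test | p<0.001 (U=1382.5, z=-3.56)
Rating: TWR | Operational experience (length of service) | Mann-Whitney non-parametric test | p=0.014 (U=3387.5, z=-2.46)
Operational experience (length of service) | Experience with equipment failures (frequency per year) | Non-parametric test (Kendall's tau) | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: written procedures | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: situation-specific problem solving | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: other | Mann-Whitney non-parametric test | p>0.05
Rating (ACC, APP, TWR) | Experience with equipment failures (number of equipment failures experienced annually, Q4) | Mann-Whitney non-parametric test | p>0.05 in all cases
Rating (ACC, APP, TWR) | Factors that influence recovery performance (written procedures, situation-specific problem solving, other) | Non-parametric test (Cramér's V) | p=0.0086 for ACC rating; p>0.05 in all other cases
Experience with equipment failures (frequency per year) | Factors that influence recovery performance (written procedures, situation-specific problem solving, other) | Mann-Whitney non-parametric test | p>0.05 in all cases
Factors that influence recovery performance | Factors that influence recovery performance | Non-parametric test (Kendall's tau) | p<0.001 for situation-specific problem solving and other; p>0.05 for the remaining pairs
Factors that influence recovery performance | Formal exchange of information (Q7) | Non-parametric test (Cramér's V) | p=0.029 for other; p>0.05 otherwise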
The statistical tests performed indicated five significant relationships (Table 6-5). Firstly,
significant relationships are found between APP rating and TWR rating and years
of operational experience (i.e. years in service). In the sample surveyed, controllers
with APP rating have more operational experience compared to those without this
rating. Similarly, controllers with TWR rating have more operational experience
compared to those without it. Secondly, a significant relationship is identified between
other factors that influence recovery performance and ACC rating. The data indicate that
controllers with ACC rating tend to rely upon other factors (e.g. past experience) more
than those without ACC rating. This is expected as controllers with ACC rating in the
available sample have more operational experience than those without ACC rating.
Thirdly, a significant relationship is identified between controller reliance on
situation-specific problem solving and other factors (e.g. past experience) when recovering
from equipment failures. This is expected as past experience represents one of the factors
that define the situation (context) surrounding an equipment failure. Finally, a
significant relationship is identified between controller reliance on other factors (e.g.
past experience) when recovering from equipment failures and management supported
exchange of information regarding equipment failures (Table 6-5). It may be the case
that controllers regard the exchange of information on equipment failures as a
type of past experience.
Footnote to Table 6-5: Relationship between other factors that influence recovery performance and ACC rating.
On the other hand, no relationship is identified between the factors that influence the
recovery process and operational experience (i.e. number of years active as a
controller). Although it was expected that less experienced controllers may rely more
on written procedures and that more experienced controllers may rely more on past
experience, statistical testing did not support these expectations. Years in service do
not differentiate between reliance upon a written procedure, context, or other factors
(e.g. past experience). It may be the case that the overall safety culture built in the ATC
Centre determines what a controller uses as the main resource in recovering from
equipment failures. For example, if procedures are not available, controllers will rely more
on situation-specific problem solving. This decision would therefore be based on
organisational issues more than on their own experience.
6.8 Summary
This Chapter has discussed in detail the questionnaire survey that sampled 134
controllers in 58 ATC Centres from 34 countries. The survey was designed to achieve
four main objectives. Firstly, to build on the literature review to further investigate
equipment failures and factors that influence controller recovery by introducing
operational experience. Secondly, to support the information obtained from operational
failure reports (as presented in Chapter 4), which lacked the input on controller
recovery. Thirdly, to assess the status and quality of recovery procedures and training
in the sampled set of ATC Centres. Finally, to contribute to the wider human reliability
research with a particular focus on controller recovery from equipment failures.
The results of the analyses conducted on the data consist of several interesting
findings. These are structured around six key questions that this survey addresses.
How often do controllers experience equipment failures (Q1)?

Almost 95 percent of controllers surveyed experienced ATC equipment failure in their
operational career. The investigation of frequency of failures per year revealed that
major failures tend to occur only once a year or once in two years, while less severe
failures tend to occur with a relatively high frequency. These findings are in line with the
results obtained from operational failure reports and their categorisation based on
severity (presented in Chapter 4).
What factors influence their recovery performance (Q2)?

Investigation of the factors that most influence controllers' recovery performance
has revealed that factors other than written procedures and situation-specific problem
solving (e.g. past experience) have the greatest impact. However, the differences
between these other factors and written procedures and situation-specific problem
solving are not large, i.e. the controllers rated the importance of all listed factors
similarly.
What is the most unreliable ATC equipment (Q3)?

Investigation of the most unreliable ATC equipment, based upon the experiences of the
controllers surveyed, has shown a match with the results obtained from the analyses of
operational failure reports (as presented in Chapter 4). The most affected ATC
functionalities are communication, surveillance, and data processing. The most
unreliable ATC equipment incorporates air-ground and ground-ground communication,
radar coverage, and the flight data processing system. These findings, together with
those from Chapter 4, led to the selection of the equipment failure to be simulated in
the experiment presented in Chapter 9 (i.e. the flight data processing system failure).
Is there any organised exchange of information on equipment failures and/or other
types of unusual/emergency situations (Q4)?
The organised exchange of information on equipment failures represents an indirect
experience and a learning opportunity. Through presentations, seminars, and safety
bulletins, the controllers could be presented with failure types, contextual conditions
surrounding the failure, and the difficulties experienced by their colleagues in
handling the situation. However, in the sample obtained, almost half of the controllers
did not have this kind of information exchange organised in their ATC Centres.
Do recovery procedures exist (Q5)?

Assessment of the existence and quality of recovery procedures shows that the
majority of sampled ATC Centres have some type of recovery procedure in place,
mostly for radar failure, communication failure, and flight data processing system
failure. The analyses also show that most of these procedures are kept up to date but are
not always complete. Therefore, additional emphasis should be placed on the revision
of existing procedures to assure that the recovery steps presented are complete and
that these follow a logical order. However, attention should be paid to the trade-off
between the thoroughness of the procedure and limited time available to perform all
prescribed steps and thus to recover. An example of a concise checklist-type recovery
procedure developed in this thesis for a specific European ATC Centre is presented in
Appendix III. It is based on a format used previously by the German air traffic service
provider (DFS), accepted and published by EUROCONTROL (2003f).
What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Assessment of the existence and quality of training for recovery shows that only half of
the ATC Centres surveyed have established training for recovery from equipment
failures. The situation with recurrent training is even worse as only 36 percent of ATC
Centres surveyed organise regular recurrent training. In most cases, recurrent training
is provided only once a year, while in nine ATC Centres it is provided twice a year. On
the other hand, controllers support the idea of very frequent recurrent training. Almost
half of the respondents (i.e. 45 percent) feel an annual training session for a couple of
hours is simply not enough to keep them proficient and ready to deal with unexpected
equipment failures.
The process of identification of factors that affect controller recovery started in the
previous Chapter by an overall assessment of past research relevant to controller
recovery. It has continued in this Chapter by expanding these findings with the
questionnaire survey results and operational experience of controllers worldwide.
Based on these findings, the next Chapter finalises this rigorous process by identifying
factors that affect controller recovery, referred to as Recovery Influencing Factors
(RIFs).
Chapter 7
Methodology for a Selection of Relevant RIFs
Methodology for a Selection of Relevant Air Traffic Controller Recovery Influencing Factors
This Chapter builds on the findings from past research of relevance to controller
recovery (Chapter 5) further augmented by the operational experience extracted from
the questionnaire survey (Chapter 6) to realise a detailed understanding of the context
that surrounds a controller during the occurrence of an unexpected equipment failure.
The Chapter starts by illustrating the importance of the impact that contextual factors
have on controller recovery from equipment failures in Air Traffic Control (ATC). It
reviews both Air Traffic Management (ATM) and non-ATM related Human Reliability
Assessment (HRA) techniques to assure a comprehensive investigation of contextual
factors relevant to controller recovery from equipment failures in ATC. This initial
selection is augmented by the findings from the equipment reliability literature,
operational failure reports, human reliability research, and interviews with ATM
specialists. The Chapter concludes by identifying a set of relevant contextual factors,
referred to as Recovery Influencing Factors (RIFs), and their qualitative descriptors or
the levels of their influence on controller recovery performance.
7.1 Relevance of the recovery context

Analyses of accident investigations in various industries (e.g. aviation, nuclear and
chemical) have revealed that it is not possible to gain a full understanding of the
cause(s) of an accident from factual data alone. For example, the US National
Transportation Safety Board (NTSB) conducted dozens of detailed accident
investigations in which teams of experts assessed different contributory
factors and identified various issues with task design, procedures, cultural issues
(mostly relevant to language barriers within pilot-controller communication), personal
factors (e.g. a shift in attention in the 1972 L-1011 accident in the Everglades; NTSB, 1973),
and weather (e.g. the Pan Am Flight 759 accident was due to a thunderstorm and wind shear;
NTSB, 1983). Such factors can help explain why errors occur. Additionally, the
description of the context may also serve as a basis for defining ways of preventing or
reducing specific types of erroneous actions by means of technical recovery (i.e. built-in defences) and human recovery.
It is also necessary to take into consideration contextual factors that traditionally may
not be recorded by investigating bodies, but which can have a significant impact on the
outcome of an accident. In support of this, Dekker et al. (2004) note that it is
necessary to capture both a situation in which the action takes place and the action
itself. Similar arguments were presented by researchers at the National Aeronautics
and Space Administration (NASA) Ames Research Centre, who pointed out that "we
must move beyond trying to pin the blame for accidents on a culprit but seek instead to
understand the systemic causes underlying the outcomes" (cited in Cox, 2005). The
research presented in this thesis expands the analysis of equipment-related incidents
to include the context in which controller recovery unfolds. Therefore, the objective of
this Chapter is to determine the relevant contextual factors that affect the process of
controller recovery from equipment failures in ATC.
In Air Traffic Management (ATM), the contextual factors relevant to controllers are
defined as internal or external factors which influence the controller's performance of
ATM tasks (EUROCONTROL, 2002b). It is notable that this definition is generic and
thus does not give an indication as to when it is appropriate to stop looking further for
contextual factors. The so-called stopping rule is taken to be directly linked to the
overall investigation process, where assessment of contextual factors represents only
one segment of that process. In other words, it is the role of the investigator to
determine the chain of events that constitute a safety-relevant occurrence. In this
respect, the analysis of contextual factors should cover the entire chain and assess the
relevant context for each link in the chain. The research presented in this thesis adapts
the EUROCONTROL definition of contextual factors. Hence, the contextual factors in
this research, referred to as Recovery Influencing Factors (RIFs), are defined as internal or
external factors that influence the controller's recovery from unexpected equipment
failures in ATC.
The factors extracted from the various techniques are known in the HRA literature as
Contextual Conditions - CCs (EUROCONTROL, 2002b), Performance Shaping
Factors - PSFs (Shorrock, 1992; Shorrock and Kirwan, 2002; EUROCONTROL, 2004e;
THEMES, 2001; Swain and Guttman, 1983), Error Producing Conditions - EPCs
(EUROCONTROL, 2004d; Williams, 1986), Common Performance Modes - CPMs
(Hollnagel, 1993), Common Performance Conditions - CPCs (Hollnagel, 1998), or
Recovery Influencing Factors - RIFs (Kanse and van der Schaaf, 2000).
However, not all contextual factors are appropriate to describe the context around
recovery from equipment failures. Firstly, many factors have been listed
and recognised as generic factors without a good understanding of their influence
specifically on the recovery process. Secondly, many of the existing contextual factors
are derived from the nuclear and process industries. Such factors are not always
transferable to the highly dynamic and time-dependent ATC environment. Thirdly,
some of the past research was based on models of human performance that are not
representative of specific ATC tasks.
It should be noted that the research presented in this thesis does not rely exclusively
on any particular model of human information processing. Instead, it simply assesses
the importance of the recovery context and aims to derive a set of contextual factors
that best determines the controller recovery performance. The following section
presents two equipment failure incidents to highlight the importance of the context in
which controller recovery takes place.
7.1.1 Examples of the recovery context

Two real examples taken from an incident database of a Civil Aviation Authority (CAA)
are presented below to illustrate the relationship between failure, recovery, and
contextual factors. Because of their confidential nature, the examples are de-identified.
Although brief in the description of equipment failure, the two reports identified various
contextual factors and their impact on controller performance.
The first report contained the following: "At 2230 advice was received that there would
be a load test performed on the electrical system which would involve changing from
mains power supply to generators. Assurance was received that there would be no risk
of service interruption. Shortly after the power changeover two XX consoles crashed
followed by the remaining two. The Voice Switching Communication System (VSCS)
also failed as did the wall clock adjacent to the XX area. At the same time the simulator
also failed." It was subsequently established that the root cause of the reported failure
lay within the ATC organisation, which had not set up appropriate maintenance
procedures for the live ATC system (i.e. an organisational factor). Additionally, this report
highlighted the relevance of other contextual factors such as: the number of
workstations/sectors affected (i.e. loss of four workstations and the simulation platform),
time course of failure development (i.e. sudden failure), and complexity of failure type
(i.e. multiple failure: several workstations, clock, and simulation platform affected).
The second report contained the following: "The loss of radar display and VSCS at a
time of moderate traffic (approximately 10 aircraft on frequency) created substantial
workload on the controller. Thankfully, there were two controllers in the near vicinity
who were able to assist with a transition to a nearby controller working position and to
help maintain situational awareness and communications with the various aircraft via
air-ground (AG) bypass." This report highlighted the impact of traffic complexity at the
moment of failure occurrence (i.e. ten aircraft in simultaneous communication with the
controller), personal factors (i.e. substantial workload), communication for recovery
within a team (i.e. assistance with handling the traffic and maintaining traffic awareness
in spite of the loss of all critical systems: visual representation of traffic on display and
direct communication with relevant aircraft), adequacy of organisation (i.e. availability
of additional support), number of workstations affected (i.e. one workstation), and
complexity of failure type (i.e. multiple systems affected: radar display and
communication system).
The two brief cases above taken from an incident database illustrate the important
relationship between failure, recovery, and relevant contextual factors. In other words,
these equipment failure examples have shown that the context in which human
performance takes place is important in understanding human reliability. Although the
examples do not convey the complete picture of the occurrence of equipment failure
(e.g. there is no mention of weather or, in the first example, of any personal issues), several
contextual factors have been captured. As a result, research on controller recovery
from equipment failures in ATC requires a precise definition of the context surrounding
any failure type. In order to achieve this objective, it is necessary to review the specific
contextual factors defined in various HRA techniques. This is used together with
information from equipment reliability literature to identify the Recovery Influencing
Factors (RIFs).
7.2 Methodology to extract the candidate set of contextual factors
In order to determine a candidate set of contextual factors relevant to controller
recovery from ATC equipment failures, it is necessary to start with a review of
contextual factors as identified in the most relevant current HRA techniques (i.e.
ATM-specific HRA techniques). It is important to highlight that this overview is not focused
on human error per se or the underlying human information processing theory. The
literature on human error has been used simply to investigate the relevant factors that
influence human performance in unusual/unexpected events (i.e. contextual
factors). As a result, the human information processing theories used in the assessed HRA
techniques are outside the scope of this thesis.
It is also important to note that although there are currently three HRA techniques used
in the ATM sector, the review presented here has also considered other HRA
approaches employed in other domains to assure a complete set of RIFs. Furthermore,
a review of relevant equipment-failure characteristics and dynamic situational factors
has been conducted in order to augment the results from the review of the HRA
techniques. This is to ensure a complete and reliable determination of the RIFs. The
RIFs are then verified by interviews with ATM specialists. Figure 7-1 presents the
methodology used in this thesis to extract a candidate set of contextual factors relevant
to controller recovery from ATC equipment failures.
[Flowchart: Methodology to extract a candidate set of Recovery Influencing Factors (RIFs).
Stages in sequence: ATM related HRA techniques; augmentation with findings from other HRA
techniques; augmentation with equipment-failure related characteristics; augmentation with
dynamic situational factors; verification of selected RIFs by two ATM Specialists. Each of
the four review stages yields an output, and identified gaps link each of the first three
stages to the next.]
Figure 7-1 Methodology to extract a candidate set of RIFs
7.2.1 Human reliability assessment techniques

The methodology for the selection of contextual factors relevant to controller recovery
starts with a review of contextual factors as identified in the most relevant current HRA
techniques.
7.2.1.1 Human Error in ATM (HERA)
The HERA project represents the most recent approach for the analysis of human error
in the ATM domain. It evolved because of European and US initiatives (see footnote 1) to
produce a distinctive HRA tool. HERA is based on an extensive literature review and the
operational involvement of air traffic controllers, incident investigators, and safety
managers. The HERA project developed an initial set of CCs for ATM based on UK
incident reports, discussions with controllers, and the vast literature on human factors
(EUROCONTROL, 2002b; EUROCONTROL, 2003d; EUROCONTROL, 2003e;
EUROCONTROL, 2004d). HERA uses eleven groups of Contextual Conditions (CCs)
to define context: pilot-controller communications, pilot actions, traffic & airspace,
weather, documentation & procedures, training & experience, workplace design & HMI,
environment, personal factors, team factors, and organisational factors. Each of the CC
groups is further sub-divided, resulting in more than 200 contextual factors. HERA
recommends that CCs should be applied individually to each error that occurred during
an incident, rather than just once for the entire incident. This supports the concept
presented in this thesis that analysis of contextual factors should cover the entire chain
of events leading to an incident and thus assess the contextual factors relevant for
each link in that chain (see section 7.1).
The majority of contextual factors defined in HERA are relevant to controller recovery
from equipment failures in ATC. Thus, the HERA technique represents a good starting
point for compiling a list of RIFs. For example, severe weather conditions can degrade
controller performance by adding workload to the already complex recovery
task. As such, weather should be incorporated in the list of RIFs.
There are also some factors defined in HERA that are not applicable to the recovery
from equipment failure in ATC. For example, pilot actions are relevant to ATM but not
ATC. Therefore, this particular factor will be excluded in the final choice of RIFs.
1 The US Federal Aviation Administration (FAA) developed the Human Factors Analysis and
Classification System (HFACS) tool.
Additionally, pilot-controller communication is not relevant in the immediate event of
equipment failure. Although not addressed in this thesis, there are circumstances when
pilot actions are of importance, such as in the case of a major failure or when
unplanned or erroneous pilot actions result in an increase in controller workload. More
important than the example above is the communication within a team of controllers
for efficient recovery. In this respect, communication (for recovery) and team factors
could be combined to create one factor since the entire team interaction takes place
through the communication for recovery. Only in the event of a severe equipment failure
(i.e. a failure that adversely affects the availability of an Air Traffic Service (ATS) over a
significant period) is a controller obliged to inform all traffic (i.e. pilots) in the affected
airspace of a reduced level of ATS. Finally, there is a tendency to exclude
environmental issues when looking at more specific events, such as equipment failure,
on the basis that controllers are familiar with working in a specific ATC Centre. This is
discussed further in section 7.2.1.3.
7.2.1.2 Technique for the Retrospective and Predictive Analysis of Cognitive
Errors in ATC (TRACEr)
This approach was developed by the UK National Air Traffic Services (NATS) to gain a
better understanding of controller error. It is a model-based approach, which performs
both a retrospective and a prospective analysis. The original version of TRACEr
contains eight different taxonomies, one of which describes context (Shorrock, 1992;
Shorrock and Kirwan, 2002). The CC groups derived in HERA were based largely on
the context defined in TRACEr. The TRACEr technique uses the Performance Shaping
Factors (PSF) taxonomy and classifies factors that have influenced or could influence
controller performance, aggravating the occurrence of errors, or perhaps assisting error
recovery (Shorrock and Kirwan, 2002). Thus, it can be concluded that TRACEr defines
context in a similar way to HERA, i.e. by defining relevant groups of PSFs. As with
HERA, each PSF group is further sub-divided, resulting in approximately 60 PSFs in
the TRACEr Light version. The PSF groups recognised by TRACEr are: traffic and
airspace (e.g. traffic complexity), pilot/controller communications (e.g. RT workload),
procedures (e.g. accuracy), training and experience (e.g. task familiarity), workplace
design, HMI and equipment factors (e.g. radar display), ambient environment (e.g.
noise), personal factors (e.g. alertness/fatigue), social and team factors (e.g.
handover/takeover), and organisational factors (e.g. conditions of work).
The main difference between TRACEr and HERA is that the former does not include
pilot actions and weather (see Appendix VII). Thus, no additional candidate factors
could be extracted from TRACEr.
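The comparison between the two taxonomies can be expressed as a set difference. The sketch below is illustrative only: the group labels are normalised by hand so that equivalent HERA and TRACEr groups share one name (for example, "documentation & procedures" and "procedures" both become "procedures"), and this mapping is an assumption of the sketch rather than part of either technique.

```python
# Group labels normalised by hand (an assumption of this sketch) so that
# equivalent groups in the two techniques carry the same name.
HERA_GROUPS = {
    "communication",            # pilot-controller communications
    "pilot actions",
    "traffic and airspace",
    "weather",
    "procedures",               # documentation & procedures
    "training and experience",
    "HMI",                      # workplace design & HMI
    "environment",
    "personal factors",
    "team factors",
    "organisational factors",
}

TRACER_GROUPS = {
    "traffic and airspace",
    "communication",            # pilot/controller communications
    "procedures",
    "training and experience",
    "HMI",                      # workplace design, HMI and equipment factors
    "environment",              # ambient environment
    "personal factors",
    "team factors",             # social and team factors
    "organisational factors",
}

# Groups that HERA defines but TRACEr does not.
only_in_hera = HERA_GROUPS - TRACER_GROUPS
# Groups that TRACEr would add to the HERA-based candidate list.
only_in_tracer = TRACER_GROUPS - HERA_GROUPS

print(sorted(only_in_hera))
print(sorted(only_in_tracer))
```

Under this mapping the second set is empty, which corresponds to the statement that TRACEr contributes no additional candidate factors.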
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
As previously discussed in Chapter 5, this tool has been developed as a part of the
Solutions for the Human-Automation Partnerships in European ATM (SHAPE) project,
managed by the Human Factors Division of EUROCONTROL. The SHAPE project
defines context as any aspect of the operating environment that can influence a failure
or recovery process (EUROCONTROL, 2004e). The project focused on the contextual
factors affecting recovery, which is in line with the objective of this thesis. The relevant
contextual factors or PSF categories recognised in RAFT are: task load and system
complexity, pilot-controller communication, procedures and documentation, training
and experience, human-machine interaction, personal factors, social and team factors,
logistical factors, and other organisational factors.
A review of the RAFT PSFs shows that task load and system complexity represents the
workload facing the controller as a result of task performance and overall system
complexity. Therefore, this factor has the potential to be included as a RIF. Compared to
HERA, RAFT disregards pilot action, weather, and environment as relevant
contextual factors for human recovery from equipment failure in ATC. Whilst pilot
actions do not have much impact as explained in section 7.2.1.1, weather can bring
additional complexity to the occurrence of equipment failure. At the same time, RAFT
includes a new category called logistical factors, which includes maintenance and
staffing issues.
Environmental issues (e.g. noise, temperature, and lighting) are excluded. The reason
for this is that controllers become accustomed to the ambient characteristics of the
specific ATC Centre in which they work. On the other hand, logistical factors will be assigned to the existing
organisational factors category. The reason for this lies in the fact that staffing and
maintenance issues should be anticipated and pre-planned at organisational or
managerial level (e.g. maintenance scheduling, availability, and assignment of
personnel, stock of equipment and spare parts, on-the-job training aids). The
management in any ATC Centre should anticipate as far as possible unscheduled
technical disturbances and provide necessary defences for their prevention.
The three techniques above (HERA, TRACEr, and the SHAPE/RAFT tool) were developed
specifically for the ATC/ATM environment. In general, they define context and
contextual factors in a way similar to that adopted in this thesis. The assessment of
these three models identifies a total of nine candidate RIFs. These are: communication,
traffic and airspace, weather, procedures, training and experience, HMI, personal,
organisational factors, and task complexity.
Whilst the review of ATM-related HRA techniques gives many relevant contextual
factors, it is worth examining relevant non-ATM HRA techniques to investigate if other
factors exist. The following sections provide an insight into the relevant findings.
7.2.1.4 Recovery from failures: understanding the positive role of human
operators during incidents
This research attempted to emphasise the positive role of human operators in the
overall system performance. In addition, it proposed a preliminary failure compensation
process model (or recovery model) derived initially for the chemical process industry.
Furthermore, the importance of a taxonomy used to describe the factors influencing
recovery was recognised. Based on the experience gained from field studies and the
relevant literature, Kanse and van der Schaaf (2000) developed a list of RIFs. In their
research the recovery factors were defined as factors that contribute to human
recovery performance once an error or failure has occurred. This definition
corresponds to the definition of RIFs adopted in this thesis. A categorisation into six
groups of RIFs adopted by Kanse and van der Schaaf (2000) from the power plant
industry is presented in Table 7-1.
Table 7-1 Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Categories of factors: Recovery Influencing Factors
Prioritisation of recovery-related tasks:
    Time available for recovery task, considering other tasks requiring attention
    Urgency of recovery (amount of time until negative consequences arise)
    Importance of or need for recovery (seriousness of possible consequences if not recovered)
Occurrence-related:
    Type(s) of preceding failures
    Performance phase in which the immediate result of the failure process is detected (during the planning phase/while carrying out the action/when the outcome of the action is observable)
    Available and applicable barriers/defences
Human (person) related:
    Overall work area knowledge
    Work area and process related skills
    General competency in job
    Time elapsed since last (re)training in work area
    Time since last (re)training with regard to specific problem occurrence
    Suspicion/distrust/intuition
    Personal attitude toward failure and failure compensation
    System failure coping strategies
    Self-efficacy (trust in own ability), self esteem
    Fatigue; Shift work coping ability
    Feeling of personal responsibility for the failure or problem
    Feeling of personal responsibility with regard to recovery
    Pride regarding job well done
    Previous experience with failures (any type)
    Previous experience with this failure (any type)
Social:
    Team attitude toward failures and failure compensation
    Attitude toward teamwork; Team efficacy
    Feeling of team responsibility for the failure or problem
    Feeling of team responsibility with regard to recovery
    Availability of team members/colleagues
Organisational:
    Organisation of work and responsibilities
    Training plan; Competency assessment plan
    Supervision; Personnel selection processes
    Availability, quality and usability of procedures/instructions
    Shift patterns and personnel planning
    Organisational policy
    Management attitudes towards failures & failure compensation
Technical/workplace/situational:
    Availability of equipment/materials needed
    Operator-process interface properties
The majority of the identified factors are relevant to equipment failures in ATC and
should be considered as potential RIFs. For example, available and applicable
barriers/defences are important with respect to detection, diagnosis, and correction of
equipment failure. Time pressure is recognised under the prioritisation of
recovery-related tasks. Equipment failures in ATC are unexpected events, which degrade the
ATC service offered. In this case controllers are still required to provide a service to
ensure a safe flow of traffic. As a result, controller workload increases rapidly,
potentially compromising controller performance. Therefore, time pressure should be
analysed for potential inclusion into the RIFs. Occurrence-related factors are mostly
applicable to the power plant environment and as such could not be directly applied to
ATC. However, if transferred to the characteristics of the ATC environment, these
factors may be relevant to equipment failure occurrence.
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
The CORE-DATA database was developed at the University of Birmingham to assist
the UK personnel involved in the assessment of hazardous systems such as nuclear,
chemical, and offshore systems (Kirwan, Basra, and Taylor-Adam, 1997;
EUROCONTROL, 2002b; EUROCONTROL, 2004d). It represents an attempt to
develop a systematic approach to recording human errors. Several sources of data are
used to populate the database including: real operating experience (incident and
accident reports), simulation (both training and experimental simulators), experiments
(from literature on performance), expert judgment (e.g. as used in risk assessments),
and synthetic data (from human reliability quantification techniques). According to
EUROCONTROL (2002b), CORE-DATA contains approximately four hundred data
records describing particular errors that have occurred, together with their causes, error
mechanisms, and their probabilities of occurrence. PSFs are defined in CORE-DATA
as underlying causes which influence human performance and indicate how the human
error occurred. CORE-DATA's PSF taxonomy consists of alarms, communication,
ergonomic design, ambiguous HMI, HMI feedback, labels, lack of supervision/checks,
procedures, refresher training, stress, task complexity, task criticality, task novelty, time
pressure, training, and workload.
There are a number of factors here of potential relevance to ATC and controller
recovery. Firstly, alarms should be considered as a particular type of technical built-in
defence (discussed in Chapter 4) and are therefore important with respect to detection,
diagnosis, and correction of equipment failure. This is also in accordance with the work
done by Kanse and van der Schaaf (2000) as explained in the previous section. Hence
alarms should be considered as a potential RIF. Secondly, task novelty or task
familiarity in the case of equipment failures in ATC should be considered under the
training and experience RIF. Thirdly, time pressure has also been recognised in the
work done by Kanse and van der Schaaf (2000) under the prioritisation of
recovery-related tasks. Therefore, this factor should be analysed for inclusion into the RIFs.
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
The THERP technique was developed by Alan Swain at Sandia National Laboratories
in the 1950s (Swain and Guttman, 1983; Straeter, 2000). The THERP technique
assumes that human information processing can be influenced by error conditions
(Performance Shaping Factors, PSFs). THERP subdivides all PSFs into internal,
external, and those that act as physiological and psychological stressors. However, the
ways in which PSFs act on human performance are not explicitly specified.
Furthermore, THERP sub-divides external PSFs into situational factors, task factors,
and task instructions. Internal factors are defined as factors related to the organism (i.e.
human factors). The PSFs recognised in THERP are presented in Table 7-2.
Table 7-2 Factors influencing human actions in THERP (cited in Straeter, 2000)
Category: Factors influencing human actions
External Performance Shaping Factors
Situational factors:
    Design features; Quality of environment; Temperature, air humidity, air quality, radiation exposure, illumination, noise, vibration, cleanliness; Working hours; Breaks; Availability of special work resources; Job manning; Organisational structure (authority, responsibility, channels of communication); Actions by shift leader, worker, manager, supervisory authority; Remuneration structure (recognition, payment)
Factors in tasks and work resources:
    Requirements for perception; Requirements for motor system (speed, power expenditure, accuracy); Relationship between operators and display; Requirements for adaptation; Interpretation; Decision making; Complexity (information loading); Narrow nature of task; Short term and long term memory; Calculations; Feedback (knowledge regarding results of an action); Dynamic of gradual actions; Group structure and communications; Man-machine factors; Interface (design of work resources, test instruments, maintenance equipment, work aids, tools, accessories)
Work and task instructions:
    Required procedures (written, non-written); Written and verbal communication; Warnings and danger signs; Work-methods; Plant policy
Stressors
Psychological stressors:
    Suddenness of occurrence; Duration of stress; Task speed; Task load; High hazard risks; Threats (fear of failure, loss of job); Monotony, degrading or meaningless activities; Duration of uneventful periods of alertness; Work performance motive conflicts; Missing or negative reinforcement; Sensory deprivation; Detractors (noise, blinding, motion, flickering, coloration); Inconsistent labelling
Physiological stressors:
    Duration of stress; Fatigue; Pain or discomfort; Hunger or thirst; Extreme temperatures; Radiation; Extreme gravitational forces; Extreme pressure conditions; Inadequate oxygen supply; Vibration; Restricted movements; Absence of physical exercise; Interruption of circadian rhythm
Internal Performance Shaping Factors
Factors relating to the organism (i.e. human factors):
    Prior training, experience; State of momentary practice or abilities; Personality and intelligence variables; Motivation and attitudes; Emotional states; Stress (mental or physical); Knowledge about demanded performance prerequisites; Gender differences; Physical conditions; Attitudes deriving from family or groups; Group dynamic processes
A review of the contextual factors relevant to THERP reveals that most can be
allocated to the RIFs identified by the first three ATM-related techniques. Several other
factors, such as decision-making and short-term and long-term memory (external PSFs),
may be categorised as personal factors. These factors may become increasingly
important within the planned modernisation of ATM (i.e. datalink, electronic strips, or
stripless environment). Finally, the suddenness of occurrence factor identified in
THERP cannot be categorised within the existing RIF groups. This factor is relevant
to the occurrence of equipment failure in the ATC environment as it greatly affects
detection by the controller. Hence it should be treated as an additional potential RIF.
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
The HEART technique was developed by Jeremy Williams, a British ergonomist, in
1985. The review of this technique is available in EUROCONTROL (2004d) and
Williams (1986). It is one of the most popular human error quantification techniques
due to its ease of implementation and is still used extensively in the nuclear, chemical,
petrochemical, railway, and defence industries.
HEART was derived from a wide range of findings in ergonomics literature. The
technique defines a set of generic error probabilities for the tasks considered, and
identifies the Error Producing Conditions (EPCs) associated with these. EPCs include
particular ergonomic, task (e.g. inactivity, repetitious, or low mental workload tasks,
additional team members necessary to perform task normally), and environmental
factors that could each have a negative effect on human performance. In other words,
the definition of contextual factors or EPCs emphasises purely their negative impact on
human performance. The extent to which each EPC factor affects performance is
quantified and the human error probability is calculated as a function of the precise
effect of each EPC on a particular task. HEART assumes that basic human reliability is
dependent upon the generic nature of the task to be performed and that under nominal
conditions this level of reliability will tend to be consistent (Williams, 1986).
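The calculation described above can be illustrated with a short sketch. It assumes the commonly cited HEART form, in which the nominal error probability of the generic task is multiplied by a factor of (maximum effect - 1) x assessed proportion + 1 for each applicable EPC. The function name, the parameter names, and all numerical values are hypothetical and chosen for illustration only; they are not taken from the HEART tables or from this thesis.

```python
from math import prod


def heart_hep(nominal_hep, epcs):
    """Return a HEART-style human error probability.

    nominal_hep: generic error probability for the task type (0..1).
    epcs: list of (max_multiplier, assessed_proportion) pairs, one per
          Error Producing Condition judged to apply to the task.
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must be a probability")
    factors = []
    for max_multiplier, proportion in epcs:
        if max_multiplier < 1.0 or not 0.0 <= proportion <= 1.0:
            raise ValueError("invalid EPC multiplier or proportion")
        # Scale the maximum effect by how strongly the EPC is judged to apply.
        factors.append((max_multiplier - 1.0) * proportion + 1.0)
    # A probability cannot exceed 1, so the product is capped.
    return min(1.0, nominal_hep * prod(factors))


# Illustrative values only (assumptions, not taken from the HEART tables).
example_hep = heart_hep(0.003, [(17.0, 0.4), (11.0, 0.2)])
print(round(example_hep, 4))
```

With these illustrative inputs the two EPC factors are 7.4 and 3.0, so the assessed probability is 0.003 x 7.4 x 3.0 = 0.0666. Because each factor is at least one, EPCs can only raise the nominal value in this form.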
This technique identified 38 different EPCs. These can be
categorised into two groups, those directly transferable to ATC and those that are not.
The EPCs relevant to ATC can be further sub-divided into those that fit within existing
RIF categories and those that do not. The former are, for example, unfamiliarity with a
situation which is potentially important but which only occurs infrequently or which is
new, a shortage of time available for error detection and correction, and a channel
capacity overload. The EPC concerned with unfamiliarity with a situation may be
captured through two RIFs, i.e. training and experience. Unusual or emergency
situations (such as ATC equipment failures) are rare but highly demanding events that
require an efficient and effective response from each controller. Regular and
comprehensive training plays a key role in building the skills and experience
necessary to cope with such unusual situations. Shortage of time available has
already been discussed and recommended to be included as a candidate RIF (see
section 7.2.1.5). Finally, channel capacity overload is a term used for the workload
caused by simultaneous presentation of critical information to the human operator. As
such it can be classified under personal factors.
The EPCs not relevant to ATC include several factors. For example, the category
"a mismatch between the educational level and the requirements of the task" is not
applicable to controllers. The level of education and training for an ATC licence is
standardised and reflects the knowledge controllers should acquire. Furthermore, the
category "an incentive to use more dangerous procedures" is also not applicable to
ATC as dangerous procedures or working practices are direct violations of the rules.
7.2.1.8 The Contextual Control Model (COCOM)
The COCOM model, developed by Hollnagel (1993), describes how human
performance is dynamically determined by the current context, as an alternative to the
common information processing models. This is a generic HRA approach not related to
any specific industry.
COCOM represents a control model of cognition focusing on two important aspects:
the conditions under which a person changes from one mode to another and the
characteristics of human performance in a given mode. COCOM recognises four
control modes: scrambled, opportunistic, tactical, and strategic. According to this
approach human actions are determined by the context as well as specific
characteristics and mechanisms of human cognition. In Hollnagel's view, humans do
not passively react to events; they actively look for information and act based on
intentions as well as external developments. Therefore, it was concluded that human
actions are only meaningful when considered in the appropriate context.
In this regard, COCOM defines Common Performance Modes (CPM) as the conditions
under which the human performance takes place. Hollnagel (1993) divides them into
CPMs that may increase or decrease human reliability. The former include sufficient
available time, available plans, adequate Man Machine Interface (MMI) and support,
few simultaneous goals, normal/familiar process state, and adequate organisation. The
CPMs that may reduce reliability include insufficient available time, plans not available,
inadequate MMI and support, many simultaneous goals, abnormal process state, and
inadequate organisation.
According to Hollnagel (1993), the objective is not to find a precise probability of a
specific action but rather to identify the specific steps that are particularly prone to
produce hazardous consequences. This knowledge can then be used to change the
design of the system, to introduce specific measures of compensation, and to construct
defences and recovery options. Generally, the objective of the recovery performance
assessment should be to identify the context that is likely to result in an inadequate
recovery performance. The characteristics of the context resulting in an inadequate
recovery performance would be used to define the necessary changes to the ATC
system/component design (e.g. technical defences, recovery procedures and training).
This should allow the whole ATC system to be safer and more reliable.
The COCOM technique was subsequently used in the development of another method
discussed in the next section. Therefore the final choice of potential RIFs from
both techniques is discussed within the next section.
7.2.1.9 Cognitive Reliability and Error Analysis Method (CREAM)
The CREAM methodology represents a further development of the COCOM model that
deals with the duality of competence and control in human cognition (Hollnagel, 1998).
Basing the work on COCOM's model of cognition and four distinctive control modes,
CREAM represents a practical approach for both human performance analysis (i.e.
retrospective analysis) and performance prediction. The method is cyclical rather than
sequential and has well-defined conditions that identify when an analysis should end.
Similar to COCOM, CREAM represents a generic approach not related to any specific
industry.
Using past research (i.e. the THERP technique), Hollnagel (1998) attempts a more
structured approach where related categories of contextual factors are grouped
together. As a result he defines a small set of Common Performance Conditions (CPCs)
that contain the general determinants of performance (i.e. common modes) including:
adequacy of organisation, working conditions, adequacy of MMI and operational
support, availability of procedures/plans, number of simultaneous goals, available time,
time of day (circadian rhythm), adequacy of training and experience, and crew
collaboration quality. The proposed CPCs were intended to have a minimal degree of
overlap, although they are not independent.
Hollnagel (1998) argues that there is a significant similarity between PSFs and CPCs.
However, the difference lies in the scope of these factors. Similar to CPMs in the
previous COCOM technique, CPCs are more generic conditions, designed to be
applied in the early stage of the analysis to characterise the context for
the entire human operational task. On the other hand, PSFs tend to be more specific
and focused on a particular stage of that task.
Hollnagel (1998) went one step further to define the levels that each CPC can take and
their corresponding effects on performance reliability (the so-called typical values of
CPCs). These levels are based on general human factors knowledge and experience
from the HRA discipline. Hollnagel used the general principle that advantageous
performance conditions improve reliability, whereas disadvantageous conditions are
likely to reduce it. If reliability is improved, operators are expected to fail less often in
their tasks and perform better in general. He proposed an expected effect of each CPC
on performance reliability at three levels: improved, not significant, or reduced. The
advantages of this approach can be seen in the direct link between the descriptors
used for CPCs and expected effect on human performance reliability. As such, the
research presented in this thesis adopted this approach (further explained in section
7.3).
In order to determine the overall effect of the context on human performance, the
CREAM technique assumes an expert judgement of the relevance of each CPC for the
particular event under investigation and its impact on the probability of failure (no
impact, improves, reduces). The resulting score is used to determine the expected
control mode, which, as previously mentioned, is: scrambled, opportunistic, tactical, or
strategic control.
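The scoring step described above can be sketched as a simple tally of expert judgements. The sketch below counts how many CPCs are judged to reduce or improve performance reliability and then maps the pair of counts to a control mode. The CPC ratings are invented for illustration, and the thresholds in placeholder_control_mode are hypothetical placeholders: they do not reproduce the regions defined by Hollnagel (1998), which should be substituted before any real use.

```python
from collections import Counter

# The three expected effects on performance reliability named in the text.
EFFECTS = {"improved", "not significant", "reduced"}


def tally_cpc_effects(assessment):
    """Count CPCs judged to reduce and to improve performance reliability.

    assessment: dict mapping a CPC name to one of the EFFECTS values.
    Returns a (n_reduced, n_improved) tuple.
    """
    unknown = set(assessment.values()) - EFFECTS
    if unknown:
        raise ValueError(f"unknown effect level(s): {sorted(unknown)}")
    counts = Counter(assessment.values())
    return counts["reduced"], counts["improved"]


def placeholder_control_mode(n_reduced, n_improved):
    """Map the tally to a control mode.

    ASSUMPTION: these thresholds are placeholders for illustration and are
    not the regions published for CREAM.
    """
    net = n_improved - n_reduced
    if net >= 4:
        return "strategic"
    if net >= 0:
        return "tactical"
    if net >= -4:
        return "opportunistic"
    return "scrambled"


# Invented expert judgements for the nine CPCs listed in this section.
example_assessment = {
    "adequacy of organisation": "not significant",
    "working conditions": "not significant",
    "adequacy of MMI and operational support": "reduced",
    "availability of procedures/plans": "improved",
    "number of simultaneous goals": "reduced",
    "available time": "reduced",
    "time of day (circadian rhythm)": "not significant",
    "adequacy of training and experience": "improved",
    "crew collaboration quality": "not significant",
}

n_reduced, n_improved = tally_cpc_effects(example_assessment)
print(n_reduced, n_improved, placeholder_control_mode(n_reduced, n_improved))
```

Keeping the tally separate from the mapping means the placeholder thresholds can be replaced without touching the counting logic.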
Taking account of the review of both the CPMs (COCOM) and CPCs (CREAM), the
majority of the factors identified are directly transferable to ATC. The exceptions are
the number of simultaneous goals and normal/familiar process state (see Appendix VII).
Regarding the number of simultaneous goals, it is important to highlight that air traffic
control implies the simultaneous processing of multiple tasks. In other words, a
controller may be in radio contact with 10-20 aircraft while simultaneously performing
computer-related tasks (e.g. entering assigned altitude information, handing off flights
to another controller). Therefore, a high level of multitasking remains an inherent
characteristic of ATC (Wickens, 1992) and as such this factor will be excluded from the
list of RIFs. The other factor (normal/familiar process state) is highly relevant to the recovery
performance but has to be indirectly mapped to training and experience.
7.2.1.10 Human Reliability Management System (HRMS)
The HRMS technique was developed to derive a comprehensive and accurate
assessment of human contribution to risk in the nuclear industry, through a detailed
task and error analysis, quantification, and practical error reduction scheme. Since this
technique was too resource-intensive, it was necessary to additionally develop a fast
screening technique. This light version required a detailed approach only for those
scenarios that showed critical human involvement. This led to a subsequent
technique, the Justification of Human Error Data Information (JHEDI). Six PSFs were
identified based on the assessment of several HRA techniques (Kirwan, 1997): time,
quality of information and interface, training/expertise/experience/competence,
procedures, task organisation, and task complexity. Context is defined as the complete
task design, the working and organisational environment, and the entire history of the
task and individual(s) performing the task. In fact context encompasses all the
conventionally-used PSFs, plus the myriad of factors, including culture, many too
microscopic and idiosyncratic, or even possibly too macroscopic and intangible to allow
a tractable predictive analysis (Kirwan, 1997).
The HRMS approach is based on its own audit document and consists of fifty questions
as an assumed limit for an acceptable and practicable tool. The expert inputs to each
of these questions (yes, no, not applicable) are used to rate each PSF, ranging from
zero to ten, where a value of zero represents a near-perfect design and ten a poor
design. As a result, a profile of PSFs is created for each task and further linked to the
known value of human error probability for that task (extracted from the available
incident database). The quantitative assessment of each new task comprises its
comparison with known tasks (and their PSF profiles) and the derivation of an
extrapolation rule to predict its outcome.
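The rating and comparison steps described above can be sketched as follows. The text does not specify how answers are converted into a rating or how the extrapolation rule is derived, so both are assumptions here: a PSF is rated as ten times the share of applicable questions answered "yes" (taking "yes" to flag a design weakness), and the closest known task is found by Euclidean distance between PSF profiles. All names, tasks, and values are hypothetical.

```python
def rate_psf(answers):
    """Rate one PSF from 0 (near-perfect design) to 10 (poor design).

    answers: list of "yes", "no" or "n/a" strings for the audit questions
    linked to this PSF. ASSUMPTION: "yes" flags a design weakness.
    Returns None when no question is applicable.
    """
    allowed = {"yes", "no", "n/a"}
    if any(answer not in allowed for answer in answers):
        raise ValueError("answers must be 'yes', 'no' or 'n/a'")
    applicable = [answer for answer in answers if answer != "n/a"]
    if not applicable:
        return None
    return round(10 * applicable.count("yes") / len(applicable), 1)


def nearest_known_task(profile, known_tasks):
    """Return the name of the known task whose PSF profile is closest.

    profile: dict of PSF name -> rating for the new task.
    known_tasks: dict of task name -> {"profile": {...}, "hep": float}.
    ASSUMPTION: Euclidean distance stands in for the extrapolation rule.
    """
    def distance(other):
        return sum((profile[psf] - other[psf]) ** 2 for psf in profile) ** 0.5

    return min(known_tasks, key=lambda name: distance(known_tasks[name]["profile"]))


# Hypothetical tasks and error probabilities, for illustration only.
known = {
    "task A": {"profile": {"time": 6.0, "procedures": 3.0}, "hep": 0.01},
    "task B": {"profile": {"time": 1.0, "procedures": 1.0}, "hep": 0.001},
}
new_profile = {"time": 7.0, "procedures": 2.0}
closest = nearest_known_task(new_profile, known)
print(rate_psf(["yes", "no", "no", "n/a", "yes"]), closest, known[closest]["hep"])
```

The error probability of the closest known task is only a starting point for the assessment of the new task; a fuller treatment would interpolate between several known tasks.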
Looking at the PSFs identified in HRMS and JHEDI above, it is clear that time is an
important factor also relevant to controller recovery. The time it takes to recover from
the occurrence of an equipment failure is important in ATC due to its highly dynamic
nature and the potential for development of an unsafe situation (e.g. loss of standard
separation distance between aircraft). The other factors (e.g. quality of interface,
training, procedures) are also relevant to ATC and have already been discussed for their
inclusion as potential RIFs.
7.2.1.11 A Technique for Human Event Analysis (ATHEANA)
The US Nuclear Regulatory Commission supported the development of ATHEANA as
a technique to overcome the shortcomings of the first generation HRA techniques
(Nuclear Regulatory Commission, 1998). ATHEANA is a context driven technique in
the identification and analysis of human failure events. This technique was intended to
provide a means for analysing Errors Of Commission (EOC). ATHEANA moved away
from random human errors under nominal conditions to errors which result from
error-forcing contexts. According to ATHEANA, an error-forcing context comprises two
components (i.e. plant conditions and associated PSFs) and is associated with (human)
unsafe actions. Thus, the emphasis is placed on the negative impact of context on
human performance (similar to the HEART technique). ATHEANA borrows its methodology
from HEART (see section 7.2.1.7) but incorporates various plant conditions into the
analysis. Starting from the basic scenario (i.e. nominal plant mode), various alternative
deviation scenarios were developed. The deviation scenarios include additional events
that increase the likelihood of certain error mechanisms being triggered (Nuclear
Regulatory Commission, 1998).
As in most other HRA methods, the PSFs derived for ATHEANA are broad categories
which need to be assessed for adequacy by the HRA analyst. These are: procedures,
training, communications, supervision, staffing, human-system interface, organisational
factors, stress, and environmental conditions. All these factors are relevant to controller
recovery from equipment failures in ATC and have already been discussed in the
previous sections.
7.2.1.12 Connectionism Assessment of Human Reliability (CAHR)
The CAHR technique was developed as part of a PhD dissertation and a project for the
German nuclear industry (Straeter, 2000). The objective of this dissertation was to
develop a method for evaluation of human reliability within plant events. The novelty in
this approach is that it is based on very detailed databases introduced to facilitate
international exchange of experiences on events in the nuclear industry. These
databases are: the Nuclear Computerized Library for Assessing Reactor Reliability
(NUCLARR), the Incident Reporting System (IRS), and the German special
occurrences database (BEVOR). These databases collect mandatory occurrence data
(Straeter, 2000).
The CAHR technique is based on the evaluation of the operator's task from the incident
description and identification of interactions between various PSFs. In general, PSFs
are defined here as causes or conditions necessary for the occurrence of an error.
Straeter (2000) considered a weighting scheme for each PSF. Since the available data
sources (i.e. databases) offered a high level of detail in event descriptions, it was possible
to move away from a judgment-based categorisation of PSFs towards a more analytical method.
Straeter (2000) determined the frequencies with which a shaping factor was observed
in connection to a human error of a certain type. However, as much as this approach
seems reasonable, it requires access to highly detailed datasets of human reliability
performance. Amongst the investigated events, Straeter (2000) determined 30
conditions under which human errors occurred. These were categorised into six groups:
task (e.g. preparation, simplicity/complexity, precision, time pressure);
order issue (clarity of procedures, design of procedure, content, completeness, presence);
person (e.g. processing, information, goal reduction);
activity (e.g. usability of control, usability of equipment, monotony, positioning, quality assurance, equivocation of equipment);
feedback (e.g. arrangement of equipment, display range, accuracy of display, labelling, marking, reliability); and
system (e.g. technical layout, external event, construction, redundancy, coupled equipment).
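The frequency analysis described above, in which the occurrence of each shaping factor is counted in connection with each type of human error, can be sketched as follows. The event records, the error type labels, and the function name are hypothetical and serve only to show the counting; they are not drawn from the databases used by Straeter (2000).

```python
from collections import Counter, defaultdict


def psf_frequencies(events):
    """Relative frequency of each PSF per error type.

    events: list of dicts with keys "error_type" (str) and "psfs"
    (list of PSF names observed in that event).
    Returns {error_type: {psf: share of events of that type with the PSF}}.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for event in events:
        totals[event["error_type"]] += 1
        # A PSF is counted once per event, even if it was recorded twice.
        for psf in set(event["psfs"]):
            counts[event["error_type"]][psf] += 1
    return {
        error_type: {psf: n / totals[error_type] for psf, n in psf_counts.items()}
        for error_type, psf_counts in counts.items()
    }


# Hypothetical event records, for illustration only.
example_events = [
    {"error_type": "omission", "psfs": ["time pressure", "labelling"]},
    {"error_type": "omission", "psfs": ["time pressure"]},
    {"error_type": "commission", "psfs": ["clarity of procedures"]},
]
frequencies = psf_frequencies(example_events)
print(frequencies["omission"])
```

In this invented sample, time pressure appears in both omission events and labelling in one of the two, which is the kind of relative frequency on which a data-driven weighting of PSFs could be based.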
The identified PSFs are applicable to recovery from equipment failures in ATC and
have already been considered for inclusion in candidate RIFs (e.g. task, order issue
- procedures, person, activity - operational support, feedback - HMI). The last CAHR
category (i.e. system) is also relevant as a potential RIF, especially as it deals with
technical layout or system architecture and level of redundancy (as a type of built-in
technical defence). However, these factors are important from a technical point of view
since they directly determine the reliability and availability of the ATC service. The
research presented in this thesis focuses on controller recovery performance once all
redundant systems fail and affect the controller's ability to control traffic in dedicated
airspace. As a result, more emphasis should be placed on built-in defences
transmitting information to the controller regarding the failure (e.g. alarms, alerts) since
these have an effect on the quality of the controller recovery process (for details see
Chapter 4, section 4.3.2). This also directly corresponds to findings by Kanse and van
der Schaaf (2000) reviewed in section 7.2.1.4.
7.2.1.13 Nuclear Action Reliability Assessment (NARA)
The Nuclear Industry Management Committee (IMC) and British Energy supported an
initiative to produce an enhanced and updated version of the HEART technique
specific to the nuclear industry and known as Nuclear Action Reliability Assessment
(NARA) (Kirwan et al., 1994). A review of the data sources used for the original version
of HEART pointed out the need for a detailed human error probability database
(CORE-DATA) which overcame some of the shortcomings detected in the intervening
years. NARA is based on a combination of CORE-DATA and real accident/incident
data available from the nuclear industry, augmented by expert judgement.
In this technique, contextual factors are referred to as Error Producing Conditions
(EPCs). However, the set of EPCs included in NARA was based simply on a review of
the data sources used in the original version of HEART. From the original thirty-eight
EPCs identified in HEART, eighteen were included in NARA based on the findings from
the research by Kennedy et al. (2000). The factors relevant to controller recovery are
the same as those in the HEART model.
7.2.1.14 Human Performance DataBase (HPDB)
Park et al. (2004) emphasised the need to collect plant-specific or domain-specific data
in order to identify the key factors that can degrade/enhance a plant's safety. To fulfil
this requirement they initiated the Human Performance DataBase (HPDB) under the
Korea Atomic Energy Research Institute. The objective of this database was to
provide the reliable human performance information needed to perform HRA,
especially for plant-specific emergencies. In order to achieve this objective, they
collected operational emergency reports from regular training sessions. Information
that was considered relevant for an appropriate HRA analysis was grouped under the
following categories:
available procedure;
description of the different tasks, steps, and actions, and their dependence;
demand of perception, cognition, and action to perform necessary tasks and
actions;
person or team issues;
level of experience; and
time needed to correctly perform tasks, steps, and actions.
The third category, 'demand of perception, cognition, and action to perform necessary
tasks and actions', refers to the operator's workload. This factor has been subsumed
under the personal factors, similar to the approach taken in section 7.2.1.5. All other
factors have already been assessed as relevant to the recovery from equipment
failures in ATC.
Similar to the main objective of HPDB, the research presented in this thesis is relevant
to the advancement of knowledge of controller performance in emergency/unusual
situations, such as equipment failure in ATC. When an equipment failure occurs,
controller behaviour tends to differ from normal everyday routine behaviour. For this
reason, it is necessary to review the relevant internal and external factors that influence the
controller's recovery from unexpected equipment failures in ATC.
The discussions presented in the previous sections attempted to extract relevant
factors from various strands of human reliability research to ensure a complete presentation of
the recovery context under research in this thesis. The following section gives a
summary of the findings.
7.2.1.15 Summary of the findings
The Recovery Influencing Factors (RIFs) relevant to ATC equipment failure have been
selected on the basis of several sources of information. In general, the definitions of
contextual factors throughout the assessed HRA techniques show great similarity,
where contextual factors are seen as causes, conditions, or factors that influence
human performance. The only difference is observed in three techniques (HEART,
ATHEANA, and CAHR) which focus purely on negative human performance.
The process followed to select the relevant RIFs started with an initial selection based
on the review of contextual factors identified in three ATC/ATM related human reliability
techniques, namely HERA, TRACEr, and RAFT (Table 7-3). As a result, nine groups of
RIFs have been determined as relevant to ATC: communication, traffic and airspace,
weather, procedures, training and experience, HMI, personal factors, organisational
factors, and task complexity. These initial findings are augmented with a review of
non-ATM related HRA techniques (as presented in the previous sections). Therefore, the
second step involved a review of eleven HRA techniques mostly designed to analyse
human error in the nuclear and process industries. These generated three additional
factors of relevance to controller recovery (see Table 7-3).
Table 7-3 Review of Human Reliability Assessment (HRA) techniques and relevant findings
HRA technique | Industry | Terminology used for contextual factors | Definition of contextual factors | Extracted contextual factors
HERA | ATM | Contextual Conditions (CCs) | Corresponds to the definition in this research | Communication for recovery; Traffic and airspace; Weather; Procedures; Training; HMI; Personal factors; Organisational factors
TRACEr | ATM | Performance Shaping Factors (PSFs) | No definition is provided | as above
RAFT | ATM | | No definition is provided | as above; Task complexity
Recovery from failures | Chemical | Recovery Influencing Factors (RIFs) | Corresponds to the definition in this research | Occurrence-related factors (available and applicable defences such as alarm); Group of factors relevant for prioritisation of recovery-related factors (time available/time pressure)
COREDATA | Nuclear, chemical, offshore | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | as above
THERP | Nuclear | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | Suddenness of occurrence (or time course of failure development)
HEART | Nuclear, chemical, petrochemical, railway, defence | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | as above
COCOM | Generic | Common Performance Modes (CPMs) | More generic definition | as above
CREAM | Generic | Common Performance Conditions (CPCs) | More generic definition | as above
HRMS | Nuclear | Performance Shaping Factors (PSFs) | Additionally include myriad of other factors | as above
ATHEANA | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | as above
CAHR | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | as above
NARA | Nuclear | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | as above
HPDB | Nuclear | Factors | No definition is provided | as above
The assessed HRA techniques and their related factors are presented in tabular form
in Appendix VII. Factors from all techniques are compared to HERA, as the most
recent HRA technique in the ATC/ATM domain. In most cases, the comparison was
straightforward since certain factors were identified in almost all techniques (e.g. the
factor 'procedures'). However, a number of factors could not be identified as belonging
to any of the HERA categories and were thus categorised separately (shown as
dashed boxes in Appendix VII). Although these did not specifically fit any of the HERA
categories, they were retained because of their relevance to the recovery from
equipment failures in ATC. Table 7-3 gives an overview of the RIFs that are taken
forward for further analysis in the next section.
7.2.2 Augmentation with equipment-failure related factors

Once the relevant factors had been determined from the reviewed HRA
techniques (Table 7-3), it was necessary to complement the identified RIFs with
equipment failure related factors. The reason for this is to better reflect the context
surrounding the occurrence of an equipment failure and the subsequent controller recovery.
Chapter 4 yielded a further set of recovery factors related to some of the key
characteristics of equipment failures: ATC functionality affected (this is taken into
account separately through the classification of ATC functionalities as defined in
Chapter 2), complexity of failure type, time course of failure development, duration of
failure, impact on operations room (i.e. number of workstations/sectors affected), and
impact on ATC/ATM. As a result, the following RIFs have been added to the previous
list: complexity of failure type, time course of failure development, duration of failure,
and impact on operations room (i.e. number of workstations/sectors affected).
The relevance of the additional equipment-related RIFs has been confirmed in the
analysis of more than 20,000 operational failure reports from four different countries (as
presented in Chapters 3 and 4). However, even the two brief operational reports
given in section 7.1.1 confirmed the relevance of the equipment-related RIFs, namely
number of workstations affected, time course of failure development, and complexity of
failure type.
7.2.3 Augmentation with dynamic situational factors

It was observed that the chosen RIFs represented more static aspects of the working
environment. As observed by Straeter (2005), dynamic situational factors play an
important role in human decision making and behaviour in emergencies (e.g.
unexpected equipment failure). Straeter (2005) identified a total of seven dynamic
situational factors subdivided into time-related and system-related. Time-related
dynamic situational factors are suddenness of onset of a system development,
operational phase of a task, and involvement of the operator. System-related dynamic
situational factors are: experience with system performance (reliance), conflicting
issues in the situation (task complexity), ambiguity of information in the working
environment, and misleading information processing (priming).
Based on the overview of these seven dynamic situational factors, it was possible to
identify three additional factors relevant to the recovery from equipment failures in ATC.
These are: experience with system performance (reliance), ambiguity of information in
the working environment, and adequacy of alarm/alert onset (adapted from the 'suddenness of
onset of a system development' factor). The remaining dynamic situational factors were
either already incorporated amongst the candidate RIFs (i.e. task complexity) or were not
considered relevant in the ATM industry (e.g. operational phase of a task and
misleading information processing are more relevant for non-ATM industries).
7.2.4 Further subdivision of the identified RIFs

In certain cases, the identified recovery factors were too generic to capture the specific
characteristics of the environment at the moment of failure. In order to avoid any
ambiguity, two principles are adopted at this stage of the research. Firstly, each
identified contextual factor is rephrased to better reflect the research presented in this
thesis. For example, 'communication' is rephrased to 'communication for recovery
within team/ATC Centre'. In this way, the selected RIF precisely reflects which segment
of communication is taken into account (i.e. in relation to the recovery process) and
between which parties (i.e. team of controllers or entire ATC Centre). The second
principle is the subdivision of identified contextual factors whenever necessary
(see Table 7-4). As an example, the 'traffic and airspace' factor is too generic to
capture the characteristics of both traffic and airspace and was therefore broken down
into two separate categories. A similar approach is applied to 'training and experience'.
Table 7-4 Recovery Influencing Factors
Identified contextual factors | Corresponding Recovery Influencing Factors (RIFs)
Communication | Communication for recovery within team/ATC Centre
Traffic and airspace | Traffic complexity during the recovery process; Airspace characteristics during the recovery process
Weather | Weather conditions during the recovery process
Procedures | Existence of recovery procedure
Training and experience | Training for recovery from ATC equipment failures; Experience with equipment failures
HMI | Adequacy of HMI and operational support
Personal factors | Personal factors
Organisational factors | Adequacy of organisation
Task complexity | Conflicting issues in the situation (task complexity)
Time available & time pressure | Time necessary to recover
Available and applicable defences and barriers & alarms | Adequacy of alarms/alerts (as part of HMI)
Complexity of failure type | Complexity of failure type
Suddenness of occurrence & Time course of failure development | Time course of failure development
Duration of failure | Duration of failure
Impact on operational room (i.e. number of workstations/sectors affected) | Number of workstations/sectors affected
Experience with system performance (reliance) | Experience with system performance (reliance or trust in the system)
Ambiguity of information in the working environment | Ambiguity of information in the working environment
Adequacy of alarm/alert onset | Adequacy of alarm onset
7.3 Definition of qualitative descriptors

The final step involves the definition of the qualitative descriptors for each RIF. In this
research, a qualitative descriptor defines the levels of impact that each RIF has in the
context of controller recovery performance. The simplest case would be a dichotomous
descriptor distinguishing only two levels of impact of each recovery factor. However,
this approach often loses valuable information and is not always suitable.
Therefore, qualitative descriptors have been constructed providing three levels of
impact. These start from Level 1, referring to the most desirable level (in terms of ATC
recovery), move to Level 2, referring to the tolerable or average level, and finish with
Level 3, referring to the least desirable level. For example, the RIF 'communication for
recovery within team/ATC Centre' would have three qualitative descriptors, namely
'efficient communication', 'tolerable communication', and 'inefficient communication'.
This approach is similar to that taken in the CREAM technique (Hollnagel, 1998;
section 7.2.1.9).
On the other hand, the RIF 'Experience with the system performance (reliance or trust
in the system)' would have two qualitative descriptors. The first would be an objective
attitude toward the system. The second would account for an inadequate attitude of the
controller toward the ATC system and would include both positive experience with the
system (overtrust) and negative experience with the system (undertrust). In order to
accurately present the levels of impact that this particular RIF has in the context of
controller recovery performance, it was necessary to combine the cases of undertrust
and overtrust in the ATC system. To all intents and purposes, they both have a similar,
undesirable effect on controller recovery performance. Undertrust in ATC systems
leads to inefficient use of some or all of the available equipment and tools. On the other
hand, overtrust leads to complete reliance on the information provided by the system
without consideration of the controller's own judgement or situational awareness of the
position (lateral and longitudinal) and intent of the traffic within a dedicated airspace.
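The descriptor scheme described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not part of the thesis methodology: the class names `Level` and `RIF`, the function `reliance_descriptor`, and the attitude labels passed to it are hypothetical, while the descriptor wording is taken from the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Level of impact of a RIF on controller recovery (1 is most desirable)."""
    MOST_DESIRABLE = 1
    TOLERABLE = 2
    LEAST_DESIRABLE = 3


@dataclass(frozen=True)
class RIF:
    """A Recovery Influencing Factor with one descriptor per level (hypothetical type)."""
    name: str
    descriptors: dict  # maps Level -> qualitative descriptor text


# Three-level example taken from the text.
communication = RIF(
    name="Communication for recovery within team/ATC Centre",
    descriptors={
        Level.MOST_DESIRABLE: "Efficient communication",
        Level.TOLERABLE: "Tolerable communication",
        Level.LEAST_DESIRABLE: "Inefficient communication",
    },
)


def reliance_descriptor(attitude: str) -> str:
    """Map a (hypothetical) attitude label to one of the two reliance descriptors.

    Overtrust and undertrust are deliberately combined, as both are treated
    as having a similar, undesirable effect on recovery performance.
    """
    if attitude == "objective":
        return "Objective attitude toward the system"
    if attitude in ("overtrust", "undertrust"):
        return "Inadequate attitude toward the system"
    raise ValueError(f"unknown attitude: {attitude!r}")
```

Keeping overtrust and undertrust behind a single descriptor mirrors the decision to combine them, so any later quantification sees one undesirable level rather than two.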
The above analyses led to a final set of 20 controller Recovery Influencing Factors
(RIFs) divided into four main groups: internal factors (i.e. factors related to the
controller), equipment failure related factors, external factors (i.e. factors related to
working conditions), and airspace related factors. Finally, it has to be noted that the
definition of these 20 RIFs assumes that an equipment failure has occurred (i.e.
probability of equipment failure is 1). Otherwise, these 20 RIFs would have to be
renamed and re-defined to allow an analysis of the context surrounding a particular event
under investigation that is no longer an equipment failure. Table 7-5 presents the final
set of factors relevant to the recovery from equipment failures in ATC, together with
their corresponding qualitative descriptors. It has to be noted that these 20 RIFs
represent high-level categories (e.g. personal factors) consisting of several low-level
factors (e.g. age, experience, stress, fatigue). The detailed definitions of these 20 RIFs
are presented in Appendix VIII.
or factors
related to
working
condition
Equipment failure related factors
Internal factors
Table 7-5 Relevant recovery influencing factors and their corresponding qualitative descriptors
RIF name
Qualitative descriptor
Level
Suitable to the situation in question
1
Training for recovery from ATC
Tolerable to the situation in question
2
equipment failure
Counter productive to the situation in
3
question
Experienced a particular type of failure or
1
Experience with equipment
any other type of ATC equipment failure
failures
No experience with ATC equipment failures
2
Objective attitude toward the system
2
Experience with the system
Positive
experience
with
the
system
or
performance (reliance)
3
negative experience with the system
Suitable for the recovery process
1
Personal factors
Tolerable for the recovery process
2
Counter productive for the recovery process
3
Efficient
1
Communication for recovery
Tolerable
2
within team/ATC Centre
Inefficient
3
Single system affected
2
Multiple systems affected
3
Sudden failure
1
Time course of failure
Persistent or latent failure
2
development
Gradual degradation of system
3
One workstation/one sector or all
2
Number of workstations/sectors
workstations in one sector
affected
Several workstations/couple of sectors or all
3
workstations/all sectors
Adequate
1
Inadequate
3
1
2
Inappropriate
3
Short period of time
2
Duration of failure
Moderate or substantial period of time
3
1
Adequacy of HMI and operational
2
support
3
Ambiguity of information in the

working environment
Adequacy of alarms/alerts
Airspace related factors
Traffic complexity during the
recovery process
Airspace characteristics during
the recovery process
Weather conditions during the
recovery process
Conflicting issues in the situation
(task complexity)
question
External working environment matches the
controller's internal mental model
External working environment mismatches
the controller's internal mental model
question
Information from the external world enters
the processing loop at the right time
Information from the external world enters
the processing loop at the wrong time
(misleading sequence of alarms)
Efficient
Tolerable
Inefficient
Average traffic complexity
High or low traffic complexity
Adequate
Tolerable
Inappropriate
Improved
Deteriorated
Average complexity of the situation
Conflicting, multiple tasks or extremely low
complexity of the situation
1
3
1
2
3
1
3
1
2
3
2
3
1
2
3
2
3
2
3
In order to ensure a complete list of relevant contextual factors, a key step at this stage
was verification of the selected RIFs. An initial verification was provided by two
ATM specialists (from one European ATC Centre) with extensive operational
experience. They had an opportunity to review the candidate RIFs, their definitions,
and related qualitative descriptors (for evidence see Appendix II) and their feedback
was valuable in the approval of the selected RIFs. Further verification of the selected RIFs
was conducted in the experiment (presented in Chapters 9 and 10). The process of
quantifying the 20 RIFs probabilistically, together with their interactions and
their influence on controller recovery, is discussed in more detail in the following
Chapter.
7.4 Summary
The objective of this Chapter was to define the recovery context via a set of contextual
factors, known as Recovery Influencing Factors or RIFs. The Chapter has built on the
review of existing HRA techniques and their corresponding contextual factors to identify
which factors are relevant to recovery from equipment failure in ATC. This initial
selection of relevant contextual factors has been augmented with specific equipment
failure related factors and dynamic situational factors. The methodology resulted in a
set of 20 controller RIFs. The Chapter concludes with a definition of the qualitative
descriptors for each RIF, i.e. the levels of impact that each RIF has in the context of
controller recovery performance. All results obtained have been initially verified by two
ATM specialists who reviewed the choice of selected RIFs and their qualitative
descriptors. The selection of relevant contextual factors (i.e. RIFs) and their qualitative
descriptors are taken forward to the next Chapter to develop the methodology for the
quantitative assessment of the recovery context.
Chapter 8
Quantitative Assessment of Recovery Context
Quantitative Assessment of Air Traffic Controller Recovery Context
The previous Chapter presented a selection of contextual factors relevant to recovery
from equipment failures in Air Traffic Control (ATC), known as Recovery Influencing
Factors (RIFs). This selection was based on a review of existing Human Reliability
Assessment (HRA) techniques, augmented by specific equipment failure and dynamic
situational factors. A set of 20 RIFs was identified and distributed in four main groups:
internal, equipment failure related, external, and airspace related factors. In order to
facilitate quantitative assessment of the recovery context, the selected RIFs were firstly
assigned potential qualitative levels of impact followed by their quantitative definition
(i.e. probability of each level occurring). The Chapter starts by reviewing relevant past
research to formulate the methodology adopted in this thesis. The proposed
methodology consists of six steps. The qualitative definition of the 20 RIFs from the
previous Chapter (Step 1) is followed by the quantitative definition of each RIF (Step 2).
This quantitative definition is based on various sources, such as past literature,
operational failure reports, expert input of eight ATM specialists, and the questionnaire
survey. The Chapter continues with the implementation of all existing interactions
between relevant RIFs (Step 3). These are identified by utilising operational experience
and further validated by past research and expert input. Incorporation of interactions
results in a change of RIF levels that necessitates determination of the cut-off point
between any two consecutive levels (Step 4). Finally, the methodology defines the
relationship between a particular RIF level and its effect on controller recovery
performance (Step 5), to conclude with the definition of a numerical indicator for each
recovery context (Step 6).
8.1 Lessons learnt from past research

The review of various HRA techniques (in Chapter 7) identified two issues relevant to
this thesis. Firstly, it identified potential RIFs. Secondly, it revealed the two HRA
techniques which use contextual factors as the basis for quantitative human
performance analysis. These are: the Cognitive Reliability and Error Analysis Method -
CREAM (Hollnagel, 1998) and the Connectionism Assessment of Human Reliability -
CAHR (Straeter, 2000). A discussion of the CREAM technique and its relevance to
this thesis is presented in sections 7.2.1.9 and 7.3 of Chapter 7 and will not be
repeated here. However, since the CREAM technique has been further developed in
the work by Kim, Seong, and Hollnagel (2005) and Fujita and Hollnagel (2004), both
approaches have been assessed for their relevance to the research presented in this
thesis.
8.1.1 Applications of the CREAM technique

The application of the CREAM technique by Kim, Seong, and Hollnagel (2005)
attempted a probabilistic determination of contextual factors to determine the relevant
control mode (tactical, opportunistic, scrambled, and strategic control as defined in
CREAM). In short, the authors proposed probability distributions for nine contextual
factors or CPCs, taking into account their dependencies. The advantage of their
approach is the straightforward incorporation of uncertainties. In other words, this
approach is useful in the case of contextual factors which are not clearly defined or
understood. Because of this particular feature, this approach has been adopted in this
thesis.
Furthermore, Kim, Seong, and Hollnagel (2005) link each level of a contextual factor to
a specific type of control and assess all possible contexts using the Bayesian Belief
Network (BBN) approach. Littlewood, Strigini, Wright, and Courtois (1998) state that
the use of BBNs allows safety experts to better handle safety assessment and
potentially make hidden safety arguments more visible, communicable, and auditable.
In general, the concept of BBN is based on a probabilistic approach. It combines expert
input and data, and is useful for modelling complex and uncertain applications. However,
the approach by Kim et al. (2005) based on nine CPCs was too complex.
Subsequently, Kim et al. simplified it by grouping the nine CPCs into groups of
three, which were then assessed by the BBN approach. For this reason, a probabilistic approach
based upon C program code and the core methodology by Kim et al. (2005) is
used in this thesis to enable incorporation of all 20 RIFs.
The application of the CREAM technique by Fujita and Hollnagel (2004) is designed as
a practical application of CREAM for screening various scenarios and estimating the
failure probability solely from the characteristics of the contextual conditions
surrounding an occurrence (e.g. accident). In this way, the method moves away from
the notion of human error and focuses more on context as a driving force of inadequate
human performance, regardless of whether an individual or a team is involved.
Although it demonstrates the usefulness of the CREAM methodology, this method is
not very relevant to this thesis.
8.1.2 Connectionism Assessment of Human Reliability (CAHR)

As previously discussed in section 7.2.1.12 of Chapter 7, CAHR is a data-driven HRA
technique based on highly detailed databases of incident reports in the nuclear industry.
Using the available incident reports, it was possible to move away from an expert
judgment based categorisation of PSFs towards a more analytical method. However,
ATC still lacks a high-level database that captures human performance in the event of
an ATC related incident/accident. Therefore, an analysis of context as performed in
CAHR is still not achievable in the ATC industry. Some initial attempts to establish a
database that captures the human performance data are planned by EUROCONTROL
through the Human Error in ATM (HERA) project (EUROCONTROL, 2002d), but
currently this is incapable of supporting any meaningful statistical analysis.
The following Table 8-1 summarises the characteristics of CREAM, its two main
applications, and CAHR. Section 8.2 builds on the relevant elements of the CREAM
technique to define a framework for the quantitative assessment of recovery context.
Table 8-1 Overview of CREAM and CAHR differences
HRA technique | Relevant area | Number of contextual factors | Interaction between contextual factors | Output
CREAM by Hollnagel (1998) | Theoretical approach toward human erroneous action | Nine | Included qualitatively | Quantitative probabilistic range
Improvement of CREAM by Fujita and Hollnagel (2004) | Theoretical approach toward action failure rate based on contextual factors | Ten | Included qualitatively (based on CREAM) | Quantitative mean failure rate
Improvement of CREAM by Kim, Seong, and Hollnagel (2005) | Theoretical approach toward human erroneous action | Nine | Included qualitatively (based on CREAM) | Quantitative, probabilistic approach
CAHR by Straeter (2000) | Data driven approach defined within nuclear industry | Thirty | Included quantitatively using the available data | Connectionism method facilitating qualitative and quantitative approach
8.2 Framework of the methodology for quantitative assessment of recovery context
The proposed methodology is generic as its aim is to present the framework for a
generic ATC Centre, as described in Chapter 2, section 2.4. Used operationally, this
methodology would have to be refined to reflect and incorporate all the characteristics
of the ATC Centre or event under investigation.
In general this methodology consists of six steps (Figure 8-1). Firstly, it is necessary to
review the twenty RIFs identified in the previous Chapter and their relevance to the
ATC Centre or event under investigation. In the generic approach, all 20 factors are
assessed and defined through their qualitative descriptor or their levels of impact on
controller recovery performance (Step 1). Secondly, based on available sources of
information each RIF is probabilistically defined (Step 2). As a result, it is possible to
present the recovery context as a function of identified RIFs and their corresponding
levels. At this stage, there is no consideration of the interactions between RIFs, as they
are considered to be independent. To provide an accurate approach, Step 3 takes into
account all interactions between RIFs. These are assessed both qualitatively and
quantitatively. This results in a distribution of RIF levels. Having a distribution of RIF
levels, as opposed to discrete Levels 1, 2 and 3, necessitates identification of the
cut-off point between any two consecutive levels (Step 4). Once these cut-off points are
identified and RIF levels re-defined, the next step quantifies the relationship between
a particular level of a RIF and its impact on controller recovery performance. This
relationship is expressed via correlation coefficients (Step 5). At this stage, the previously
determined probabilities of each RIF level (Step 2) are re-calculated to account for
RIF interactions. The result is the definition of an aggregated indicator of the recovery
context, referred to as the recovery context indicator Ic (Step 6).
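The role of the cut-off points in Step 4 can be illustrated with a minimal sketch. It assumes that, after interactions are applied, a RIF level is represented by a continuous value, and it uses hypothetical cut-off points of 1.5 and 2.5; neither the representation nor the values are taken from the thesis, and the function name `discretise` is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical cut-off points between Levels 1/2 and Levels 2/3.
CUT_OFFS = (1.5, 2.5)


def discretise(level_value: float, cut_offs=CUT_OFFS) -> int:
    """Map a continuous RIF level value back onto discrete Level 1, 2 or 3.

    A value exactly on a cut-off point is assigned to the higher
    (less desirable) level, which is a conservative choice made here
    purely for illustration.
    """
    return bisect_right(cut_offs, level_value) + 1
```

With these assumed cut-off points, a value of 1.2 stays at Level 1, while a value of 2.7 is assigned to Level 3.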
Figure 8-1 below presents the six-step framework for the quantitative assessment
of the recovery context. Since the previous Chapter identified and discussed all 20
RIFs and their levels of impact (qualitative descriptors), the following section discusses
the next step, namely the probabilistic assessment of RIFs (Step 2). This is
followed by the remaining steps of the proposed methodology (Figure 8-1).
Figure 8-1 Framework for the quantitative assessment of the recovery context
8.3 Probabilistic assessment of RIFs (Step 2)

Given that the aim of this Chapter is to present a reliable quantitative approach for the
analysis of the controller recovery performance, it is necessary to probabilistically
define levels of influence of each RIF on controller performance (referred to as
qualitative descriptor). As previously discussed in Chapter 7 (section 7.3), the
qualitative and quantitative definition of RIFs assumes that a failure occurred (i.e. that
the probability of failure is 1). In this way, it is possible to define every possible context
as a combination of RIFs and their corresponding levels of influence, i.e. qualitative
descriptor. This approach is important for the prospective analysis of controller
performance, as well as a retrospective event analysis. Even in the case of
retrospective analysis, specifying RIFs exactly is not straightforward due to the lack of
data and information about the context. In the case of predicting future events or
potential hazardous contexts, specifying the RIFs accurately becomes much more
difficult and a level of uncertainty is inherent in the process.
The use of a probabilistic approach has several advantages. Firstly, if a certain RIF is
not clearly specified or known, it is possible to assume probabilities for each of its
levels based on operational data. In this way any uncertainties identified for a certain
RIF can be considered more explicitly as illustrated by Kim, Seong, and Hollnagel
(2005). Another advantage of this approach is that the probability distribution of the
context, and indirectly controller performance, is a result of considering all possible
combinations of contextual factors or RIFs.
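The idea of obtaining the probability distribution of the context from all combinations of RIF levels can be sketched as follows. This is an illustrative sketch only: it assumes the RIFs are independent (i.e. before any interactions are incorporated), it uses three RIFs rather than all 20, and the probability values and the function name `context_distribution` are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical level probabilities for three RIFs (illustrative values only).
rif_levels = {
    "training": {1: 0.5, 2: 0.3, 3: 0.2},
    "communication": {1: 0.6, 2: 0.3, 3: 0.1},
    "duration_of_failure": {2: 0.7, 3: 0.3},
}


def context_distribution(rifs):
    """Enumerate every context (one level per RIF) with its probability.

    Under the independence assumption, the probability of a context is
    the product of the probabilities of the levels it is made of.
    """
    names = list(rifs)
    distribution = {}
    for combo in product(*(rifs[name].items() for name in names)):
        levels = tuple(level for level, _ in combo)
        distribution[levels] = prod(p for _, p in combo)
    return names, distribution


names, distribution = context_distribution(rif_levels)
```

Because every RIF distribution sums to one, the probabilities of all enumerated contexts also sum to one, which offers a simple consistency check on the input probabilities.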
The definition of each RIF in terms of the probability of each of its levels is not
straightforward. However, this is necessary for any attempt to quantify the
effectiveness of controller recovery performance in a given context or environment.
Major difficulties are experienced in the quantification of internal RIFs (or factors
related to the controller), as it is hard to quantify any type of human performance. It is
also difficult to quantify some of the equipment failure related RIFs due to the lack of
consistent data collection in the available occurrence reporting schemes. In other
words, some failure characteristics, such as the number of workstations affected, are
not consistently reported. Finally, the majority of the external RIFs are highly ATC
Centre specific and as such extremely hard to define in a generic form. Bearing this in
mind, it is understandable why the quantification of RIFs has been a challenge in the
past.
For this reason, it should be noted that this Chapter captures the characteristics of the
generic ATC Centre as a base for any further fine tuning of the proposed methodology
and its usage as either a retrospective or prospective/predictive tool. Each ATC Centre
has its unique characteristics that may be represented by different RIF probabilities.
For example, the number of workstations/sectors affected and complexity of failure
type depend on a particular architecture in each ATC Centre, while training for
recovery as well as adequacy of organisation depend on a particular safety culture.
The framework developed in this Chapter is applied to a specific ATC Centre in
Chapter 10.
8.3.1 Sources of information

Four different sources of information have been consulted in order to
determine the necessary RIF probabilities. These are: operational failure reports
(presented in Chapter 4), the responses from the questionnaire survey (presented in
Chapter 6), responses of ATM specialists, and past literature. Table 8-2 presents the
number of RIFs defined by each available source of information, while the following
paragraphs explain each source in detail. However, two RIFs are not informed by any
of the available sources (number of workstations/sectors affected and adequacy of
alarm/alert onset). In these cases, a conservative approach is taken and probabilities
are equally assigned between their levels. Details are presented in Appendix VIII.
Furthermore, three RIFs are informed by combined sources of information (last column
in Table 8-2).
Table 8-2 Distribution of probabilistic RIF ratings per source

Source of probabilistic          Number of RIFs assessed     Number of RIFs assessed
assessment                       directly (single source)    indirectly (combined sources)
Operational failure reports      -                           1 (RIF11), 1 (RIF6)
Questionnaire survey             3                           -
Averaged ATM specialists input   12                          1 (RIF11), 1 (RIF3), 1 (RIF6)
Past literature                  -                           1 (RIF3)
No available source              2                           -
Sum                              17                          3 (i.e. RIF3, RIF6, and RIF11)
8.3.1.1 Operational failure reports

The probabilistic assessment of the recovery factors is informed by the analysis of
more than 20,000 operational failure reports on equipment failures originating from
three Civil Aviation Authorities (referred to as Countries A, B, and C) and one ATC
Centre system control and monitoring database (referred to as Country D). Detailed
analyses of these reports are presented in Chapter 4.
The analyses of operational failure reports are used to inform two particular RIF
probabilities. The first is complexity of failure type. The probabilities relevant to
this RIF are determined by comparing the number of reports of a single failure
with the number reporting more than one failure. These findings are further validated
by the responses from the eight ATM specialists surveyed. The second RIF is duration
of failure. This RIF is informed by the analysis of data from the Country D database, as it
was the only database that captured duration of failure. These findings are likewise
validated by the responses from the eight ATM specialists surveyed.
8.3.1.2 Questionnaire survey
The responses from the questionnaire survey, received from 34 different countries,
captured the experiences of more than one hundred air traffic controllers (average
controller experience is 13.8 years, ranging from 1 to 39 years). The detailed
assessment of this dataset is presented in Chapter 6. This source provided an input for
three RIF probabilities. These are: training for recovery from ATC equipment failure,
previous experience with a particular type of equipment failure, and existence of
recovery procedure.
The first RIF (training for recovery from ATC equipment failure) is more difficult to
determine than the other two. The questionnaire survey determined that 51.7
percent of sampled ATC Centres have established training for recovery (informing the
probability of RIF1 at Level 1) and that 31 percent have not (informing the
probability of RIF1 at Level 3). The remaining 17.4 percent of sampled ATC
Centres showed inconsistent responses, and this result is translated into the probability
of RIF1 at Level 2, the tolerable level. It is assumed that inconsistent
responses on the existence of recovery training within the same ATC Centre
suggest that training is not organised in a consistent manner.
8.3.1.3 Input by ATM specialists
Several probabilities are captured through the input from relevant ATM specialists from
eight similar ATC Centres. The ATM specialists from Ireland, Norway, Sweden, Austria,
New Zealand, Australia, and Japan participated in the small-scale survey. In two cases
the relevant probabilities are captured through face-to-face interviews (with ATM
specialists from Ireland and Norway), whilst in all other cases a predefined set of
questions was distributed for self-completion. These questions were designed to
investigate the factors that impact on controller recovery (as defined via 20 RIFs). For
example, their input informed the probabilities which could not be captured using other
sources of information either because of their confidential nature (e.g. time course of
failure development) or because of the general unavailability of data (adequacy of HMI
and operational support, adequacy of organisation). The form used with both
face-to-face interviews and self-completion methods of response collection is available in
Appendix IX.
The ATM specialists surveyed have wide ATM operational experience and worked as
either rated air traffic controllers or as engineers in the operational ATM environment.
However, their resident ATC Centres needed to be assessed to establish the level of
similarity that may be reflected in their RIF ratings (Table 8-3). All eight ATC Centres
provide Area Control Centre (ACC) services while some also provide oceanic air traffic services,
i.e. control of traffic transiting oceanic areas where the absence of radar coverage
necessitates the use of procedural control. Furthermore, six ATC Centres are equipped
with advanced ATC systems, utilising the latest automated tools such as Short Term
Conflict Alert (STCA), Area Proximity Warning (APW), and Minimum Safe Altitude
Warning (MSAW). Finally, although the traffic is reported at the country level, all ATC
Centres provide the majority of ACC services in their respective countries. For this
reason, country-level traffic figures can be taken as a good indicator of the amount of
traffic controlled by each respective ATC Centre. Reviewing the available traffic figures,
only Japan differs significantly compared to other countries. The Tokyo area represents
one of the busiest airspaces in the world, comparable to the London and Maastricht
areas of Europe.
Table 8-3 ATM specialists involved in the assessment of RIFs

Resident ATC   ATC Service    ATC system status [1]   Total IFR flights controlled within
Centre         provided                               the country in 2005 (in thousands)
Shannon        ACC/Oceanic    Latest generation       621 [2]
Oslo           ACC            Latest generation       488 [2]
Malmo          ACC            Latest generation       686 [2]
Vienna         ACC            Older generation        819 [2]
Auckland       ACC/Oceanic    Latest generation       555 [3]
Melbourne      ACC/Oceanic    Latest generation       647 [4]
Christchurch   ACC/Oceanic    Latest generation       555
Tokyo          ACC/Oceanic    Older generation        2,250 [5]

[1] Source: personal correspondence with Dr Arnab Majumdar who visited all listed ATC Centres
[2] Source: EUROCONTROL Performance Review Report (EUROCONTROL, 2006c)
[3] Source: Airways New Zealand (2006b)
[4] Source: Bureau of Transport and Regional Economics (2006). Australian Government
The responses from the ATM specialists surveyed are used to inform 12 RIFs. For
three RIFs their responses have been used to either supplement the findings from the
past research (for the experience with the system performance RIF) or validate
findings from the operational failure reports (for the complexity of failure type and
duration of failure RIFs).
For the majority of RIFs, the responses from the ATM specialists surveyed have been
consistent. However, for six RIFs some ATM specialists gave different answers. This
was the case with the following RIFs: personal factors, communication for recovery
within team/ATC Centre, time course of failure development, adequacy of HMI and
operational support, airspace characteristics, and conflicting issues in the situation
(task complexity). For example, for personal factors the majority of ATM specialists
reported this RIF as suitable for the recovery process in 70 to 90 percent of failure
occurrences. However, the Oslo and Tokyo ATM specialists reported personal factors as
suitable in less than 15 percent of failure occurrences. These lower ratings of the
personal factors reflect the ATM specialists' perception of the readiness of air traffic
controllers to face unusual or emergency situations, such as equipment failure.
Similarly, potential gaps are identified with the Melbourne and Christchurch ATC Centres,
where the majority of failures seem to be latent (accounting for 92 and 60 percent,
respectively). This is contrary to the answers provided by other ATC Centres. Finally,
the potential gaps regarding the adequacy of airspace are identified by ATM
specialists from the Auckland and Tokyo ATC Centres. They ranked airspace design and
configuration as tolerable, highlighting the potential for improvement of airspace
characteristics to enhance controller recovery performance.
It can be concluded that the ATM specialists from eight ATC Centres worldwide produced
similar ratings for the majority of RIFs. Identified inconsistencies reflect differences that
exist between these ATC Centres in terms of the ATC Centre culture (reflected in
personal factors), airspace design, and ATC Centre architecture. These differences are
reasonable as indicators of the diversity that exists between ATC Centres within one
country as well as worldwide. As a result, the responses from the ATM specialists
surveyed have been taken to inform several RIFs. In future, a weighting scheme may
be used to account for the variability between ATC Centres (e.g. safety culture,
differences between ATC Centres, ATM specialists' experience).
[5] Source: Air Traffic Activity at Area Control Centre (last available for 2003) from Ministry of
Land, Infrastructure, and Transport (2006)
8.3.1.4 Past literature
Finally, the relevant data from past ATC research are used to inform probabilities for
the RIF experience with the system performance. The probabilities are determined
from the findings of Hilburn and Flynn (2001) and EUROCONTROL (2000b) in which
18 percent of controllers reported undertrust in technology. These findings are
combined with the responses from the ATM specialists surveyed on the percentage of
controllers with an excessive trust in technology (i.e. overtrust). Therefore, both
sources of information are used to establish the final probability rating for this particular
RIF (presented in Appendix VIII).
8.3.1.5 Aggregation of data
The previous sections have described four different sources of information used to
determine RIF probabilities. These are: operational failure reports, responses from a
questionnaire survey, responses from the ATM specialists surveyed, and past literature.
Table 8-4 reviews all four sources of information with respect to the level of confidence
and therefore the rationale behind the aggregation of data. Three data sources are
rated with a high level of confidence (questionnaire survey, responses from the ATM
specialists surveyed, and past literature). Only one source is rated with medium
confidence. More precisely, the confidence level for operational failure reports from the
CAA databases is not defined as high due to the lack of information on the reliability of
available reporting schemes. There are reliability issues regarding the reporting of
safety occurrences recognised by CAAs [6]. However, none of the CAAs has a
methodology in place to assess the reliability of their reporting scheme and, therefore,
the completeness of the occurrence databases. The medium ranking for the
confidence level is thus an assumption informed by operational experience. As a result, the
data from this source are validated by the findings from another source of data (i.e.
ATM specialists' input) to assure reliable RIF ratings.
[6] International workshop on the analysis of aviation incident/accident precursors, held on
25 and 26 May 2005 at Imperial College London.
Table 8-4 Overview of the sources of information used to determine RIF probabilities

Source                         Level of confidence   Comment
                               (subjective)
Operational failure reports    Medium                The confidence level is not defined as high due to
from the CAAs                                        the lack of information on reliability of available
                                                     reporting schemes
Operational failure reports    High                  The confidence level is defined as high due to the
from the engineering unit of                         fact that the engineering unit has to be aware of all
particular ANSP                                      equipment failures occurring in the ATC Centre as
                                                     they are directly responsible for their maintenance
                                                     and repair
Questionnaire survey           High                  Responses from 134 air traffic controllers, from 58
                                                     ATC Centres, and 34 countries worldwide
ATM specialists                High                  Conducted with ATC specialists from eight ATC
                                                     Centres worldwide
Past literature                High                  Hilburn and Flynn (2001) and EUROCONTROL (2000b)
In general, the above analyses employed the data from all four sources to define the
probabilities for 20 Recovery Influencing Factors (RIFs). These are presented in
Appendix VIII.
8.3.2 Summary
The preceding paragraphs have taken the qualitative levels of impact of each of the
RIFs (i.e. the qualitative descriptors) defined in Chapter 7 and assigned a probability to each.
An overview of all 20 RIFs, their corresponding levels, and designated probabilities is
provided in detail in Appendix VIII and in tabular form in Appendix X.
Having defined all 20 relevant recovery factors in the previous sections, it is possible to
define recovery context. In general the recovery context may be seen as a discrete
function since all possible contexts are defined exactly by 20 elements, and since each
RIF has only two or three defined levels. In mathematical terms, the existing method
can be expressed as a function f using a set of 20 RIFs to define the recovery context
indicator (Ic) as shown in equation 8-1:
$I_c = f(RIF_1, RIF_2, \ldots, RIF_{20})$    (8-1)
The total number of possible recovery contexts represents the number of combinations
of the 20 RIFs, where nine of them have three levels whilst eleven have only two levels
of impact. In total, this approach generates $3^9 \times 2^{11}$ = 40,310,784 possible contexts,
each having an equal probability of occurrence of 1/40,310,784 = 2.4E-08. In
mathematical terms this is equivalent to finding all variations with repetition of 20 RIFs
and their corresponding levels. In addition, each recovery context will have a specific
value of the recovery context indicator (Ic). The methodology to calculate this variable
is presented in the remainder of this Chapter.
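The count of contexts stated above can be checked with a few lines of Python. This is a minimal sketch only; the variable names are illustrative and are not part of the thesis.

```python
# Number of possible recovery contexts: nine RIFs with three levels
# and eleven RIFs with two levels (figures taken from the text above).
three_level_rifs = 9
two_level_rifs = 11

total_contexts = 3 ** three_level_rifs * 2 ** two_level_rifs

# Under the equal-probability assumption every context is equally likely.
context_probability = 1 / total_contexts

print(total_contexts)                # 40310784
print(f"{context_probability:.2e}")  # 2.48e-08
```

The printed probability agrees with the value of 2.4E-08 quoted in the text once the latter is read as a truncated figure.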
Table 8-5 presents an example of a potential recovery context as a 20-digit array
where each digit corresponds by its position to a particular RIF and by its value to the
precise impact of a particular RIF on controller performance. At this stage, all RIFs are
considered independently and their corresponding levels of influence on controller
performance take integer value, i.e. 1, 2, or 3.
Table 8-5 Example of a potential recovery context represented as a 20-digit array

RIF ID   RIF1    RIF2    RIF3    RIF4    RIF5    RIF6    RIF7    RIF8    RIF9    RIF10
Level    1       1       2       1       1       2       1       2       1       1
RIF ID   RIF11   RIF12   RIF13   RIF14   RIF15   RIF16   RIF17   RIF18   RIF19   RIF20
Level    2       2       1       1       3       3       3       1       3       3
The following sections show how the RIF interactions may change the RIF
levels in either direction (i.e. increase the value of the level, which corresponds to a
deterioration in controller performance, or decrease the value of the level, which
corresponds to an improvement in controller performance).
8.4 Interactions between Recovery Influencing Factors (Step 3)

The methodology for the assessment of the recovery context surrounding the
equipment failure occurrence presented in this Chapter is based upon 20 relevant
contextual factors or RIFs. In order to provide an accurate approach, this methodology
has to take into account all the interactions between these contextual factors. The
interactions have been initially established based upon operational experience and
validated by findings from HRA techniques and ATM specialists. The selection of all
relevant RIFs and establishment of their interactions creates a basis for the generation
of all possible recovery contexts and the calculation of the numerical indicator for each
context (Ic). The steps taken to identify RIFs interactions are presented in the following
sections.
8.4.1 Identification of RIF interactions

At first glance, the identified RIFs reveal possible interactions between them. For
example, a poorly designed display (i.e. HMI) as well as inadequate knowledge of ATC
system modes (i.e. inadequate training) may lead to delayed failure detection and less
efficient recovery. Furthermore, stress as a personal factor cannot be independent of
traffic and airspace complexity. If a controller deals with increased levels of traffic, it is
reasonable to assume that stress levels will be higher.
In order to determine the effect of contextual factors on controller performance it is
therefore necessary to describe these interactions, in addition to describing how they
affect controller performance. The analysis of interactions makes it possible to gain a
more accurate picture of the context and thus a better understanding of the recovery
process. In other words, this permits a broader retrospective analysis as well as a more
precise prediction of the effectiveness of the improvement measures. As noted by
Straeter (2000), such interactions could also point to additional factors previously
omitted, such as potential organisational shortcomings.
Straeter (2000) tackles this problem in CAHR by looking at the common appearance of
different factors (using available databases). The analysis is based on capturing the
observed interactions between reported contextual factors. The availability of a detailed
database is however a prerequisite to this approach. Hollnagel (1998) on the other
hand establishes these interactions in CREAM by considering each contextual
condition with respect to how it generally influences the others (there is no mention
whether expert judgement or operational expertise have been used). It is also
important to say that CREAM assumes reciprocal interaction between the contextual
conditions.
The interactions amongst the 20 predefined RIFs have been determined based on known
relationships from operational experience and are marked in Table 8-6.
They represent the irreversible influence between two RIFs, i.e. how RIFs in the first row
affect RIFs in the left hand column. The reason for irreversible influence lies in the
characteristics of the air traffic environment, where one factor may influence another
without any reverse effect. For example, complex traffic can influence controller
personal capabilities in terms of increased stress, anxiety, and workload, while the
opposite influence (impact of personal capabilities on traffic complexity in the sector) is
simply not logical.
Table 8-6 Interactions matrix: (c) validation by CREAM, (h) validation by CAHR, (a) validation by
ATM specialists; and (x) not validated interactions

[20 x 20 matrix of direct influence: the RIFs in the first row (columns 1 to 20) influence the RIFs
in the left hand column (rows 1 to 20); each marked cell carries one of the validation codes above.
Rows and columns are labelled as follows.]

RIF ID   RIF
1        Training for recovery from ATC equipment failures
2        Previous experience with equip. failures
3        Experience with system performance (reliance)
4        Personal factors
5        Comm. for recovery within a team of controllers
6        Complexity of failure type
7        Time course of failure development
8        Number of workstations/sectors affected
9        Time necessary to recover
10       Existence of recovery procedure
11       Duration of failure
12       Adequacy of HMI
13       Ambiguity of info in the working environment
14       Adequacy of alarms/alerts
15       Adequacy of alarms/alerts onset
16       Adequacy of organisation
17       Traffic
18       Airspace characteristics
19       Weather conditions
20       Task complexity
8.4.2 Validation of RIF interactions

This section validates the interactions identified in the previous section. This was
carried out in two stages. The first stage (sections 8.4.2.1 and 8.4.2.2) addresses
interactions identified in existing literature (CREAM and CAHR techniques). Although
Chapter 7 presented the basic principles behind these two techniques and extracted
candidate RIFs, this Chapter focuses only on the assessment of the interactions
between contextual factors identified in both techniques. The second stage (section
8.4.2.3) identifies the interactions based on the input by three ATM specialists. The
self-completion method was used to collect their responses.
8.4.2.1 CREAM
A comparison of the interactions between contextual factors defined in the CREAM
technique (i.e. CPCs) and those defined between RIFs (Table 8-6) shows a degree of
mapping. A direct link was found with all interactions except those relevant to working
conditions and number of simultaneous goals CPCs. As already explained in Chapter
7, these two contextual factors are excluded from the list of RIFs. Note that the
interactions relevant to the crew collaboration quality CPC are compared with those
related to the communication for recovery RIF, because teamwork after the
detection of an equipment failure is mostly verbal.
The CREAM technique is developed as a generic technique for the analysis of human
actions. Therefore, it is not specifically ATC oriented and cannot entirely reflect the
characteristics of the ATC environment. For this reason, several RIFs could not be
mapped to the CPCs. These are personal factors (except time of the day as one of the
contextual factors identified in CREAM), complexity of failure type, time course of
failure development, number of workstations/sectors affected, duration of failure, traffic
complexity, airspace characteristics, and weather conditions. Overall, 22 percent of the
interactions identified amongst the RIFs are reflected in CREAM.
The mapping between CREAM CPC interactions and RIF interactions is marked
with the symbol c in Table 8-6.
8.4.2.2 CAHR
A comparison of the interactions between six Man-Machine System (MMS) and their
corresponding PSFs defined in CAHR and those defined between RIFs (Table 8-6)
shows a degree of mapping. This mapping is presented in Table 8-7.
Table 8-7 Mapping between RIFs and CAHR contextual factors

RIF
Personal factors
Number of workstations
affected
Duration of failure
development
Existence of recovery
procedure
Adequacy of HMI
Airspace-related factors
MMS
Person
Task
System
Task
Task
System
Order-issue
Feedback
Task/activity
Several identified PSFs are relevant to nuclear plants (e.g. task preparation,
precision, labelling, marking), whilst the majority are applicable to recovery from
equipment failures in ATC (e.g. time pressure, procedures, HMI). Straeter (2000)
presents reciprocal interactions between PSFs in CAHR as captured through the
analysis of the common appearance of different factors in individual events from
nuclear databases. These interactions are marked with h in Table 8-6.
In total, 35 percent of the RIF interactions are captured by CAHR.
8.4.2.3 Validation by ATM specialists
Various interactions between failure characteristics, airspace, traffic, personal factors,
ambiguity of information in the working environment, and the time necessary to recover
have not been confirmed through the preceding validation processes. However, the
existence of links between these factors has been validated independently by three
ATM specialists.
These ATM specialists come from the same ATC Centre and have more than ten years
of operational experience in the ATC domain. They reviewed the existing
interactions and marked those with which they disagreed. Their input was collected
through a small-scale self-completion survey based on the interactions identified in
Table 8-6. The exact form used in this small-scale survey is
presented in Appendix XI. The comparison of their independent validations showed
similarities. Several inconsistencies were identified, mostly due to ATM specialists
initially misreading the matrix. These were clarified via personal correspondence
before the final validation. As a result, 90 percent of the RIF interactions from Table 8-6
have been validated by the ATM specialists (marked with a in Table 8-6).
8.4.2.4 Validation summary

95 percent (107 interactions out of 113) of the RIFs interactions have been validated by
existing literature and ATM specialists. The remaining six interactions were not
validated by either of the sources available. These, marked with x in Table 8-6, are:
impact of number of workstations/sectors affected on personal factors;
impact of duration of failure on personal factors;
impact of number of workstations/sectors affected on communication for
recovery;
impact of duration of failure on communication for recovery;
impact of airspace characteristics on communication for recovery; and
impact of weather on airspace characteristics.
From the perspective of past research and ATM expert input, these six interactions do
not exhibit any correlation and thus the research presented in this thesis excludes
them from the remaining analysis. However, a more quantitative approach would be
required in future. For example, further development of the HERA database could allow
additional validation of RIF interactions (including these six). Furthermore, it could allow
the quantification of their level of influence through the definition of the coefficient of
interaction. Details on the coefficient of interaction are presented in the next section.
8.4.3 Quantification of RIFs interactions

The validated RIFs interactions above were used to develop a method to quantify the
level of interactions. The most accurate approach would be to analyse each interaction
separately as presented in equation 8-2:
$RIF_{Yj}' = RIF_{Yj} + \sum_{x} k_{xy} R_x$    (8-2)
where,
$RIF_{Yj}$ represents a level j of RIFY; j = 1, 2, or 3;
$RIF_{Yj}'$ represents a level j of RIFY after incorporation of RIF interactions, $0.0 \le j \le 4.0$;
$k_{xy}$ represents the coefficient of interaction between RIFX and RIFY ($k_{xy} \neq k_{yx}$);
$R_x$ depends upon the level of RIFX, $R_x \in \{+1, 0, -1\}$.
In other words, kxy is the numerical representation of the direct influence that RIFX has
on RIFY. Note that the interaction factor represents irreversible interaction (i.e. $k_{xy} \neq k_{yx}$).
Taking into account the overall lack of quantitative assessment of context in the area of
ATC, it is difficult to determine each coefficient kxy separately. As already discussed in
section 8.1.2, some initial attempts to establish a detailed database that captures
human performance data are planned by EUROCONTROL through the Human Error in
ATM (HERA) project (EUROCONTROL, 2002d). Although the interactions do not
necessarily have the same level of influence, this thesis had to define a more generic
approach to account for the lack of operational data. Nevertheless, if the RIF interactions
become quantifiable (e.g. via the HERA database), the methodology presented in this
Chapter will still be valid.
As a result, this thesis follows the assumption that all determined interactions have the
same level of influence, referred to as k. Namely, it is assumed that interactions
between all pairs of RIFs are equal and as such that there is only one coefficient,
k = 1/(N-1). N represents the total number of relevant RIFs for a particular ATC Centre or a
particular incident under investigation. In addition, (N-1) is used because a factor
cannot influence itself. Therefore, in the case of 20 relevant factors, the coefficient of
interaction would be calculated as k = 1/19 = 0.053.
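As a brief illustration, the uniform coefficient can be computed for any number of relevant RIFs. The sketch below is not taken from the thesis; the function name and the input check are assumptions made for the example.

```python
def interaction_coefficient(n_relevant_rifs: int) -> float:
    """Uniform coefficient of interaction, k = 1 / (N - 1)."""
    if n_relevant_rifs < 2:
        # With fewer than two RIFs there is no other factor to interact with.
        raise ValueError("at least two RIFs are needed for an interaction")
    return 1.0 / (n_relevant_rifs - 1)


# Generic case described in the text: all 20 RIFs are relevant.
k = interaction_coefficient(20)
print(round(k, 3))  # 0.053
```

For an ATC Centre or incident where fewer RIFs are relevant, the same call with a smaller N yields a larger coefficient, so each remaining interaction carries more weight.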
One important assumption made here is that all RIFs which influence a particular RIF
can never change its level by more than one unit, e.g. from Level 3 to Level 2 but not
from Level 3 to Level 1. The reason for this is that it takes more than 50 percent of
relevant RIFs to influence one particular RIF in exactly the same manner in order to
change its level (either enhancing or worsening it). For example, in the generic
approach where all 20 RIFs are relevant, it will take at least 11 RIFs, all defined via
Level 1, to influence one particular RIF in order to enhance its level by one unit, either
from Level 3 to Level 2 or from Level 2 to Level 1. This concept is similar to the
approach presented in CREAM (Hollnagel, 1998).
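The one-unit limit is also consistent with the uniform coefficient itself. As a short check, assuming the extreme case in which all of the other N-1 RIFs influence RIFY and each R_x takes a value in {+1, 0, -1}, the total shift in level is bounded as follows:

```latex
\left| RIF_{Yj}' - RIF_{Yj} \right|
  = \left| \sum_{x \neq y} k \, R_x \right|
  \le (N-1)\,k
  = (N-1)\,\frac{1}{N-1}
  = 1
```

The bound is reached only if every other RIF influences RIFY and all of them act in the same direction.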
As a consequence of incorporating RIF interactions, the RIF levels change. Table 8-8
presents the change in the RIF levels from the initial integer values (i.e. 1, 2, or 3)
presented in Table 8-5. If the level of any RIF decreases numerically, other RIFs
have impacted this particular RIF in such a way that the change enhances
controller performance (see RIF20 in Tables 8-5 and 8-8, which decreased from the
initial value of 3 to a new value of 2.74). Similarly, if the RIF level increases
numerically, other RIFs have impacted this particular RIF in such a way that the
change degrades controller performance (see RIF18, which increased from the initial
value of 1 to a new value of 1.11). It is important to note that the probability of the
occurrence of any context, with or without incorporation of RIF interactions, is the same
(1/40,310,784 = 2.4E-08, as previously reported in section 8.3.2).
Table 8-8 Recovery context (as presented in Table 8-5) after the incorporation of RIF
interactions

RIF ID   Level   RIF ID   Level
RIF1     1.00    RIF11    1.95
RIF2     0.95    RIF12    2.00
RIF3     1.95    RIF13    0.89
RIF4     0.84    RIF14    1.05
RIF5     0.89    RIF15    2.95
RIF6     2.05    RIF16    2.89
RIF7     1.05    RIF17    2.95
RIF8     2.05    RIF18    1.11
RIF9     0.74    RIF19    3.00
RIF10    1.05    RIF20    2.74
In short, a change (increase or decrease) in the value of a particular RIF represents the
final outcome of all possible interactions with that particular RIF. For example, the RIF5
level changes from 1 to 0.89 as a result of the influence of 15 different
RIFs, as seen from the matrix in Table 8-6 (see row 5).
In this particular example, RIF1, RIF2, RIF4, RIF9, RIF10, RIF13, and RIF14 influence
RIF5 in a positive way as they are defined via Level 1. As a result, each of these seven
RIFs decreases the RIF5 level by 1/19 = 0.053. However, RIF15, RIF16, RIF17, RIF19,
and RIF20 influence RIF5 in a negative way as they are defined via Level 3. As a result,
each of these five RIFs increases the RIF5 level by 0.053. Other RIFs, namely RIF3,
RIF6, and RIF12, do not have any influence on RIF5 as their level is 2, which assumes
no significant influence on human performance. Furthermore, RIF7, RIF8, RIF11, and
RIF18 have no impact on RIF5 and therefore are not considered. The result of this is
an overall decrease in RIF5 level as follows (equation 8-3):
$RIF_{5j}' = RIF_{5j} + 7(-k) + 5k = 1 + 2(-0.053) = 1 - 0.106 = 0.894$    (8-3)
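The RIF5 calculation can be reproduced with a short Python sketch. It assumes the levels of Table 8-5, the 15 influencing RIFs named in the text, and a sign convention read off the worked example (Level 1 lowers the level, Level 2 is neutral, Level 3 raises it). The names `context`, `influencers_of_rif5`, and `adjusted_level` are illustrative only.

```python
# Levels of the example recovery context from Table 8-5 (RIF ID -> level).
context = {
    1: 1, 2: 1, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 2, 9: 1, 10: 1,
    11: 2, 12: 2, 13: 1, 14: 1, 15: 3, 16: 3, 17: 3, 18: 1, 19: 3, 20: 3,
}

# RIFs that influence RIF5 according to the worked example in the text.
influencers_of_rif5 = [1, 2, 3, 4, 6, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20]

K = 1 / 19  # uniform coefficient of interaction for N = 20

# Sign of the influence by level of the influencing RIF (assumed from the
# worked example): Level 1 -> -1, Level 2 -> 0, Level 3 -> +1.
R = {1: -1, 2: 0, 3: +1}


def adjusted_level(target, influencers, levels, k=K):
    """Apply equation 8-2 with a single coefficient k for all pairs."""
    return levels[target] + sum(k * R[levels[x]] for x in influencers)


rif5 = adjusted_level(5, influencers_of_rif5, context)
print(round(rif5, 2))  # 0.89
```

The unrounded result is 1 - 2/19, which matches the 0.89 listed for RIF5 in Table 8-8; the figure of 0.894 in equation 8-3 arises from rounding k to 0.053 before multiplying.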
The incorporation of all identified RIF interactions applied to all the identified recovery
contexts (all 40,310,784 of them) made it possible to identify the distribution of all RIFs.
Prior to incorporation of RIF interactions, the distribution of each level is the same. For
example, Figure 8-2 represents the distribution of RIF5 without incorporation of RIF
interactions. This graph represents three levels of RIF5 in a symmetrical manner, each
accounting for exactly 13,436,928 contexts or one third of the total (Figure 8-2). This
results in equal representation of each level in the 40,310,784 possible recovery
contexts.
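The counts quoted above can be checked with simple arithmetic. The sketch below assumes a split of nine three-level and eleven two-level RIFs; that split is an inference from the factorisation of 40,310,784 and is not stated in this section, so the constant names and values are assumptions of the example.

```python
# Arithmetic check of the number of recovery contexts.
# Assumed split (inferred, not stated in the text): 9 RIFs with three levels
# and 11 RIFs with two levels.
THREE_LEVEL_RIFS = 9
TWO_LEVEL_RIFS = 11

total_contexts = 3 ** THREE_LEVEL_RIFS * 2 ** TWO_LEVEL_RIFS

# For a three-level RIF such as RIF5, each level appears in one third of all contexts.
contexts_per_level = total_contexts // 3

# Every context is equally likely before probabilities are attached to RIF levels.
context_probability = 1 / total_contexts

print(total_contexts, contexts_per_level, context_probability)
```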
Figure 8-2 Distribution of RIF5 levels amongst identified recovery contexts without interactions
(histogram; horizontal axis: Level, vertical axis: Frequency)
However, due to the identified interactions, the distribution of RIF5 levels amongst all
possible recovery contexts takes a different, more dispersed, shape (Figure 8-3). It is
notable that the more interactions exist with a particular RIF, the more dispersed the
distribution of its levels will be. The example utilised in this section (i.e. RIF5) has a
substantial number of other contextual factors that affect it, namely 15. However, in
some cases the number of identified interactions can be small (e.g. one or two), while in
the case of RIF19 (weather conditions) there are no identified interactions and thus this
RIF retains a distribution similar to that of RIF5 without interactions (Figure 8-2). In
any case, the total number of recovery contexts where RIF5 (or any other RIF) is defined
via Level 1 remains the same whether RIF interactions are incorporated or not. The
distribution of the levels for each of the 20 RIFs is presented in Appendix XII in a
tabular format.
Figure 8-3 Distribution of RIF5 levels amongst identified recovery contexts with interactions
(histogram; horizontal axis: Level, vertical axis: Frequency)
Once the RIF interactions have been identified and their impact quantitatively
determined, the next step is to re-calculate the existing RIF probabilities to reflect the
newly determined RIF levels more accurately. However, to achieve this step it is necessary
to determine the cut-off points between any two consecutive levels of influence, i.e. to
determine the precise boundaries between Level 1, Level 2, and Level 3. Another
option would be to consider each of the distributions separately, i.e. covering the entire
spectrum (-∞, +∞). In this way, there is no cut-off point and there is coherency between
all results as well. However, both approaches yield similar results as there is very little
overlap between these distributions. The following section explains the method applied
to determine the cut-off points between any two consecutive RIF levels.
8.5 Methodology for the determination of the cut-off points (Step 4)
As a result of differences between the interactions affecting different RIFs (see Table
8-6), as previously highlighted, the cut-off points between levels will vary from one
RIF to another. The shape and dispersion of the distribution of levels for each RIF
depend upon the number and type of interactions with other RIFs. As an example,
observe the difference in the distribution of levels for RIF1 (Figure 8-4) and RIF20
(Figure 8-5), where RIF1 is impacted by two different RIFs while RIF20 is impacted by
17 different RIFs.
Figure 8-4 Distribution of RIF1 levels (histogram; horizontal axis: Level, vertical axis: Frequency)
Figure 8-5 Distribution of RIF20 levels (histogram; horizontal axis: Level, vertical axis: Frequency)
The statistical method for determining the cut-off points between the levels for each
RIF is based on the 95 percent confidence interval for each level. For example, a 95
percent confidence interval for Level 1 of RIF1 would cover 95 percent of the normal
curve, where the probability of observing a value of Level 1 RIF1 outside of this area
would be less than 0.05. Under the assumption of a normal distribution (footnote 7), the
interval range (μ - 2σ, μ + 2σ) captures approximately 95 percent of data.
The advantage of this approach is that it follows common statistical practice. In
addition, this method relies upon known values of μ and σ in order to define the interval
range for each level. In other words, to calculate the values of μ and σ for RIF1 Level 1,
it is necessary to already have an assumption about the sample size (depicted as N in
equation 8-4).
$$\mu = \frac{1}{N}\sum_{n=1}^{N} X_n, \qquad \sigma = \sqrt{\frac{\sum_{n=1}^{N}\left(X_n - \mu\right)^2}{N}} \qquad \text{(8-4)}$$
where
μ    represents the population mean for RIF1 Level 1 (population of all possible recovery
     contexts where RIF1 is defined through Level 1);
σ    represents the population standard deviation for RIF1 Level 1;
N    represents the total number of recovery contexts in which RIF1 is defined via Level 1;
X_n  represents the n-th value of the variable RIF1 Level 1 (n = 1, 2, ..., 40,310,784).
To overcome this, three different interval values or three different cut-off points
(assumed based upon the initial distribution of data) are tested. For example, when
assessing the cut-off points between levels of RIF5, three different values between
Level 1 and Level 2 have been tested (namely Fit 1, Fit 2, and Fit 3 in Figure 8-6).
Footnote 7: Corresponds to the symmetrical distribution of levels around the values of 1, 2
and 3, but also to the large number of observations.
Figure 8-6 Distribution fitting for the three cut-off points on the example of RIF5 Level 1
The normal distribution parameters, as presented in Table 8-9, show no difference
between the distributions of RIF5 Level 1 data when the first and second cut-off points
are applied. However, the use of the third cut-off point produces a different distribution.
This is expected, as the third cut-off incorporates data which show an increased frequency
for the value of 1.8 (see Figure 8-7 and Table 8-9). Based on this, Fit 1 and Fit 2,
corresponding to cut-off points 1.6 and 1.7 respectively, are taken forward. However, it
is necessary to determine which of these two values will be taken as the final cut-off point.
Table 8-9 Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
RIF5 Level 1   Cut-off point used   Mean   Standard deviation   Standard error on the mean
Fit 1          1.6                  1.18   0.17                 4.59E-05
Fit 2          1.7                  1.18   0.17                 4.65E-05
Fit 3          1.8                  1.19   0.19                 5.11E-05
In order to determine the optimal cut-off point precisely, it is necessary to fit a
polynomial function to the data between the mean values for Level 1 and Level 2 and
determine the minimum of that function. The polynomial function minimum rounded to
the first decimal should indicate the cut-off point (either 1.6 or 1.7). Table 8-10 presents
three different polynomial functions applied to the distribution of RIF5 Level 1 and Level 2
data. The calculation of the function minimum (footnote 9) shows that regardless of the
type of polynomial function, the local minimum corresponds to the cut-off point at 1.7
(Table 8-10). The fit of a cubic polynomial function to RIF5 Level 1 data is presented in
Figure 8-7. Since Table 8-9 shows that the choice of cut-off at 1.6 or 1.7 makes no
significant difference, and since the function minimum is closer to the value of 1.7, this
value is taken forward as the cut-off point between RIF5 Level 1 and Level 2.
Footnote 8: Probability density function approach represents distributions so that the sum
of the areas of the rectangles equals 1.
Table 8-10 Local minimums of polynomial functions
Type        Polynomial function f(x)                                          Local minimum
Quadratic   1E07(1.3472x^2 - 4.5848x + 3.9200)                                1.7016
Cubic       1E07(-0.5613x^3 + 4.2097x^2 - 9.3510x + 6.5076)                   1.6653
Quartic     1E08(-0.1785x^4 + 1.1574x^3 - 2.6289x^2 + 2.4203x - 0.7121)       1.6756
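The local minimums in Table 8-10 can be reproduced numerically. The following is a minimal sketch, not the procedure used in the thesis: it bisects the first derivative of each fitted polynomial on the interval [1.2, 2.2], which is assumed here from the range plotted in Figure 8-7. The function names are hypothetical, the common scale factors (1E07, 1E08) are omitted because they do not move the minimum, and the sign of the cubic term of the fourth-degree polynomial is assumed to be positive, as that is the sign which reproduces the tabulated minimum.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given highest-degree-first coefficients (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def derivative(coeffs):
    """Return the coefficients of the first derivative."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def local_minimum(coeffs, lo, hi, tol=1e-9):
    """Bisect f'(x) on [lo, hi], assuming f' changes sign from negative to positive once."""
    d = derivative(coeffs)
    if not (poly_eval(d, lo) < 0 < poly_eval(d, hi)):
        raise ValueError("f' does not go from negative to positive on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poly_eval(d, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Coefficients as listed in Table 8-10, without the common scale factors.
FITS = {
    "quadratic": [1.3472, -4.5848, 3.9200],
    "cubic": [-0.5613, 4.2097, -9.3510, 6.5076],
    "quartic": [-0.1785, 1.1574, -2.6289, 2.4203, -0.7121],
}

minima = {name: local_minimum(c, 1.2, 2.2) for name, c in FITS.items()}
cut_offs = {name: round(x, 1) for name, x in minima.items()}
print(minima, cut_offs)
```

Restricting the search to a bracketing interval also addresses the point that a fourth-degree polynomial may have more than one minimum: only the minimum lying between the Level 1 and Level 2 means is returned.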
Figure 8-7 Cubic polynomial function f(x) fitted for the RIF5 data to determine its minimum
(f(x) = 1E07(-0.5613x^3 + 4.2097x^2 - 9.3510x + 6.5076); horizontal axis: Level from 1.2 to
2.2, vertical axis: Frequency)
Similarly, the value of 2.7 is taken as a cut-off point between Level 2 and Level 3 (see
Table 8-11). Using the same methodology, the cut-off points are determined for all
RIFs and their corresponding levels. The established values are reported in Table 8-11.
Table 8-11 Cut-off points between the levels for all RIFs
RIF ID   Cut-off point between     Cut-off point between
         Level 1 and Level 2       Level 2 and Level 3
1        1.5                       2.5
2        1.5                       N/A
3        N/A                       2.5
4        1.7                       2.7
5        1.7                       2.7
6        N/A                       2.5
7        1.5                       2.5
8        N/A                       2.5
9                     2.2
10       1.5                       2.5
11       N/A                       2.5
12       1.5                       2.5
13                    2.0
14       1.5                       2.5
15                    2.0
16       1.5                       2.5
17       N/A                       2.5
18       1.6                       2.6
19       N/A                       2.5
20       N/A                       2.7
Footnote 9: In the case of quartic polynomial functions, it is necessary to specify the local
minimum (the first derivative of this polynomial function can have three roots and thus
the function potentially has two minimums).
8.6 Specific effects of RIFs on controller recovery performance (Step 5)
While the previous section identified the cut-off points between consecutive levels of
each RIF, it is necessary to quantify the relationship between the particular level of a
RIF and its impact on controller recovery performance. This relationship has already
been defined qualitatively in Chapter 7 through the definition of the qualitative
descriptor. In short, Level 1 corresponds to the most desirable level, Level 2 to the
tolerable or average level, whilst Level 3 corresponds to the least desirable level in the
context of controller recovery performance.
In order to begin to look at the quantitative impact of each RIF level on controller
recovery performance, the correlation coefficient is proposed. This correlation
coefficient is defined as: +1.00 corresponding to Level 1 (high positive relationship),
0.00 corresponding to Level 2 (no relationship), and -1.00 corresponding to Level 3
(high negative relationship). This approach is in line with the approach presented in
Oren and Ghasem-Aghaee (2003), who also introduced a correlation coefficient as an
indicator of the relationship between the factors that define a personality (e.g.
openness, extroversion) and different personality types.
Once the relevant RIFs and their corresponding levels have been defined and linked to
controller recovery performance, the next step is to present the recovery context as
a function of all contextual factors, their interactions, and their impact on controller
recovery performance. The following section presents the definition of the recovery
context via the recovery context indicator.
8.7 Calculation of the recovery context indicator (Step 6)

Based on the determination of the boundaries between consecutive levels for each RIF,
it is possible to proceed with the re-calculation of RIF probabilities and the
determination of the numerical indicator of each recovery context (i.e. recovery context
indicator - Ic). These are presented in the following sections.
8.7.1 Re-calculation of RIF probabilities

The main task at this stage is to re-calculate the probabilities that correspond to more
realistic (effective) levels resulting from the incorporation of all RIF interactions. The
previous example of one randomly chosen recovery context showed that RIF5 changed
from Level 1 (Table 8-5) to a new effective level (0.89; Table 8-8). Therefore, if the
probability of RIF5 at Level 1 is 0.73 (see Table 8-12), then it is necessary to determine
the probability of the new, effective level 0.89.
Table 8-12 Probabilities for the RIF5 and each of its levels (see Appendix X)
RIF5: Communication for recovery within team/ATC Centre
Descriptor     Level   p(L)
Efficient      1       0.73
Tolerable      2       0.24
Inefficient    3       0.04
The way to approach this problem is firstly to determine all recovery contexts for which
RIF5 is represented via Level 1. In other words, it is necessary to determine the
number of recovery contexts for which the RIF5 level is smaller than or equal to the
cut-off point between Levels 1 and 2 (i.e. 1.7, Table 8-11). This is presented in
equation 8-5 below:
$$
j = \begin{cases}
1, & RIFX_1 = \sum_{j'=0}^{C_{j,j+1}} RIFX_{j'} = \sum_{j'=0}^{C_{1,2}} RIFX_{j'}, & 0 < j' \le C_{j,j+1} \\
2, & RIFX_2 = \sum_{j'=C_{j-1,j}}^{C_{j,j+1}} RIFX_{j'} = \sum_{j'=C_{1,2}}^{C_{2,3}} RIFX_{j'}, & C_{j-1,j} < j' \le C_{j,j+1} \\
3, & RIFX_3 = \sum_{j'=C_{j-1,j}}^{4} RIFX_{j'} = \sum_{j'=C_{2,3}}^{4} RIFX_{j'}, & C_{j-1,j} < j' \le 4.0
\end{cases}
\qquad \text{(8-5)}
$$
where
X          represents different contextual factors, X = 1, 2, 3, ..., 20;
j          represents a level of RIFX and can take the values of 1, 2 or 3;
j'         represents a level of RIFX after incorporation of interactions, where 0.0 ≤ j' ≤ 4.0;
C_{j,j+1}  represents a cut-off point between Levels j and j+1.
For example, for RIF5 (Table 8-11):
$$
j = \begin{cases}
1, & C_{j,j+1} = C_{1,2} = 1.7, & 0 < j' \le 1.7 \\
2, & C_{j,j+1} = C_{2,3} = 2.7, & 1.7 < j' \le 2.7 \\
3, & \text{N/A}, & 2.7 < j' < 4.0
\end{cases}
$$
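The mapping in equation 8-5 amounts to binning an effective level by its two cut-off points. A minimal sketch follows, using the RIF5 cut-offs of 1.7 and 2.7 from Table 8-11; the function name is hypothetical, and two-level RIFs (those with an N/A entry in Table 8-11) are deliberately not handled.

```python
def nominal_level(effective, cut_12, cut_23):
    """Map an effective level j' onto Level 1, 2 or 3 for a three-level RIF.

    cut_12 and cut_23 are the cut-off points C(1,2) and C(2,3).
    """
    if not 0.0 < effective < 4.0:
        raise ValueError("effective level must lie strictly between 0 and 4")
    if effective <= cut_12:
        return 1
    if effective <= cut_23:
        return 2
    return 3


# RIF5 cut-off points from Table 8-11.
RIF5_CUT_12, RIF5_CUT_23 = 1.7, 2.7

for level in (0.89, 1.7, 1.95, 2.95):
    print(level, nominal_level(level, RIF5_CUT_12, RIF5_CUT_23))
```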
Secondly, it is necessary to determine the subset of recovery contexts which corresponds
to the newly determined level (i.e. 0.89). These are all recovery contexts having a RIF5
level in the range (0.8, 0.9]. It should be noted that level 0.89 represents the value of
the RIF5 level for one specific recovery context. Finally, the probability of the new level
is calculated as follows (equation 8-6):
$$p(RIFX_{j'}) = p(RIFX_{j})\,\frac{f(RIFX_{j'})}{f(RIFX_{j})}$$
$$p(RIF5_{0.89}) = 0.73 \cdot \frac{f(RIF5_{0.89})}{f(RIF5_{1})} = 0.73 \cdot \frac{1{,}008{,}576}{13{,}476{,}924} = 0.055 \qquad \text{(8-6)}$$
where
X              represents different contextual factors, X = 1, 2, 3, ..., 20;
j              represents levels 1, 2, or 3;
               represents the sum of all possible recovery contexts;
p(RIF5_{j})    represents the initial probability of occurrence of RIF5 for level j;
p(RIF5_{j'})   represents the probability of occurrence of RIF5 for its new level j';
f(RIF5_{j'})   represents the sum of levels for 0.89 < j' ≤ 0.90; and
f(RIF5_{j})    represents the sum of all levels that correspond to the RIF5 Level 1
               (i.e. 0.0 < j' ≤ 1.7).
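The worked example in equation 8-6 can be checked directly. The sketch below uses the figures quoted above (0.73, 1,008,576 and 13,476,924); the function and argument names are hypothetical.

```python
def recalculated_probability(p_level, freq_effective, freq_level):
    """Scale the probability of a nominal level by the share of its contexts
    that fall at the given effective level (equation 8-6)."""
    if freq_level <= 0:
        raise ValueError("freq_level must be positive")
    return p_level * freq_effective / freq_level


# RIF5: p(Level 1) = 0.73; 1,008,576 contexts at the effective level of 0.89
# out of 13,476,924 contexts counted as Level 1.
p_rif5_089 = recalculated_probability(0.73, 1_008_576, 13_476_924)
print(round(p_rif5_089, 3))
```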
The new probability of occurrence (0.055) is low in magnitude, but represents an
occurrence with a high probability of recovery. In other words, in this particular
context, RIF5 is enhanced by the influence of all the other RIFs that interact with it.
The final output of this methodology is the indicator of a specific recovery
context (Ic), as presented in equation 8-7. The characteristics of Ic are that, for
example, in the case of all 20 RIFs defined via Level 1 with the probability 1 and no
interactions, the value of Ic equals 1. Similarly, in the case of all 20 RIFs defined via
Level 3 with the probability 1 and no interactions, the value of Ic equals -1.
$$I_c = \frac{1}{N}\left[\;\sum_{i=1}^{20}\sum_{j=1}^{3} p(RIFX_{j'})\,R_j\,\Big|_{\text{3-level RIFs}} \;+\; \sum_{i=1}^{20}\sum_{j=1}^{2} p(RIFX_{j'})\,R_j\,\Big|_{\text{2-level RIFs}}\right] \qquad \text{(8-7)}$$
where
p(RIFX_{j'})        probability of RIFX with level j', where X = 1, 2, 3, ..., 20 and
                    0.0 ≤ j' ≤ 4.0. The level j' takes into account all interactions between RIFs;
R_j                 correlation coefficient between RIFX and controller recovery performance.
                    Depending upon level j, it can take values {-1, 0, +1};
N                   total number of recovery factors (i.e. 40,310,784); and
p(RIFX_{j'}) x R_j  probability of the overall situation occurring in one ATC Centre. In order to
                    look at the quantitative impact that each RIF has on the controller recovery
                    performance, each of the probabilities has to be multiplied with the correlation
                    coefficient.
All calculations relevant to the quantitative assessment of the recovery context
conducted in this thesis are performed using the standard C programming language.
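The limiting behaviour of Ic described above (a value of 1 when all 20 RIFs are at Level 1 with probability 1, and -1 when all are at Level 3) can be illustrated with a short sketch. It is not the thesis implementation, which was written in C. It assumes that the weighted sum is normalised by the number of RIFs (20), since that is the choice which reproduces the two limiting values; the names used are hypothetical.

```python
# Correlation coefficients R_j from section 8.6:
# Level 1 -> +1, Level 2 -> 0, Level 3 -> -1.
CORRELATION = {1: 1.0, 2: 0.0, 3: -1.0}


def context_indicator(factors, normaliser=20):
    """Return Ic for one recovery context.

    factors: one (probability, nominal_level) pair per RIF.
    normaliser: assumed here to be the number of RIFs (an assumption of this sketch).
    """
    weighted = sum(p * CORRELATION[level] for p, level in factors)
    return weighted / normaliser


all_level_1 = [(1.0, 1)] * 20
all_level_3 = [(1.0, 3)] * 20
ic_upper = context_indicator(all_level_1)
ic_lower = context_indicator(all_level_3)
print(ic_upper, ic_lower)
```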
8.7.2 Distribution of the recovery context indicator

The recovery context indicator (Ic) represents the numerical representation of a specific
context that surrounds controller recovery from an ATC equipment failure. For
example, changes in the factors that constitute the recovery context (i.e. 20 RIFs),
captured via the change of their qualitative levels, interactions, and effect on controller
performance, are reflected in the change of the Ic magnitude. In practical terms, this
change facilitates better or worse controller recovery.
After the calculation of all 40,310,784 possible contexts it was determined that the
mean value of recovery context indicator (Ic) is 0.027, ranging between -0.069 and
0.131. The distribution of the Ic variable is presented in Figure 8-8.
Figure 8-8 Distribution of the recovery context indicator
(histogram; horizontal axis: Recovery context indicator (Ic), vertical axis: Frequency)
This distribution is slightly positively skewed (right-skewed), since it has a longer tail
in the positive direction than in the negative direction. This is also confirmed by the
positive value of the statistical test, indicating the concentration of values on the left
side of the distribution. The median value, i.e. the value on the horizontal axis which has
exactly 50 percent of the data on each side, is -0.023. This positive skew may result from
the initial inputs into the methodology for the quantitative (probabilistic) assessment of
the recovery context surrounding equipment failure in ATC. For example, observing the
probability values for each RIF and its corresponding levels, it is clear that 12 out of 20
RIFs have a higher probability of enhancing recovery performance as opposed to
having no impact or a negative impact. In other words, the probabilities of Level 1 for
these 12 RIFs are higher than for the other level(s) (i.e. Level 2 and Level 3; see Appendix
X for details on RIF probabilities). Therefore, it can be concluded that, for the generic
ATC Centre, the framework yields a recovery context indicator close to 0.027. This
indicates that there is a large potential for improvement and for a shift of the Ic values
towards the positive side, thus enabling more appropriate contextual conditions.
In order to fully comprehend the characteristics of Ic, the next step is to calculate the
extreme values of Ic, from the most negative to the most positive. In other words, it is
necessary to determine the ideal recovery context where all RIFs can be expressed via
Level 1 (footnote 10). Similarly, it is necessary to determine the worst possible
recovery context where all RIFs can be expressed via Level 3 (footnote 11). In these
cases, when there is no uncertainty related to the probabilities of each RIF's level, it is
possible to represent the most negative and the most positive recovery context.
Hence, the most negative value of Ic calculated using equations 8-6 and 8-7 takes the
value of -0.95. This value represents the worst possible recovery context that can
facilitate controller recovery performance in the generic ATC Centre. Similarly, the
most positive value of Ic calculated using the same equations is 0.65. These two
values are numerical representations of two extreme recovery contexts which are
mutually exclusive. However, these extreme values may be used as a good indicator of
the scale of changes that are possible to achieve within the ATC environment.
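The two extreme values can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that Ic is the sum of the correlation coefficients (each weighted by a probability of 1) divided by the number of RIFs, and it uses the RIFs listed in footnotes 10 and 11 as those lacking Level 1 and Level 3 respectively. The names are hypothetical.

```python
# Correlation coefficients from section 8.6.
CORRELATION = {1: 1.0, 2: 0.0, 3: -1.0}
RIF_IDS = range(1, 21)

NO_LEVEL_1 = {3, 6, 8, 11, 17, 19, 20}  # footnote 10: these take Level 2 at best
NO_LEVEL_3 = {2}                        # footnote 11: this takes Level 2 at worst

best_levels = {x: (2 if x in NO_LEVEL_1 else 1) for x in RIF_IDS}
worst_levels = {x: (2 if x in NO_LEVEL_3 else 3) for x in RIF_IDS}


def indicator(levels, normaliser=20):
    """Ic when every RIF level is certain (p = 1); normaliser is an assumption."""
    return sum(CORRELATION[level] for level in levels.values()) / normaliser


ic_best = indicator(best_levels)    # 13 RIFs at Level 1, 7 at Level 2
ic_worst = indicator(worst_levels)  # 19 RIFs at Level 3, 1 at Level 2
print(ic_best, ic_worst)
```

Under these assumptions the asymmetry between the two extremes follows directly from the number of RIFs that lack a Level 1 or a Level 3.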
8.7.3 Sensitivity analysis

Because of the large number of recovery contexts (millions) it is reasonable to use the
assumption of normality in accordance with the central limit theorem (Berenson et al.,
2006). When the data set is large, the sampling distribution of the mean is
approximately normally distributed. Using this assumption, it is possible to carry out an
analysis of the sensitivity of Ic to changes in any one recovery influencing factor.
The first step is to determine an interval around the baseline (population) mean that
includes 95 percent of the sample means, or μ ± 2σ. According to the statistics presented
in Table 8-13 this range is 0.027 ± 0.058. The second step is to implement a particular
change and test whether the sampled recovery context indicator comes from the same
population. As an example, it is assumed that the training for the recovery provided to
air traffic controllers includes the equipment failure in question. Therefore, since there
are no uncertainties, this RIF (RIF1) can be defined exactly via Level 1 and its
corresponding probability (p=1). Sample statistics are presented in Table 8-13.
Footnote 10: RIF3, RIF6, RIF8, RIF11, RIF17, RIF19, and RIF20 do not have the possibility of
Level 1 and thus these will take the next most desirable level, being Level 2.
Footnote 11: RIF2 does not have the possibility of Level 3 and thus it will take the next
most undesirable level, being Level 2.
Table 8-13 Sensitivity analysis
Step change                                        Statistics (M, SD)     Baseline mean range
Baseline, N=40,310,784                             M=0.027, SD=0.029      (-0.031, 0.085)
Sample 1 (change of RIF1), N=13,436,928            M=0.061, SD=0.035
Sample 2 (change of RIF1 and RIF2), N=6,718,464    M=0.091, SD=0.023
With suitable training for the situation in question (e.g. a particular failure type) there is
no significant difference between the sample and baseline means, but it is observable
that the value of Ic shifts toward a more positive value. Therefore, a second sample
was taken, assuming additionally that RIF2 (experience with equipment failure)
matches precisely the equipment failure in question. In other words, RIF2 can be
defined exactly via Level 1 and its corresponding probability (p=1). The result of this
analysis shows that there is a significant change in the recovery context, since the
obtained mean falls outside the 95 percent confidence interval determined for the
baseline. Therefore, the enhanced recovery context (sample 2) comes from a
population different from the baseline recovery context. This finding indicates that the
value of Ic is sensitive to changes in the individual RIFs.
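The comparison reported in Table 8-13 reduces to checking whether each sample mean lies within two standard deviations of the baseline mean. A minimal sketch using the tabulated values follows; the variable names are hypothetical.

```python
# Baseline statistics from Table 8-13.
BASELINE_MEAN = 0.027
BASELINE_SD = 0.029

# Interval covering approximately 95 percent of values: mean +/- 2 SD.
lower = BASELINE_MEAN - 2 * BASELINE_SD
upper = BASELINE_MEAN + 2 * BASELINE_SD

sample_means = {
    "Sample 1 (RIF1 fixed at Level 1)": 0.061,
    "Sample 2 (RIF1 and RIF2 fixed at Level 1)": 0.091,
}

# A sample mean outside the interval is treated as a significant shift.
significant = {name: not (lower <= m <= upper) for name, m in sample_means.items()}

print(round(lower, 3), round(upper, 3))
for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```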
8.7.4 Optimal solutions

The methodology for the quantitative assessment of the recovery context presented in
the previous sections allows for the investigation of the recovery context in a particular
ATC Centre as well as for a particular equipment failure event. Furthermore, this
approach creates a basis for quantitative assessment and the choice of optimal
solutions for recovery enhancement. These solutions should be reviewed through the
changes in RIFs, their corresponding level, and the resulting changes in the value of Ic.
Whilst not all RIFs can be enhanced, it is necessary to focus on those which may be
affected. For instance, it is reasonable to assume that internal factors have a significant
potential for change, either by enhancement of training or of personal abilities on a daily
basis (e.g. fatigue, health, attitude, stress). A review of the other three RIF groups
(equipment related, external, and airspace related) reveals potential areas of change
as well as factors which cannot be influenced at the level of a particular ATC Centre
but possibly at the level of a region (e.g. traffic complexity can be influenced at the
regional ATM level through the central flow management unit).
The optimal change is defined as the best ratio between the benefit and the cost of the
proposed recommendations. Benefit is defined as a shift in the RIF levels toward the more
desirable Level 2 (average) or Level 1 (most favourable) and an overall shift in the
recovery context indicator (Ic) towards more positive values (e.g. extreme positive
value). The cost should be defined through the inherent costs linked to the proposed
recommendation and therefore should include actual rather than generic costs of the
proposed change within the specific ATC Centre. Thus the cost may include the
following:
• costs of technical changes, followed by any other operational costs (delay in the
  use of new system due to necessary maintenance, staff training);
• costs of designing a new procedure, followed by the cost of training the staff (i.e.
  time and resources);
• cost of additional Team Resource Management (TRM) training;
• creation of a more adequate organisational environment. The examples are
  improvements in terms of roles and responsibilities, the availability of team
  members, the adequacy of supervision, the availability of additional support (e.g.
  assistant), the personnel selection process, shift patterns and personnel planning,
  attitude to teamwork, safety culture, stress management programs, support for
  the organised exchange of past experience on non-nominal events,
  communication with management and technicians (e.g. briefings, exchange of
  knowledge, bulletins, safety panels); and
• the costs of any potential changes in airspace design.

The methodology presented in this thesis is able to provide the benefit of each
proposed solution. However, the evaluation of the related costs, as opposed to the
benefit, is not so straightforward and would necessitate input from ATC Centres.
Therefore, another approach may be utilised to rate the benefit of implemented
changes on the level of ATC Centre, namely by the calculation of the recovery context
efficiency. This variable represents the ratio between the value of current recovery
context and the value of the most positive recovery context feasible in a particular ATC
Centre.
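The recovery context efficiency described above is a simple ratio. The sketch below illustrates it with the generic values reported earlier in this Chapter (a mean Ic of 0.027 and a most positive Ic of 0.65); these inputs and the function name are used purely as an illustration, since the feasible maximum for a particular ATC Centre would have to be established for that Centre.

```python
def recovery_context_efficiency(ic_current, ic_best_feasible):
    """Ratio of the current recovery context indicator to the most positive
    value feasible in the Centre considered."""
    if ic_best_feasible <= 0:
        raise ValueError("the most positive feasible Ic must be greater than zero")
    return ic_current / ic_best_feasible


# Illustrative inputs only: generic mean Ic and generic most positive Ic.
efficiency = recovery_context_efficiency(0.027, 0.65)
print(round(efficiency, 3))
```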
8.8 Summary
This Chapter has presented a methodology for the quantitative assessment of recovery
context. It started by reviewing the past HRA research of relevance to the quantitative
analysis of contextual factors. This has resulted in the selection of the CREAM
technique and its application by Kim, Seong, and Hollnagel (2005) for further
development. Building on this, a novel methodology has been developed for the
research presented in this thesis. This method assessed controller recovery
performance based on 20 relevant contextual factors (RIFs) and through several
distinct steps. Each RIF and its corresponding levels have been probabilistically
determined using four sources of information. These are operational failure reports,
a questionnaire survey, input from eight ATM specialists, and past ATM related literature.
The methodology has further built on this and incorporated RIF interactions. This has
resulted in the change of the RIF levels and re-calculation of the corresponding
probabilities. The outcome of the entire methodology is the definition of the recovery
context indicator (Ic), as a numerical representation of a specific context surrounding
recovery from equipment failure in ATC. Ic is sensitive to the RIF changes and as such
may be used to investigate solutions to enhance the controller recovery. In other words,
the benefits of any safety-relevant changes in ATC Centres may be quantitatively
assessed in two separate ways. Firstly, the benefit can be assessed as a shift in the
distribution of the recovery context indicator from the baseline (pre-change) value to
the new value (as a result of implemented changes). Secondly, it is possible to
calculate the context utilisation or the ratio between the current value of the recovery
context and its most positive value achievable within the particular ATC Centre.
After the review of the methodology for the quantitative assessment of recovery context
in a specific ATC environment, the following Chapter 9 describes an experimental
investigation designed to further verify the proposed methodology.
Chapter 9
Experimental Investigation of the Air Traffic Controller Recovery Performance
After the review of the methodology for the quantitative assessment of the recovery
context in the previous Chapter, this Chapter describes an experiment designed to
further validate the proposed methodology and capture the controller recovery
performance. This Chapter begins with a high-level design for the process adopted for
the experiment. This is followed by the rationale behind the need for the experiment
defined through several objectives. In order to achieve these objectives, this Chapter
describes the overall design of the experiment and selection of potential equipment
failures initially tested in a pilot study. It continues by providing the key requirements for
the experiment of relevance to this thesis, measured variables, and experimental
procedure.
Both the pilot and the main experiment were conducted in close collaboration with one
European Civil Aviation Authority (CAA) (footnote 1). This particular CAA provided all of
the necessary infrastructure and staff from two ATC Centres during the period of the
experiment in 2005 and 2006. One ATC Centre was used for the pilot study, which
tested the feasibility of the experimental design and its overall methodology. The other
ATC Centre was used on three separate occasions to simulate a selected unexpected
equipment failure in order to capture data on the recovery performance of 30 licensed
air traffic controllers. The Chapter concludes with a discussion of measured variables
used to capture the characteristics of controller recovery in ATC. The data collected are
subjected to a rigorous analysis in Chapter 10.
Footnote 1: This CAA performs the function of an Air Navigation Service Provider (ANSP), and
the term CAA will also be used to denote the ANSP in the remainder of this thesis.
9.1 High-level design of the experimental process

Figure 9-1 below indicates the steps of organising and conducting this experiment. The
process starts with the rationale behind the need for an experiment designed to capture
controller recovery performance. It proceeds with the assessment of available
resources, with a focus on two key requirements, namely access to an ATC simulator
and the participation of controllers. Once these requirements had been assured, the
experimental process proceeded with the initial planning and design of the experiment
(i.e. airspace and traffic scenario, equipment failure type). Once this design had been
tested in a pilot study, the experimental process proceeded with the main experimental
study. Collected data are pre-processed and subjected to a rigorous analysis to extract
information on controller recovery from an operational environment (presented in
Chapter 10).
1. Rationale for the experiment
2. Assessment of the available resources
3. Planning for the experiment
4. Design of the experiment
5. Selection of the equipment failure
6. Pilot study
7. Revision of the pilot study (feedback path: in case of necessary changes)
8. Main experimental study
9. Data processing and analysis
Figure 9-1 The flow diagram of the experimental process
9.2 Rationale for the experiment

The preceding Chapters presented a detailed overview of equipment failure
occurrences in the ATC environment from both technical and human perspectives. The
findings from past literature were augmented by operational failure reports (capturing
the technical aspect of equipment failures) and feedback from an international
questionnaire survey (capturing both technical and human aspects of equipment
failures). Furthermore, factors relevant to controller recovery were identified using both
theoretical and operational findings. These factors, referred to as Recovery Influencing
Factors (RIFs), created a basis for the quantitative assessment of the recovery context.
This Chapter builds on the preceding Chapters and generates real operational data on
controller recovery. These data are further used in Chapter 10 to verify the quantitative
assessment of the recovery context developed in Chapter 8 and the relevance of RIFs
identified in Chapter 7.
9.3 Assessment of the available resources

An assessment of the requirements and necessary resources for the experiment
highlighted the need to perform it either at an ATC Centre or a research institution
appropriately equipped. The critical requirements of the experimental design can be
grouped under two particular categories. These are the access to an ATC simulator
and the availability of licensed controllers. Based on these requirements several
potential locations were assessed:
- The Maastricht Upper Area Control Centre (MUAC) in the Netherlands. This is a
  EUROCONTROL operational and simulation facility having the resources to support
  both access to simulators and controllers;
- The Human Factors Lab at the EUROCONTROL Experimental Centre (France),
  providing access to simulators but not controllers;
- The CEATS Research, Development and Simulation (CRDS) Centre in Budapest
  (Hungary). This is a EUROCONTROL facility providing access to simulators but not
  controllers; and
- Various Civil Aviation Authorities (CAAs), air navigation service providers
  (ANSPs) and their respective ATC Centres providing access to both simulation
  facilities and controllers.
Although the requirements for an experimental plan were ready at the initial stage of
the research, it took two years to gain access to the required facilities. After
considerable negotiations with all potential locations, only one CAA responded
positively and agreed to provide both simulation facilities and staff for this experiment.
Both the pilot and the main study were conducted using their facilities, assistance, and
manpower.
9.4 Planning for the experiment

The review of the relevant literature, presented in Chapter 5, revealed that there is a
lack of detailed knowledge of how controllers perform during unexpected or unusual
situations (including equipment failures). This is partly due to the fact that there is no
relevant data available in the public domain [2]. This necessitated the design of an
experiment in this thesis to capture and exploit the relevant data.
As a result of close academic cooperation, one European CAA gave Imperial College
London the opportunity to plan, prepare, and run an experiment designed to study the
factors that drive the process that controllers follow to recover from ATC equipment
failures. This experiment was conducted in two phases (see Table 9-1). The first phase
involved a pilot study designed to test the feasibility of the experimental plan including
the appropriateness of the recovery methodology, serviceability of the equipment, and
clarity of the instructions to the participants (controllers working in the ATC Centre). The
results of the pilot study were used to enhance the plan for the main experiment. The
second phase of the study involved the execution of the main experiment where data
was collected for further analysis. A secondary objective was to assess and augment
the existing emergency training procedures as defined by this particular CAA in their
Manual of Air Traffic Services (MATS).
The planned experiments assumed a level of knowledge (on the part of the researcher)
necessary to fully comprehend the recovery process, in terms of the reactions and
actions of the controller in dealing with unexpected equipment failure. For this reason, it
was essential to acquire certain skills before running the actual experiments. To
achieve this objective, practical simulator training was completed by the researcher
prior to the execution of the main experiments (Table 9-1). The scheduled training was
preceded by a review of relevant ATC topics in order to prepare efficiently for practical
work on the simulator. The relevant areas covered were ATC phraseology, operational
procedures, equipment, radar vectoring, speed control, level busts, and aircraft
performance.
[2] Some research was done in the UK National Air Traffic Services (NATS), but was not
released for public use.
Table 9-1 Training, pilot study, and experiment sessions
Date | Phase | Objective | Comment
19-20 Feb 2005 | Planning for the experiment | Basic training for the ab initio student, APP training | Total of 10h training on simulator
26-27 Feb 2005 | Planning for the experiment | APP training (arrivals and departures sequencing, radar vectoring) | Total of 10h training on simulator
02 Nov 2005 | Phase I | Pilot study | Total of three controllers participated
29 Nov - 01 Dec 2005 | Phase II | Main study I | Total of eleven controllers participated
27 Feb - 02 Mar 2006 | Phase II | Main study II | Total of ten controllers participated
06 Jun - 09 Jun 2006 | Phase II | Main study III | Total of ten controllers participated
9.5 Design of the experiment

Since equipment failures are rare events [3], the experiment aimed to represent failure in
the most realistic form, i.e. as an unexpected event. To assure the occurrence of failure as
an unexpected event, each controller participated once in the experiment. The
experiment also assumed a single-controller ACC sector (as opposed to a team of
controllers) to allow best utilisation of available ATC staff and to lessen any logistical
difficulties. Before the experiment, controllers were to be informed of the objectives of
the study in highly generic terms. They were to be given the opportunity to ask specific
questions in the post-experiment debriefing session. Additionally, to assure the
discretion and confidentiality of this study, each participant was to be required to sign a
consent form which incorporated an agreement not to disclose any information
regarding this experiment. In this way, the true objective of the experiment, i.e. the
injection of the unexpected and unforeseen equipment failure, was preserved.
[3] Most of the failures in the ATC environment are prevented or handled at the
technical/engineering level. Only a few failures manage to penetrate multiple redundancies and
fail-safe system design and affect controller performance.
The experiments were to be conducted during morning and afternoon sessions, with
participants tested in equal proportions across the two sessions. The
simulation room conditions (lighting, temperature, noise) were to be consistent for all
runs.
Each simulation run was planned to last approximately 30 minutes, followed by a
debriefing session of similar duration. The instant of the injection of equipment failure
was planned to be precisely determined during the pilot study, occurring between the
5th and 15th minute of each run. The equipment failure would last 15 minutes. This was
decided based on two factors. Firstly, operational data shows that the majority of
failures last up to 15 minutes (Chapter 4, section 4.4.6). This has been confirmed by the
questionnaire survey results (presented in Appendix VI). Secondly, the 15-minute
duration of failure represents enough time to observe, capture, and assess the
controller reactions, performance, and overall recovery strategy.
The selection of the equipment failure to be simulated in the pilot study was based on
the results of the analysis of operational failure reports, the qualitative equipment
failure impact assessment tool, and the results of the questionnaire survey. However,
this selection was constrained by the technical capabilities of the available simulation
platform. In other words, it was important to simulate failure as well as the restoration of
the relevant equipment. Thus, the simulator platform would have to provide this
particular capability for a selected failure type. The final decision on the equipment
failure to be simulated would be achieved after testing candidate failure types during
the pilot study. The detailed rationale behind the selection of potential equipment
failures for the pilot and main experiment is given in the following section.
Another important factor of the experiment was the involvement of a Subject Matter
Expert (SME). The role of the SME would be to act as an observer and the coordinator
of the operations room. Upon a request from a controller, the SME would be
responsible for issuing any relevant information about the failure and its effect on the
ATC Centre (as would be required in the operational environment upon receiving an
update from the system control and monitoring unit). Upon restoration of the
equipment, there are several steps that controllers must perform to assure equipment
reliability and hence its readiness for the restoration of normal service (i.e. post-restoration
steps). Therefore, additional time would be given to controllers in the post-restoration
part of the simulation run, from the 25th to the 30th minute of each run. This
is to restore a normal working strategy after the effects of an unexpected equipment
failure.
Each simulation run would be observed by the researcher and the SME, and recorded
for the purpose of further data analysis. During each simulation run, notes would be
taken on each controller's recovery performance and changes in attitude/behaviour
prior to and after the injection of a failure. This would enable both qualitative and
quantitative data to be captured.
The observation team would be positioned in the most unobtrusive way, still having a
clear view of the radar screen. The simulation runs would be followed by an immediate
debriefing session guided by the questionnaire and other material designed specifically
for this session. The controllers would assess all the factors that potentially influenced
their recovery performance, guided by the RIFs identified in Chapter 7. In addition, they
would be given an opportunity to judge their own performance and the credibility of the
simulated failure.
9.6 Selection of the equipment failure to be simulated

The classification of ATC system functionalities, presented in Chapter 2, identified nine
main categories. The critical subsystems, equipment, and tools were identified in each
category. This categorisation identified the number of components that could fail within
the ATC system architecture. To further assess the characteristics of equipment failure
occurrence, Chapter 4 reviewed some of the main characteristics of failures in terms of
complexity, time course of failure development, overall exposure, and impact on ATC
and ATM operations.
Further assessment of equipment failure types is presented in Chapter 4 and is based
on the detailed analysis of operational failure reports from four different countries. This
analysis shows that equipment failures dominate within the communication, navigation,
surveillance, and data processing functionalities. A subsequent analysis of the level of
severity showed that most failures that have a major impact on ATC operations occur
within the communication, surveillance, and data processing functionalities.
Furthermore, the availability of the duration variable in one of the datasets (Country
D) enabled identification of equipment failures lasting up to 15 minutes, which is the failure
duration feasible within this experimental set up. Failures with a major impact on ATC
operations lasting for a period of up to 15 minutes include: data exchange network,
other surveillance systems (predominantly radar link), the flight data processing
system, and air situational display (see Table 9-2).
Table 9-2 Overview of the potential equipment failures to be simulated and their inclusion in the
pilot study
Source | Potential equipment failures to simulate | Qualitative equipment failure impact assessment tool rating | Adequacy for the pilot study | Comment | Testing in the pilot study
Operational failure reports (selection focused on major failures of short duration) | Data exchange network | Secondary functionality | No | It can range from moderate to minor and the selection tries to focus on major failures | -
Operational failure reports (selection focused on major failures of short duration) | Other surveillance systems (e.g. radar link) | Secondary functionality | No | - | -
Operational failure reports (selection focused on major failures of short duration) | Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Operational failure reports (selection focused on major failures of short duration) | Air situational display | Primary functionality | Yes | Not interesting enough from the controller recovery perspective | -
Questionnaire survey | Air-ground communication | Primary functionality | Yes | - | Aircraft radio communication failure
Questionnaire survey | Primary surveillance radar | Primary functionality | Yes | Not possible to simulate failure of one radar, but only the complete loss of radar coverage | -
Questionnaire survey | Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Questionnaire survey | Communication panel | Primary functionality | No | Not interesting enough from the controller recovery perspective as the controller would simply change the position | -
Questionnaire survey | Ground-ground communication | Primary functionality | No | Not interesting enough from the controller recovery perspective as the controller would try to establish communication via other means | -
Furthermore, the analyses of the questionnaire survey responses in Chapter 6 (Table
9-2) identified the five most unreliable aspects of ATC equipment. These systems are:
air-ground communication, primary surveillance radar, flight data processing system,
communication panel, and ground-ground communication.
Having these nine possible failure types identified, it was necessary to select candidate
failure types for a final assessment in the pilot study in order to determine the failure to
be simulated in the main experiment. The rationale for this selection was based on the
severity of the failures as determined using the qualitative equipment failure impact
assessment tool (Chapter 4, section 4.5). The development of this tool was based
around the fact that not all equipment failures have the same severity of impact on ATC
operations. This tool identified the failures with the largest impact on ATC operations.
These are failures of the primary ATC functionality, which affect multiple
systems/tools/equipment either suddenly or gradually up to one hour in duration (see
Figure 4-9 and Table 9-2).
The process above, based on operational failure reports, the questionnaire survey, and
the qualitative equipment failure impact assessment tool, identified four potential failure
types. These are the failure of the flight data processing system, air situational display,
air-ground communication, and primary surveillance radar. These four candidate failure
types were further narrowed by assessing their significance from the controller recovery
perspective and their technical feasibility. In other words, the focus was on the
failures which require controllers to recover using only the systems available at their
positions. As a result, the pilot study simulated two different equipment failures. These
were a reduced flight plan mode as a part of the flight data processing system and
air-ground radio communication failure.
Both failure types also conform to the requirements described in Chapter 5 (section
5.7.3) that the simulated equipment failure should allow one part of the diagnosis
phase of controller recovery to be performed overtly and thus be captured via
observations. For example, the flight data processing system failure may be
initially mistaken for an aircraft transponder or secondary surveillance radar failure.
Similarly, air-ground communication failure manifests itself in the same manner
regardless of its cause (i.e. ground- vs. airborne-based failure). In both cases, it is up to
the controller to identify the true failure by ruling out alternatives (e.g. communication
with pilot or adjacent ATC Centre) and this diagnostic process can be captured via
observations.
9.7 Pilot study: lessons learnt

Before conducting the main experiment, a pilot study was performed in order to
determine the feasibility of the experimental plan particularly with respect to the
serviceability of the equipment, ease of understanding of instructions, and logistical
issues. The study was designed to match the main experiment as far as possible.
Three controllers, selected at random and with no prior knowledge of the nature and
purpose of the experiment, participated in the study.
The pilot study was conducted on 2 November, 2005. It was part of a pre-planned
simulation, designed to test a newly restructured and reorganised airspace in the Area
Control Centre (ACC) of this particular ATC Centre. Of the three controllers who
participated in the pilot study, one was part of the airspace simulation test programme.
The others were volunteers who participated upon completion of their operational shift.
A total of three simulation runs were conducted. The first run was discarded due to the
inappropriate timing of the injection of the equipment failure.
The set up of the pilot study involved two Controller Working Positions (CWPs), with
the same simulation exercise running simultaneously on both CWPs. The participating
controller was located at one CWP, whilst the researcher and the SME occupied the
second CWP. In addition, a video camera was positioned in front of the second position
so that the controller would not be intimidated by its presence. The pilot study
simulated two equipment failures (Table 9-3) chosen based on the findings from
several sources (as discussed in section 9.6). There were no recovery procedures in
place for the first failure. The second failure has a recovery procedure defined by
international aviation organisations (see EUROCONTROL, 2003f; ICAO, 2001a) but
not implemented within the respective ATC Centre.
Table 9-3 Equipment failures used in the pilot study
Type of failure | Effect | Existence of recovery procedure | Human Machine Interface (HMI) indication on CWP
Reduced flight plan mode failure of flight data processing system | Monitoring aid available only for flight plan tracks already displayed; flight data functions not available | No | General Information Window/Flight Data Processing (FDP) label changes from white to yellow
Aircraft radio communication failure | Inability of the controller to contact aircraft on the dedicated frequency as well as emergency frequency | No (not in the ATC Centre) | None
Several important conclusions were drawn from this pilot study and the lessons learnt
were used to enhance the main experimental design. These are as follows:
- Integration of a research experiment into any kind of on-going ATC training requires
  significant collaboration with training instructors, the engineer in charge, and an
  ATM specialist (SME). In spite of thorough preparation, the injection of failure in the
  first simulator run did not occur at the required instant due to unclear
  instructions given to pseudo pilots. This issue was corrected in the subsequent
  runs. Therefore, for the main experiment a complete understanding of the set up of
  the experiment would have to be ensured between the training instructor, engineer
  in charge, pseudo pilots, and the SME in order to avoid any misunderstanding. This
  should involve detailed discussions prior to the first simulation run of the day.
- The initial intention was to inject an equipment failure in the 25th minute of the
  simulation run, in order to give the controller adequate time to adjust to the traffic
  scenario. However, the first run showed that this timing was inappropriate for two
  reasons. Firstly, the controllers were all very experienced and thus did not require
  the proposed length of time to adjust to the traffic scenarios. Secondly, the traffic
  scenarios used had a low number of aircraft in the dedicated sector from the 25th
  minute onwards. This was contrary to the plan to inject an equipment failure during
  the periods of average to high traffic density. Both problems were corrected by
  injecting a failure in the 10th minute of the simulation run and observing the
  controller recovery process while traffic increased progressively during the
  30-minute runs. Since the main experiment was to use fully licensed and experienced
  controllers, the exact moment of failure injection would have to be based on the
  number of aircraft in the sector. The aim would be to initiate failure with traffic levels
  starting with average and then progressing towards high.
- The need for access to the simulator log files was identified for the purpose of
  capturing all of the inputs of the controller on the keyboard and HMI. The main
  purpose for these log files would be to extract the precise reaction time of the
  controller following detection of the equipment failure. However, difficulties were
  encountered in the acquisition and decoding of these log files. Log files from
  simulation platforms tend to have a specific format and level of detail too
  cumbersome to decipher. In addition, initial detection may not necessarily be
  captured in these log files (as an actual action). This is because controllers may
  detect the failures but not take any action until they have evaluated the impact of
  the failure on the operation. Having considered all the advantages and
  disadvantages of using log files, it was decided to omit them. An alternative was
  developed based on the use of a camcorder with a precise timing capability
  (synchronised with the CWP timer). In addition, a debriefing session with the SME
  was implemented to validate the data captured throughout the recovery processes.
  The moment of detection was further validated through the results of the interviews
  with the participating controllers in the debriefing session.
- The debriefing session revealed that some changes to the questionnaire used in
  the debriefing session would be necessary. This would involve amending several
  questions to extract more information from the participating controllers (e.g. traffic
  and airspace related questions were to be presented in such a way as to extract
  more detailed information on precise characteristics such as mix of traffic, vertical
  movements, crossing movements, sector design, size of the sector, and number of
  entry and exit points).
- Due to staff shortage (i.e. ATM experts) and the significant duration of the
  experiment (three sessions spread across 11 days), it was not possible to access
  two SMEs to observe the performance of each controller.
- It was possible to define the required recovery steps for the simulated equipment
  failure types and thus avoid a level of variability in each simulation run (arising from
  differences in experience, working strategies, traffic complexity at the instant of
  failure injection, and inconsistencies in the pseudo-pilot inputs). The required
  recovery steps were validated by the SME.
- Several issues of a more technical nature were recognised: a need for the use of a
  voice recording device in the debriefing stage of the experiment as a more efficient
  means of capturing the controller responses, the need for two camcorders or a
  combination of one camcorder and radar replay for the debriefing session, and the
  need for the use of an 8mm tape camcorder instead of digital camcorders due to the
  higher resolution achieved in recording and replay.
- Another factor of note was that the controllers tended initially to stop their work
  when a failure occurred. This was because they felt this was a software
  glitch or bug, common to real-time simulations. Therefore, the instructions
  were to be updated to inform the controllers that in the case of any unusual event
  they are expected to continue working as they would in the operational
  environment. The experience of ATM specialists showed that although the
  controllers may anticipate an unusual occurrence, this does not facilitate a better
  handling of the occurrence (for evidence see Appendix II). Therefore, it was
  assumed that prior warning of some unusual situation may not alter or enhance
  controller recovery performance. It was more important that participating controllers
  did not have advance knowledge of the nature of that unusual occurrence, i.e. ATC
  equipment failure.
- Because of the great amount of data and observations to be collected, it was
  realised that the main experiment would require an assistant. The primary task of
  the assistant would be to observe and take notes/recordings of the controllers' overt
  behaviour and attitude.
- Finally, although the simulation runs in the pilot study were designed to reflect high
  traffic levels, failures were injected during a period of average to low traffic.
  Additionally, no adverse weather was simulated, which would add to the complexity
  of the exercise. As a result, the traffic scenario in the main experiment would
  necessitate high traffic levels from the moment of failure injection throughout the
  duration of the exercise. Additionally, adverse weather could be simulated resulting
  in the unplanned rerouting of air traffic.
9.7.1 Summary of the findings from the pilot study

As a result of the findings from the pilot study and subsequent discussions with
technical staff and the SME, the following lessons were learnt and used to enhance the
main experimental study:
- A complete understanding of all details on the experimental set up has to be
  ensured between the training instructor, engineer in charge, and the SME. In this
  manner it is possible to provide a consistent injection of failure, adverse weather
  conditions, and timely recordings for each simulation run of the main experiment.
  This would require detailed discussions prior to the first simulation run of the day.
- In the main experiment the failure should be injected in the tenth minute of the
  simulation runs, when the traffic reaches average levels and progresses towards
  higher traffic levels.
- The main experimental set up would require an assistant to observe and take
  notes/recordings of the controllers' overt behaviour and attitude.
- The main experimental set up should be based upon one traffic scenario with
  average to busy traffic and adverse weather conditions (pseudo pilots should be
  briefed to ask for rerouting due to adverse weather conditions); and
- The pilot study tested two different equipment failures. Both failure types showed
  the potential for the experiment. However, the flight data processing system failure
  was chosen for the main experiment as it is more demanding from the controller
  recovery perspective. The failure would be injected as a sudden failure in the tenth
  minute of each simulation run and it would last for 15 minutes.
The following section discusses the process adopted to set up the actual experiment
including a description of the characteristics of the simulated airspace, traffic, and
equipment failure type.
9.8 Experimental set up

The main experimental study was conducted in an ATC Centre (different from the one
used in the pilot study) in three separate sessions: from November 29 to December 1,
2005, from February 27 to March 02, 2006, and from June 06 to June 09, 2006 (Table
9-1). The reason for choosing a different ATC Centre from the one used for the pilot
study was to access a larger population of controllers and the required simulation facilities.
There were several differences in the set up of the main experimental study when
compared to the pilot study. The differences are presented in the following paragraphs.
Note that the other design specifications were maintained as given in section 9.5.
The population for this experiment should consist of the controllers from the ATC
Centre where the experiment was to be carried out. The population characteristics to
be sampled in this experiment are age, operational experience (i.e. years in service),
and rating of the controllers. Based on the statistical characteristics of human (i.e.
controller) performance and potential modelling with the normal distribution, the
minimal number of simulation runs (and thus participants) would be 20 (Shier, 2004).
However, collecting a larger sample of controller recovery performance poses a
significant challenge because of accessibility (to both controllers and a simulator
facility) and other logistical problems.
As a result, the study had a total of 31 simulation runs (eleven runs in the first session
and ten runs in each of the second and third sessions) performed on the Beginning to End
Skills Trainer (BEST) simulation platform. The main study was conducted in collaboration
with various staff from the ATC Centre. They were: one ATM specialist taking the role
of the Subject Matter Expert [4] (SME), technical staff supporting the simulation runs,
several pseudo pilots, and a total of 31 controllers. All three sessions were designed to
be as similar as possible in a given ATC environment.
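The run counts above can be checked with a short piece of arithmetic. The sketch below is illustrative only: the dictionary and function names are assumptions, the per-session counts are taken from Table 9-1, and the minimum of 20 runs is the figure quoted from Shier (2004).

```python
# Number of simulation runs per main-study session (from Table 9-1).
RUNS_PER_SESSION = {
    "Main study I": 11,
    "Main study II": 10,
    "Main study III": 10,
}

# Minimal number of runs quoted in the text (Shier, 2004).
MIN_RUNS = 20


def total_runs(runs_per_session):
    """Total number of simulation runs across all sessions."""
    return sum(runs_per_session.values())


def meets_minimum(runs_per_session, minimum=MIN_RUNS):
    """True if the total number of runs reaches the stated minimum."""
    return total_runs(runs_per_session) >= minimum


if __name__ == "__main__":
    print(total_runs(RUNS_PER_SESSION), meets_minimum(RUNS_PER_SESSION))
```

The check counts all runs performed; it does not account for any run that may later be excluded from the analysis.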
[4] The SME participating in this study is an ATM Specialist with 20 years of experience in many
facets of ATC and has 15 years of experience as an ATC instructor.
As mentioned previously, each simulation run was of approximately 30 minutes
duration, followed by a debriefing session of a similar duration. The experiment
(executed according to the timeline in Figure 9-2) used a pre-planned training exercise
modified for experimental use. After the first simulation run (which was discarded
afterwards), the exercise was amended to reproduce a busier traffic environment. In
other words, several arrivals were accelerated to achieve a busier period from the 10th
to the 25th minute of the exercise. FDPS failure was consistently injected in the 10th
minute of each run by pseudo pilots who manually de-correlated each new radar track.
In addition, pseudo pilots were instructed to simulate adverse weather conditions en
route by asking for necessary rerouting from the controller. Weather conditions were
scheduled for the fifth and fifteenth minute of the run. The FDPS was consistently
restored in the 25th minute of each run (see Figure 9-2).
Figure 9-2 Timeline of the experiment
The recovery process did not end with the restoration of the equipment (the 25th
minute) due to several steps that the controller had to perform to assure equipment
reliability and hence the readiness for the restoration of normal service. It usually took
one minute to accomplish these post-restoration steps. Additional time was given to
controllers in the post-restoration part of the simulation run (from the 25th to the 30th
minute of the run) to restore their normal working strategy and to calm down after the
effects of a highly stressful equipment failure occurrence.
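The timeline of a single run, as described above, can be summarised in a small sketch. The constant and function names are assumptions introduced for illustration, and the sketch assumes that "the 10th minute" and "the 25th minute" can be treated as elapsed-time marks of 10 and 25 minutes from the start of the run.

```python
# Elapsed-time marks (minutes from the start of the run), taken from the text.
WEATHER_REQUEST_MINS = (5, 15)   # pseudo pilots request weather rerouting
FAILURE_START_MIN = 10           # FDPS failure injected
FAILURE_END_MIN = 25             # FDPS restored
RUN_END_MIN = 30                 # end of the simulation run


def phase_at(minute):
    """Return the phase of the run at a given elapsed time in minutes."""
    if not 0 <= minute <= RUN_END_MIN:
        raise ValueError("minute must lie within the 30-minute run")
    if minute < FAILURE_START_MIN:
        return "pre-failure"
    if minute < FAILURE_END_MIN:
        return "failure"
    return "post-restoration"


def failure_duration():
    """Length of the simulated failure in minutes."""
    return FAILURE_END_MIN - FAILURE_START_MIN


if __name__ == "__main__":
    for m in (0, 5, 10, 15, 25, 30):
        print(m, phase_at(m), "weather request" if m in WEATHER_REQUEST_MINS else "")
```

With these marks the failure lasts 15 minutes and the post-restoration period covers the final five minutes of the run, which agrees with the design in section 9.5.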
The SME involved in the study as an observer also acted as a coordinator to issue any
relevant information about the failure and its effect on the entire ATC Centre. This
notice was issued in response to queries from the participating controllers. However, if
a controller did not make any attempt to contact the coordinator, the SME issued this
information at the most suitable moment during the exercise (based on the level of the
controller's workload).
Each simulation run was observed by the researcher, the assistant, and the SME, and
recorded for the purpose of further data analysis. The assistant was mainly responsible
for taking notes of the controller's overt behaviour prior to and after injection of failure.
A check-list using the SHAPE [5] list of attitudes was used to guide the assistant in
performing this task (EUROCONTROL, 2004f). The assistant was positioned in the
least intrusive way to the controller, completely outside of his/her field of view. On most
occasions, the observation team was positioned as far from the controller's field of view
as possible, whilst still having a clear view of the radar screen. The precise set up of
the simulation room in which the experiment took place and the positions of all parties
involved are depicted in Figure 9-3.
Figure 9-3 Room set up
The simulation runs were followed by an immediate debriefing session guided by the
questionnaire and other material designed specifically for this session. The controllers
were asked to evaluate all the factors that potentially influenced their recovery
performance. In addition, they were given an opportunity to judge their own
performance and the realism of the exercise itself. The questionnaire and other
material designed for the experiment and the debriefing session are presented in
Appendix XIII.
Equipment failure in ATC, like any other unusual or emergency event, represents a
highly stressful event. In these instances the controllers are required to intervene with
complex strategies and employ their knowledge under significant pressure and high
psychological stress. For this reason, the debriefing session was used to help defuse
stress by creating a relaxed interview environment where the participating controllers
could evaluate their actions and performance. This session was structured in such a
way as to enable comparisons across the participants. For this reason, a special
debriefing sheet had been designed prior to simulation runs. The rationale behind this
structured approach to debriefing was to ensure a consistent and reliable acquisition of
data on controller recovery performance. The debrief segment of the experiment was
used to confirm and detail observations made during the simulation run via an
approach similar to a cognitive walkthrough. In other words, this part of the experiment
was used to discuss the sequence of recovery steps required by a controller to
accomplish a recovery, and to validate failure detection and the factors that influenced
each stage of the recovery (i.e. detection, diagnosis, and correction; further discussed
in Chapter 10).
[5] The SHAPE project is briefly explained in Chapter 7, section 7.3.1.3. The list of attitudes used
to guide the assistant in the experimental process was derived from SHAPE attitude items, such as
attentive, active, confident, thoughtful, calm, careful, and enquiring.
The following paragraphs give a brief description of the key elements of the
experiments in terms of airspace, traffic, and failure characteristics.
9.8.1 Airspace characteristics

The approach airspace of the ATC Centre where the experiment was carried out is
designated as class C airspace. This airspace extends horizontally over a radius of
30Nm from the airfield (runway 06/24, instrument landing system - ILS equipped on
both runway ends). The vertical limits are from the surface to 8,000 ft or FL80.
However, in the case of an early handover from area control, the area of responsibility
of the approach control increases. For example, if an aircraft is handed over at FL180
descending to FL80, all of the airspace in between becomes the responsibility of the
particular approach sector. On a scale of one (adequate airspace) to three
(inappropriate airspace) the participating controllers ranked this airspace as 1.31 on
average, which translates to airspace of adequate to tolerable complexity (Table 9-4).
In addition, a series of in-depth questions on airspace characteristics were presented to
each controller to identify the specific features of this airspace. The most frequently
observed issues with traffic complexity were:
that there were a variety of flight levels and altitudes utilised (from FL100 down to
FL90, 4500ft, 4000ft, 3500ft, 3000ft);
that there were no specific entry and exit points (throughout the duration of this
experiment this particular airspace did not provide for any standard instrument
departure and arrival routes, i.e. SIDs and STARs); and
that the complexity of the neighbouring sectors did influence complexity within the
approach sector they operated in (e.g. two neighbouring sectors have a large
amount of crossing traffic).
Table 9-4 The mapping between exercise characteristics and the controllers' observations
The exercise characteristics | The controllers' observations
Airspace characteristics simulated as adequate | Adequate to tolerable
Weather conditions simulated as unchanged (pre- and post-failure) | Unchanged
Traffic characteristics simulated as high | Average to high
In addition, the weather conditions in the exercise simulated 15-25 knots southwest
wind, rain showers, half of the sky covered with cumulonimbus cloud (i.e. thunderstorm
cloud) with base at 1800ft, temperature of two degrees Celsius, and the pressure at
mean sea level (MSL) of 1032 hPa. Generally, in these conditions, icing will occur
inside cloud above 2000ft (in the ICAO standard atmosphere the temperature
decreases on average by 2 degrees Celsius/1000ft). Since the weather conditions pre-
and post-failure injection remained unchanged (i.e. re-routings requested by pilots in
both cases), the overall weather was marked as unchanged. This was confirmed by the
SME and participating controllers (Table 9-4).
9.8.2 Traffic characteristics

The exercise used in this experiment had a duration of 30 minutes and a total of 14
flights (one training aircraft, ten arrivals, and three departures), which translates to 28
aircraft per hour. In the peak segment of the training exercise, the controller was in
simultaneous radio contact with seven to eight aircraft. On a scale of one (high
complexity) to three (low complexity) the participating controllers ranked the traffic
complexity as 1.66 on average. This rating translates to average to high traffic
complexity (Table 9-4). In addition, a series of in-depth questions on traffic
characteristics were presented to each controller to identify the traffic characteristics
mostly observed in the given traffic scenario. These were:
aircraft speed mix, i.e. the difference in indicated airspeeds (the speed read directly
from the airspeed indicator on an aircraft), ranging from 125 knots to 250 knots;
the utilisation of hold and thus induced delays;
only Instrument Flight Rules (IFR) aircraft utilising the airspace;
high volume of traffic with vertical and crossing movements; and
an average flight time in the sector of 10-15 minutes (longer than usual due to the
injected equipment failure).
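The hourly traffic rate quoted at the start of this section follows from scaling the exercise totals to one hour. The following is a minimal sketch of that arithmetic in Python; the function and variable names are illustrative assumptions and are not part of the experimental apparatus.

```python
def flights_per_hour(flight_count, duration_min):
    """Scale a flight count observed over duration_min minutes to an hourly rate."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    return flight_count * 60.0 / duration_min


# Exercise composition as stated in the text (30-minute exercise).
exercise_flights = {"training": 1, "arrivals": 10, "departures": 3}

total_flights = sum(exercise_flights.values())
hourly_rate = flights_per_hour(total_flights, duration_min=30)
```

With the composition given in the text, the total is 14 flights and the equivalent rate is 28 aircraft per hour.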
9.8.3 Equipment failure characteristics

The choice of the equipment failure was driven by the previous analyses and four
different sources of information (operational failure reports, questionnaire survey, the
qualitative equipment failure impact assessment tool, and the pilot study). The FDPS
failure was chosen for this experimental set up for several reasons. Firstly, the data
available showed that this failure is both severe and frequent. Secondly, this failure
represents an example of major failures that affect multiple systems, as seen from the
qualitative equipment failure impact assessment tool. Thirdly, the participating CAA
does not have a written procedure for this particular failure, which makes the controller
recovery performance more dependent upon their knowledge, experience, and
personal abilities. Finally, the technical features of the Beginning to End Skills Trainer
(BEST) platform allowed injection of this failure type and its restoration in a fairly easy
way. In order to simulate equipment failure in the most realistic way, it was necessary
to have the ability to inject failure but also to restore system functionality rapidly. This
was possible with the FDPS failure and its degradation was simulated as a sudden
failure affecting the entire ATC Centre for a period of 15 minutes.
A visual representation of this type of equipment failure on the BEST platform is
presented in Figure 9-4. The correlated radar track with all relevant flight-related
information is presented on the left-hand side of Figure 9-4, whilst the uncorrelated
track (resulting from the FDPS failure) depicting only the aircraft position is on the
right-hand side. It can be seen that the FDPS failure represented a failure which affects
multiple systems. The actual effects of the FDPS failure are presented in Table 9-5
and in more detail in Table 9-6.
[Figure 9-4: radar track labels; label fields shown: CALLSIGN, TYPE, AFL, XPT, GS, CFL, XFL, ADES]
Figure 9-4 The visual representation of equipment failure on CWP: a) before the failure, b) after
the failure
Table 9-5 Equipment failure in the experimental study
Type of failure | Effects | Existence of recovery procedure | HMI indication on BEST simulation platform
Reduced flight data processing mode | Monitoring aid only available with existing flight plans; flight data functions (flight plan management) not available; Safety Nets functions available; radar data functions available | No | None
Table 9-6 Availability of functions in the reduced flight data processing mode
Radar data source
Radar tracks
Flight plan track
Maps
Tools
Radar picture controls
Flight plan commands
Flight plan lists
ATC messages de-queue
management
Transmission of ATC messages
Coordination message
Alarm and warning facilities
General information area
Mail box management
Available
Only for flight plan tracks already displayed
Available
Available
Available
Flight plan facilities
Not available
Partially available (for display only, frozen lists)
Operational data management

Sectorisation
Aeronautical Information System
Load management facilities
Air Traffic Flow Management
facilities
Operational load forecast facilities
Current Operational Load facilities
System survey facilities
Operational room configuration
Manual printing facilities
Operator roles (eligibility rules)
Off-line customisation
User mode of ATC position
Repetitive flight plan database
version management
Not available
Not available
Not available
Partially available (no MTCA warnings update)
Available
Not available
Partially available (runway in use and airspace
management are not available)
Partially available (only displayable)
Available
Not available
Not available
Not available
Not available
Partially available (percentage of use of SSR code
indication that a flight plan has received message is
incorrect and alerts are not available)
Available
Available
Available
Not available
9.9 Experimental variables

The following sections define the variables that were taken into account in the design of
the experiment to capture the characteristics of the recovery process in ATC. They are
defined as independent, dependent, and extraneous variables (see Table 9-7 and
Table 9-8) and discussed in the following sections.
Table 9-7 Overview of independent and dependent variables
Independent variables: set of 20 RIFs; the required recovery steps
Dependent variables: the recovery context (recovery context indicator); the recovery effectiveness; the recovery duration
9.9.1 Independent Variables

There are two sets of independent variables in this experiment. These are the
Recovery Influencing Factors (RIFs) and required recovery steps, discussed in the
following sections.
9.9.1.1 Recovery Influencing Factors (RIFs)
The research carried out in this thesis includes an assessment of the factors that
influence controllers during the process of recovery from equipment failures in ATC (i.e.
RIFs; see Chapter 7). A total of 20 relevant factors (RIFs) were identified. During the
post-experiment debriefing session each participating controller was presented with the
questionnaire. This questionnaire enabled controllers to mark and briefly explain the
influence of each RIF on their recovery performance as experienced in the simulation
run. Although it would be beneficial to question controllers on their experience with the
interactions between RIFs, this would considerably increase the complexity of the
experimental design. Therefore, the statistical approach is taken instead (presented in
Chapter 8).
Table 9-8 briefly summarises each of the 20 factors, specifying the key considerations
taken into account in the design of the experiment. Each factor is defined as either an
independent or an extraneous variable. Seven RIFs were kept constant for all participating
controllers (Table 9-8), whilst two RIFs were not considered in this experiment (i.e.
adequacy of alarm and adequacy of alarm onset).
Table 9-8 Overview of independent and extraneous variables

Variable
Training for recovery
Previous experience with
equipment failures
Experience with system
performance
Personal factors

Number of workstations/sectors
affected
Independent
variable
Extraneous
variable
Comment
Assessed in the debriefing session.

Existing studies from the nuclear industry have confirmed that communication within a
team does have a significant impact on recovery performance (Kaarstad and Ludvigsen,
2002). Hence, the impact of this factor is fairly well known. Regardless, this variable will
be assessed after the experiment.
Constant
(multiple
systems
affected)
Constant
(sudden
failure)
Constant (all
workstation
affected)
Constant (no
procedure)
Refers to single vs. multiple failure occurrences. The experimental set up should assess
the impact of one failure which affects multiple ATC systems. Therefore this variable will
be constant for all subjects.
This variable varies between sudden failure and gradual degradation of the system. This
variable will be constant for all subjects.
Experiment is conducted on a single workstation with one controller at a time. But the
controller will be informed that the failure affects the entire ATC Centre.
This variable varies between adequate and inadequate time to recover. It can be
influenced by several factors. Firstly, the characteristics of a given failure will drive the
time necessary to recover through the criticality of the failed function and its detectability.
Secondly, the controller characteristics will also have an effect. More experienced
controllers may react and resolve an issue more quickly than less experienced ones.
Finally, the characteristics of traffic at the moment of failure will drive the time necessary
to recover. The more complex the traffic situation, the more recovery time the controller
will need. This variable will be assessed in the debriefing session.
Theoretical review and various experiments in other safety-related industries have
confirmed the relevance of procedures to recovery performance (Kaarstad and
Ludvigsen, 2002; EUROCONTROL, 2004e; Kanse, van der Schaaf, 2000). Therefore, it
was decided to choose a failure which does not have an appropriate recovery
procedure.
Duration of failure
Adequacy of HMI and operational
support
Ambiguity of information
Constant
(short
duration
15min)
In the experimental set up, duration of failure should be long enough to capture all
phases of the recovery (e.g. 15min) taking into account the total duration of experiment.

The experimental design aims to capture controller performance unaided by system
tools, placing more emphasis on controller readiness to detect and react to an unexpected
occurrence. Additionally, past research has already shown that in most cases the
existence of an alert does have a significant impact on recovery performance (Kaarstad
and Ludvigsen, 2002; Theis and Straeter, 2001).
Existing studies from various industries have confirmed that the alert onset or its
cognitive convenience does have a significant impact on recovery performance
(Straeter, 2005).
This variable will be kept constant for all subjects. The aim is to reflect the current levels
of traffic as well as the future predicted traffic increase. The declared sector capacity is
defined as the number of aircraft entering the sector per hour, respecting the peak hour
pattern, when controller workload is 70 percent in that hour (Majumdar and Ochieng,
2002). Therefore, the aim of the proposed experimental set up is to use a 30-min peak
hour traffic sample that adequately reflects the sector's declared capacity. In addition,
the scenario should aim at steady traffic increase up to the tenth minute into the
scenario. The remaining 20 minutes of the scenario should reflect higher levels of traffic
as well as controller workload.
This variable will be constant since each participant will experience the same
airspace/sector characteristics. However, each controller will be able to assess the
adequacy of airspace in the debriefing session.
This variable will be constant for all participants. Poor weather conditions will be
experienced both pre- and post-failure period.
Set of required recovery strategy steps will be defined prior to the experiment based on
the type of failure, traffic sample, and airspace characteristics.
Not applicable for technical

reasons
Not applicable for technical

reasons
Traffic complexity
Weather conditions during the
recovery process
Conflicting issues in the situation
Age
Overall experience as a controller
Required recovery steps
Constant
(average to
high)
Constant
9.9.1.2 Required recovery steps

The recovery performance of each participant was compared to the pre-determined set
of required recovery steps. These recovery steps were determined on the basis of
operational experience, since the participating Civil Aviation Authority (CAA) does not
have any official guidelines for this particular failure type (e.g. procedure, written
instruction). This set of required recovery steps was validated by the independent input
of the SME and two ATC instructors. It should be noted that controller performance
was highly dependent upon the traffic situation at the moment of failure and therefore
several different sequences of the recovery steps were possible. The list of the
seventeen recovery steps presented in Table 9-9 presents one logical sequence of the
recovery steps. Whilst some steps had to be performed only once (e.g. identification of
a failure type, informing the coordinator, and post restoration), others had to be reapplied. For example, for each new (uncorrelated) track entering the dedicated
airspace, it was necessary to identify the traffic and maintain that identification. In
addition, timely and accurate strip marking was a must especially in the situation of
degraded equipment reliability, as simulated in this experiment. A detailed evaluation of
strip management and annotations should be addressed in future research.
An important point to note is that these simulation runs were not entirely identical in
spite of the great effort to achieve consistency amongst participants. The observed
differences were due to the pseudo-pilots' manual actions, namely their incorporation of
requested weather rerouting and slight deviations in the moment of failure injection. In
short, pseudo-pilots had to manually de-correlate each new track, which influenced to
some extent the traffic distribution in each simulation run.
Due to the small differences in the simulation runs, further analysis focused only on the
list of required recovery steps (Table 9-9), irrespective of their sequence. The objective
was to capture these core steps (including the post-restoration steps, S14-S17) and
evaluate any deviations.
Table 9-9 Overview and description of required recovery steps
Required recovery step | Description
S1 | Detect the problem either by pilot's contact or visually on the radar display (detection of the uncorrelated track). In both cases, the first assumption may be a transponder failure. After confirmation that the aircraft transponder is operational, further check on ATC system performance should be conducted.
S2 | Locate traffic
S3 | Check identity of eastbound overflight
S4 | Identify all traffic using appropriate technique: bearing/range or turn method (turning the aircraft for 30 degrees or more)
S5 | Identify failure type (either by controller or by coordinator)
S6 | Inform all traffic on RTF of the failure and advise of possible restrictions
S7 | Maintain identification of all traffic
S8 | Ground the trainer
S9 | Refuse departing traffic permission to depart
S10 | All airborne traffic in inbound sequence should continue to be sequenced for landing (without unnecessary delay)
S11 | Maintain accurate and timely strip marking throughout the process
S12 | Provide vertical separation
S13 | Utilise holding patterns when necessary
S14 | After restoration has been confirmed by coordinator re-identify all traffic
S15 | Confirm Mode C
S16 | Continue to monitor
S17 | Release all departures (which leads to the restoration of the normal service)
It is important to state that some of the recovery steps above are of greater importance
to maintaining a safe ATC service than others. For example, maintaining identification
of all traffic, conducting timely and efficient strip marking and board management, and
maintaining separation are considered critical to overall safety in a degraded situation.
Other recovery steps, such as grounding the trainer and preventing departures, are of
less importance in that they are workload reduction measures. Nevertheless, their
implementation contributes to a safer traffic environment in unusual situations.
9.9.2 Dependent Variables

This study was designed to capture several quantitative and qualitative dependent
variables. The reason for this lies in the fact that controller recovery cannot be captured
through only one recovery variable as highlighted previously in Chapter 5. The
dependent variables in this experimental set up are recovery context (recovery context
indicator), recovery effectiveness and recovery duration (see Table 9-7). The precise
methodology for the assessment of the recovery context both as a qualitative and a
quantitative variable is presented in Chapter 8. The following sections investigate other
variables.
9.9.2.1 Recovery effectiveness
The recovery effectiveness of each participating controller was rated by combining
three separate sources of data. Firstly, each participant's recovery performance was
rated during the simulation run. In general, this analysis was based on the performance
indicators for a particular airspace, such as optimal use of airspace (separation of
5-8Nm), radar vectoring, speed control, use of radio telephony (RT), prioritisation of
tasks, and appropriateness of traffic management. Secondly, the recovery
effectiveness was rated based on a set of required recovery steps as explained in
Section 9.9.1.2. Thirdly, the steps identified earlier were grouped under three main tasks to
enable credible rating (see Table 9-10). These are:
System protection or recovery steps which aimed to assure protection of the ATC
system in case of further equipment deterioration. Note that the reduction of
the controller's workload through better traffic management is an integral part of system
protection and as such is included in this task;
Maintaining situational awareness (i.e. accurate mental picture of traffic and
airspace); and
Post-restoration recovery steps.
Table 9-10 Recovery process and its three main tasks
System protection task | SA or mental picture task | Post-restoration task
Ground the trainer | Detect the problem | Re-identify all traffic
Refuse departures permission to depart | Identify failure type | Confirm Mode C
All airborne traffic in inbound sequence should continue to be sequenced for landing | Maintain accurate and timely strip marking | Continue to monitor
Utilise holding patterns when necessary | Identify all traffic (including eastbound overflight) | Release all departures
Inform all traffic and advise of possible restrictions | Locate traffic |
 | Maintain identification of all traffic |
It should be noted that an assessment of controller performance is not a simple task of
counting the number of recovery steps performed versus the total number of required
steps. The reason for this lies in the different effects that each step has on the overall
recovery performance. Therefore, three sources of information enabled a structured
recovery assessment of each participant using the following five categories:
Very good recovery performance (VG) - the controller employed a very good
recovery strategy and all recovery steps;
Good recovery performance (G) - the controller employed a good recovery strategy
but failed to perform some of the steps;
Adequate recovery performance (A) - the controller employed an adequate
recovery strategy but failed to completely protect the ATC system in case of further
equipment deterioration and failed to implement some of the post-restoration steps;
Partially adequate recovery performance (PA) - the controller employed an inadequate
recovery strategy. In other words, there was a complete lack of ATC system
protection from possible further equipment degradation. In addition, the controller
did not assure timely and accurate strip management and therefore had no means
to support his/her situational awareness or mental picture of the traffic and
airspace. The post-restoration steps were performed only to some basic extent
without a proper check of the accuracy of new data; and
Inadequate recovery performance (I) - the controller had no recovery strategy in
place, no plan to reduce his/her own workload, and therefore failed to protect the ATC
system in the case of further equipment deterioration. In addition, the controller
failed to implement most of the post-restoration steps.
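The grouping of recovery steps under the three main tasks (Table 9-10) can be illustrated with a simple per-task tally. The sketch below is in Python and is only one possible input to the rating: as noted above, the five categories were assigned by judgement across three sources of information, not by counting steps. The names TASKS, task_coverage, and example_run are hypothetical, the step labels are shortened, and the example run is invented for illustration and is not experimental data.

```python
# Recovery steps grouped under the three main tasks, transcribed from Table 9-10.
TASKS = {
    "system_protection": [
        "Ground the trainer",
        "Refuse departures permission to depart",
        "Sequence inbound traffic for landing",
        "Utilise holding patterns when necessary",
        "Inform all traffic and advise of possible restrictions",
    ],
    "situational_awareness": [
        "Detect the problem",
        "Identify failure type",
        "Maintain accurate and timely strip marking",
        "Identify all traffic",
        "Locate traffic",
        "Maintain identification of all traffic",
    ],
    "post_restoration": [
        "Re-identify all traffic",
        "Confirm Mode C",
        "Continue to monitor",
        "Release all departures",
    ],
}


def task_coverage(performed):
    """Return, per task, the fraction of its steps that were performed."""
    performed = set(performed)
    return {
        task: sum(step in performed for step in steps) / len(steps)
        for task, steps in TASKS.items()
    }


# Hypothetical observation for one simulation run (illustration only).
example_run = [
    "Ground the trainer",
    "Refuse departures permission to depart",
    "Detect the problem",
    "Identify failure type",
    "Locate traffic",
    "Re-identify all traffic",
    "Confirm Mode C",
]
coverage = task_coverage(example_run)
```

A low coverage value for the system protection task, for example, would point the assessor towards the Adequate or lower categories, but the final category still depends on which steps were missed.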
Although not attempted in this thesis, future research should assess the relevance and
contribution of existing tests such as the situational awareness test SAGAT, to the
assessment of controller recovery.
9.9.2.2 Recovery duration
As previously discussed in Chapter 5, the recovery duration is measured as the time
from the first controller overt action to the end of the recovery process. The
measurement starts from the first controller overt action as opposed to the moment of
actual failure detection although they can differ significantly. Identifying the moment of
the failure detection can be an extremely difficult task as this first reaction usually
represents covert behaviour (i.e. detection) not directly observable. In the current
experimental set up and with the available apparatus, it was not possible to accurately
capture the moment of failure detection but only the controller's first action as observed
on the ATC system.
More sophisticated equipment, such as an eye movement tracker (e.g. ASL Model
501), offers a better, but still not entirely accurate, approach to the discrimination of the
moment of failure detection. The reason for this is that there is no integrated measure
of eye point of gaze and brain activity which would differentiate between fixations with
information gathering and stares, when no information has been gathered (see footnote 6).
Therefore, even with the use of this advanced eye tracking equipment, it would not be
possible to firmly state the precise moment of failure detection. Whilst the moment of failure
detection was investigated during the post-experimental debriefing, it still proved to be
difficult to determine.
6 Personal correspondence with human factors experts from Netherlands National Research
Laboratory (NLR) and EUROCONTROL Experimental Centre (Human Factors Lab).
For this reason, the research presented in this thesis uses the controller's first action to
measure the recovery duration. It is necessary to highlight that this first observable
action may be postponed for two generic reasons. Firstly, the controller may not
necessarily detect the uncorrelated track as soon as it becomes visible on the radar
display. Secondly, the controller may detect it immediately (upon its presentation on the
radar display) but consciously delay any action due to the workload experienced or the
presence of a more urgent task which needs to be addressed first. For example, the
controller may need to address tasks that are completely unrelated to the
recovery process, such as turning an aircraft to intercept the ILS localiser for the
approach and landing, or radar vectoring of traffic with a speed differential. In other
words, the controller's first action is the moment when the controller decides to initiate
an appropriate recovery strategy and not necessarily the actual time when he/she
detects the uncorrelated label. It is well known that controllers develop their own
working strategies as they gain experience and proficiency over years on
the job. This results in the gradual build-up of personal criteria for separation limits and
methods for solving potential conflicts (whether by changing the speed of the aircraft,
its flight level, or its heading).
Based on the moment of the controller's first action, the recovery duration was
determined by observation of simulation runs and recorded video/audio material. It
should be noted that controller recovery performance did not stop with the restoration
of FDPS service, but continued to include all necessary post-restoration steps. The
post-restoration steps are required to restore normal service and to confirm that the
restored functionality provides accurate information. Discussion with the SME revealed
that this stage of the recovery should take up to one minute in duration, simply to limit
the recovery duration for the controllers who fail to perform all post-restoration steps.
As a result, the recovery duration was directly influenced by the duration of the failure
(15 minutes) and the period required for the post-restoration phase (one minute). Thus,
the recovery duration could reach a maximum of 16 minutes only if the controller
immediately initiates recovery action(s). The more time it takes for the controller to
initiate recovery action, the shorter the recovery duration will be.
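The relationship between the delay of the first overt action and the recovery duration can be written out explicitly. The following Python sketch assumes that the post-restoration phase takes its full one-minute allowance, so the value returned is an upper bound for a given delay; the constant and function names are illustrative and do not come from the experimental apparatus.

```python
FAILURE_DURATION_MIN = 15.0   # duration of the simulated FDPS failure
POST_RESTORATION_MIN = 1.0    # allowance for the post-restoration phase


def recovery_duration(first_action_delay_min):
    """Minutes from the first overt action to the end of the recovery process.

    first_action_delay_min is the time between failure injection and the
    controller's first observable action (an assumed input, in minutes).
    """
    if not 0.0 <= first_action_delay_min <= FAILURE_DURATION_MIN:
        raise ValueError("delay must lie within the failure period")
    return FAILURE_DURATION_MIN - first_action_delay_min + POST_RESTORATION_MIN


immediate = recovery_duration(0.0)   # controller acts at failure injection
delayed = recovery_duration(2.5)     # controller acts 2.5 minutes later
```

An immediate reaction gives the 16-minute maximum stated above, whereas a first action 2.5 minutes after injection gives 13.5 minutes.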
The results of all three sources of information as well as the final rating for each
participant were confirmed by the one SME involved in the experiment. Clearly, having
the participation of more SMEs would increase the validity of the outcome of the
experiment. Future research should address how statistical representation could be
achieved given the logistical difficulties associated with these types of experiments.
9.9.3 Extraneous Variables

Extraneous variables influence the outcome of an experiment, although they are not
the variables of interest. These variables are undesirable because they add errors to
the experiment. A major goal in the experimental design is to eliminate the influence of
extraneous variables as much as possible. If it is not possible to eliminate them, they
should be controlled. Two extraneous variables in this experiment could not be
controlled. These are:
Operational experience (i.e. years in service)
The differences in the level of experience were to be captured once the controllers were
recruited for the experiment. The experience variable is differentiated between the
following categories: 1-10; 11-20; 21-30; and 31-40 years.
Personal factors
There is a wide variety of factors that could be categorised as personal. Some of these
are more complex to determine than others. For example, factors like health, vision,
level of confidence, complacency, level of trust in automation, self esteem (i.e. trust in
own ability), personality, motivation, attitudes deriving from family or close social group,
personality type, etc. require specific sets of tests which can be too complex and too
time consuming. However, age was to be captured once the controllers were recruited
for the experiment. Fatigue and stress were to be controlled by using rested controllers;
time of day (i.e. relevance of circadian rhythm) and time into the shift (i.e. level of
situational awareness as well as fatigue) were to be controlled in a similar way. In short,
the experiment was to be conducted in the same periods of the day, where half of the
subjects were to be tested in the morning sessions, and the other half in the afternoon sessions.
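The handling of these two extraneous variables can be sketched as follows in Python. The experience categories are those listed above; the random allocation of participants to morning and afternoon sessions is an assumption made for illustration, since the text states only that half of the subjects were to be tested in each period. All function names and participant identifiers are hypothetical.

```python
import random


def experience_category(years):
    """Map years in service (1 to 40) onto the categories 1-10, 11-20, 21-30, 31-40."""
    if not 1 <= years <= 40:
        raise ValueError("years in service must be between 1 and 40")
    lower = ((years - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"


def split_sessions(participant_ids, seed=0):
    """Allocate half of the participants to morning and half to afternoon sessions.

    Random allocation with a fixed seed is an illustrative assumption.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {"morning": sorted(ids[:half]), "afternoon": sorted(ids[half:])}


# Hypothetical identifiers 1..30 for the participating controllers.
sessions = split_sessions(range(1, 31))
```

Matching groups on experience category before the split, as mentioned in the limitations below, would be a natural refinement of this sketch.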
9.10 Potential limitations

There are two limitations of the experimental set up and its use to capture data. Firstly,
one limitation is the individual differences of the participants (i.e. controllers). These are
characteristics that differ from one participant to another, which could be overcome by
using random assignments or even matching groups (to ensure that different groups
are equivalent with respect to pre-selected characteristics, e.g. experience and age).
Secondly, validation of recovery performance of each participating controller by only
one SME creates a potential for bias. Although special attention has been given to the
choice of the SME (in terms of experience and expertise), still only one SME was
available for this experiment.
9.11 Summary
This Chapter has presented in detail the experiment designed to capture controller
recovery in ATC. The Chapter started by justifying the need for the field experiment.
This was followed by an assessment of the available resources and the key
requirements that had to be accomplished. The Chapter continued by discussing and
justifying the overall experimental set up and data acquisition. This included the
presentation of the rationale for the choice of the equipment failures to be tested in the
pilot study. After the lessons learnt from the pilot study, it was possible to implement
the final changes and fine tune the set up of the main experiment. This segment
focused on the characteristics of the simulated traffic, airspace, and equipment failure,
as well as on the research variables while highlighting potential limitations. The
following Chapter analyses the data captured from this experiment.
10 Analysis of Experimental Results
The previous Chapters identified a set of relevant contextual factors or Recovery
Influencing Factors (RIFs) and developed a novel approach for the quantitative
assessment of the recovery context. This approach and its operational benefits are
further verified in this Chapter by an experimental investigation conducted in a training
facility of an Air Traffic Control (ATC) Centre with the participation of 30 operational air
traffic controllers. In addition to the assessment of the recovery context, the
experimental data are used to assess controller recovery performance using the
recovery variables identified in Chapter 5.
The Chapter starts with the overall framework for the analysis of a unique set of data
on controller recovery performance. This is followed by the analysis of the
characteristics of the sample of controllers participating in the experiment. The Chapter
continues with an assessment of controller recovery performance using three recovery
variables, namely recovery context, duration, and effectiveness. It concludes by
focusing on the outcome of the recovery process, as captured in the experiment.
10.1 Overall framework

The objective of the experiment conducted in this research is mainly to capture data
related specifically to controller recovery from equipment failure in ATC. Based on the
experimental set up (presented in Chapter 9), three experimental sessions were
conducted with 30 controllers from a particular ATC Centre who participated on a
voluntary basis. The controllers were asked to complete one emergency training
session (based on a simulated Flight Data Processing System (FDPS) failure), followed
by a debriefing session.
The framework for the analysis of data collected on controller recovery from an FDPS
failure is structured according to Figure 10-1. It starts by assessing the characteristics
of the controllers who participated in the experiment. This is followed by a detailed
analysis of the recovery variables defined in Chapter 5, their interactions, and other
relevant findings obtained from the experiment.
[Figure: flowchart of the analysis framework. Box labels: Experimental results (30
operational air traffic controllers; one particular ATC Centre; simulated Flight Data
Processing System (FDPS) failure); Participants (age, ratings); Analyses of recovery
variables; Analyses of dependent variables; Analysis of interactions; Recovery context
(recovery context indicator); The recovery phases; Required recovery steps; Recovery
effectiveness (observed behaviour and attitude); Recovery duration; Additional findings
(other findings); Outcome of the recovery process.]
Figure 10-1 Framework for the analysis of experimental results
10.2 Participants
As discussed in section 9.8 (Chapter 9), it is important that statistical representation is
achieved in research that involves sampling of the population. In this case, such
representation is required for the ATC Centre where the experiment was to be carried
out. The main distinguishing characteristics of the controllers are age, operational
experience (i.e. years in service), and rating. This section analyses these and makes a
link to statistical representation.
10.2.1 Age and operational experience

The average age of the controllers who participated in the experiment is 37 years,
ranging from 24 to 58 years. On average, they have more than 12 years of operational
experience, ranging from 2 to 35 years. Figure 10-2 shows the distribution of
operational experience of sampled controllers in terms of the four categories adopted
for the questionnaire survey in Chapter 6. It can be seen that the sample is reasonably
representative of the population of controllers in the particular ATC Centre as all
experience categories have been represented. The under-representation of controllers
with over 30 years of experience is to be expected as the majority of the controllers in
this category tend to move to operational support roles (e.g. ATC instructors). This
finding is in line with the results of the questionnaire survey (Chapter 6) where there
were fewer respondents with over 30 years of experience.
Figure 10-2 Distribution of operational experience
10.2.2 Ratings
Figure 10-3 presents the distribution of the ratings of the controllers who participated in
the experiment. Considering that the training exercise was designed for the approach
control (APP) course, it is important to highlight that 20 percent of the participants did
not hold an APP rating. However, half of these participants held an ACC rating, which
incorporates training in elements of approach control (as a part of the low level ACC
course). Although the remaining participants held only a TWR rating, they had just
completed an APP course and therefore possessed knowledge of all relevant elements
of approach control.
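[Figure: bar chart. y-axis: Percent (0 to 40); x-axis: Ratings, with the categories
All (ACC, APP, TWR); ACC and APP; ACC and TWR; APP and TWR; ACC; APP; TWR.
Bar values shown: 36.7, 26.7, 10, 10, 6.7, 6.7, and 3.3 percent.]
Figure 10-3 Distribution of controllers' ratings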
Since the experiment was conducted in three separate sessions (as discussed in
section 10.1), it is important to investigate whether the sampling on all three occasions
was appropriate. In other words, it is important to show that all three sessions come
from the same population of controllers from the ATC Centre, and that aggregated,
they represent a proper sample (Table 10-1).
Table 10-1 Characteristics of a sample of controllers participating in experiment
Variables                                 Experimental session
                                          1                  2                  3
Age (mean, standard deviation)            M=35.9, SD=8.95    M=37.9, SD=10.3    M=37.7, SD=9.73
Experience (mean, standard deviation)     M=10.7, SD=6.70    M=14.3, SD=11.08   M=13.7, SD=8.22
Category of experience (frequency)
  1-10                                    5                  5                  4
  11-20                                   4                  2                  5
  21-30                                   1                  2                  0
  31-40                                   0                  1                  1
The Mann-Whitney non-parametric test was used to investigate the differences in the
age and operational experience of controllers between the three experimental
sessions. Details of this statistical test are presented in Chapter 6, section 6.7.4. The
statistical tests¹ at the 95 percent confidence level indicated that there is no difference
between the three experimental sessions (p>0.05). Based upon this, data were pooled
for further analyses.
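The pairwise comparison described above can be sketched as follows. This is a minimal illustration rather than the procedure used in the thesis: the function name `mann_whitney_u`, the normal approximation without tie or continuity correction, and the age values are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def mann_whitney_u(x, y):
    """Return (U, z) for two independent samples, using midranks for ties."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    # Large-sample normal approximation (no tie or continuity correction).
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z


# Hypothetical ages for the three sessions (not the experimental data).
sessions = {
    1: [24, 29, 31, 33, 35, 36, 38, 41, 44, 48],
    2: [25, 28, 30, 34, 37, 39, 42, 45, 47, 52],
    3: [26, 27, 32, 35, 36, 40, 43, 46, 49, 58],
}
for a, b in combinations(sessions, 2):
    u, z = mann_whitney_u(sessions[a], sessions[b])
    print(f"sessions {a} and {b}: U={u:.1f}, z={z:.2f}")
```

Each pair of sessions is tested separately, so three tests are run in total; the sessions would be pooled only if none of the pairs differs at the chosen significance level.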
10.3 Assessment of controller recovery performance

The main objective of the research presented in this thesis is to investigate controller
recovery from equipment failures in ATC. The discussions in Chapter 5 concluded that
the assessment of controller recovery needs to cover the recovery context,
effectiveness, and duration, followed by the assessment of the outcome of the recovery
process. The section continues with an analysis of the interactions between recovery
variables and concludes with the discussion of other relevant experimental findings.

10.3.1 Recovery context
The thesis used a set of RIFs, identified in Chapter 7, to develop a novel approach for
the quantitative assessment of the recovery context through the concept of a recovery
context indicator (presented in Chapter 8). The experiment carried out and presented in
Chapter 9 attempts to verify this approach and its operational benefits. The following
sections adapt the proposed methodology to the particular environment of the ATC
Centre used as a case study. This is achieved in several steps. Firstly, it is necessary
to assess all candidate RIFs and identify those relevant to a particular ATC Centre.
Secondly, the probabilities for each RIF (and its corresponding levels) are defined
based on the controllers' input during the debriefing sessions. Thirdly, RIF interactions
are assessed and incorporated. Finally, the recovery context indicator is calculated as
a numerical representation of the context surrounding the simulated FDPS failure and
the subsequent controller recovery. These steps are presented in detail in the following
paragraphs.
10.3.1.1 Assessment of relevant RIFs
This step consists of the assessment of the 20 candidate RIFs and their relevance to
the experiment and the particular ATC Centre involved. Of these RIFs, adequacy of
alarm and adequacy of alarm onset are not relevant since there was no alarm/alert in
the design of the experiment (see Table 9-7, Chapter 9). There are two reasons for
this. Firstly, the experiment in this research is designed to capture controller recovery
unaided by system tools, and emphasis is placed on controller readiness to detect and
react to an unexpected failure. Secondly, past research has already shown that in
most cases the existence of an alert does have a significant impact on recovery
performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001). As a result, 18
RIFs were determined to be relevant to this experiment.
¹ Statistical tests investigated the null hypothesis for experimental sessions 1 and 2, 1 and 3,
and 2 and 3, separately.
10.3.1.2 Probabilities of each RIF and the corresponding levels
Based on data collected during the post-experiment debriefing session, it was possible
to derive probabilities of each RIF and its corresponding levels. The results for all 18
RIFs are presented in Appendix XIV. Furthermore, these probabilities are used to verify
the RIF probabilities defined in Chapter 8 using the verification criteria (Table 10-2). In
other words, a set of expectations was defined before comparing the RIF probabilities
derived for a generic ATC Centre (Chapter 8) and a particular ATC Centre (used in
the experiment).
Table 10-2 Verification of RIF probabilities from a generic approach (Chapter 8) and the
experiment
RIF group: Internal
  Verification criteria: No difference
  Result: No difference, except Communication for recovery
  Comment: The controllers who participated in the experiment rated their communication
  mostly as 'tolerable', compared to the ATM specialists who rated it mostly as 'efficient'.
  The experience with an equipment failure in the simulated environment may have
  indicated some shortcomings in the communication for recovery to participating
  controllers, of which the ATM specialists were not aware.
RIF group: Equipment-related
  Verification criteria: No difference
  Result: No difference
  Comment: Note that five out of six RIFs in this group have been controlled in the
  experimental design.
RIF group: External
  Verification criteria: Potential for difference
  Result: No difference, except Adequacy of organisation
  Comment: The controllers who participated in the experiment rated the organisation in
  their ATC Centre mostly 'tolerable' while the overall rating from ATM specialists was
  mostly 'efficient'. This is a result of the local ATC Centre characteristics masked within
  more generic characteristics captured by eight ATM specialists.
RIF group: Airspace-related
  Verification criteria: Potential for difference
  Result: Difference is observed with traffic complexity and overall task complexity
  Comment: This is expected as the experimental design planned for high traffic levels and
  overall task complexity (resulting from the simulated equipment failure).
The expected differences in RIF probabilities are a result of the experimental design
(e.g. traffic complexity and task complexity) and the overall difference in the
populations sampled (i.e. various ATC Centres sampled in Chapter 8 compared to the
ATC Centre sampled in the experiment). In short, the comparison of RIF probabilities
for a generic and a particular ATC Centre shows similarity.
10.3.1.3 Interactions between RIFs
This step consisted of an assessment and subsequent incorporation of interactions
between identified RIFs, as presented in Table 8-5 (Chapter 8). Based on the
methodology for the quantification of RIFs interactions developed in section 8.4.3 of
Chapter 8, it is possible to determine the coefficient of interaction for the interactions
between 18 relevant RIFs. This coefficient is k=1/(N-1)=1/17=0.059 (where N
represents the total number of relevant RIFs).
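The calculation of the coefficient of interaction can be written out as a small sketch. The function name `interaction_coefficient` is hypothetical and introduced for illustration only; the formula k=1/(N-1) is the one quoted above.

```python
def interaction_coefficient(n_rifs):
    """Coefficient of interaction k = 1 / (N - 1) for N relevant RIFs."""
    if n_rifs < 2:
        raise ValueError("at least two RIFs are needed for an interaction")
    return 1 / (n_rifs - 1)


# 18 relevant RIFs remain after excluding the two alarm-related RIFs.
print(round(interaction_coefficient(18), 3))  # 0.059
# For comparison, all 20 candidate RIFs would give a smaller coefficient.
print(round(interaction_coefficient(20), 3))  # 0.053
```

The coefficient depends only on the number of relevant RIFs, so every pairwise interaction receives the same weight.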
10.3.1.4 Recovery context indicator (Ic)
This particular study investigated 18 relevant RIFs, where six RIFs are defined via
three levels of impact and six RIFs via two levels of impact (according to qualitative
descriptors defined in Chapter 7, section 7.3). The remaining six RIFs are defined
through only one level, either because factors were controlled in the experiment or the
participants gave identical answers. For details see Table 10-3 and Chapter 9. In total,
this approach generates 3^6 × 2^6 = 46,656 possible contexts, each defined through the
corresponding recovery context indicator.
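The count of possible contexts can be checked by enumerating every combination of RIF levels. The sketch below only counts the combinations; it does not compute the recovery context indicator itself, and the variable names are illustrative.

```python
from itertools import product

# Levels per relevant RIF: six RIFs with three levels, six with two, six with one.
levels_per_rif = [3] * 6 + [2] * 6 + [1] * 6

# Each context is one choice of level (1-based) for each of the 18 RIFs.
contexts = product(*(range(1, n + 1) for n in levels_per_rif))
n_contexts = sum(1 for _ in contexts)

print(n_contexts)  # 46656
```

The six single-level RIFs contribute a factor of one each, which is why they do not increase the number of contexts.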
Table 10-3 Summary of RIFs defined through a single corresponding level
[Table columns: Recovery Influencing Factor (RIF); Descriptor; Probability; Level;
Comment. The Probability and Level entries are not available.]
RIF: Complexity of failure type
  Descriptor: Multiple systems affected
  Comment: Simulated Flight Data Processing System (FDPS) failure affects multiple systems
RIF: Time course of failure development
  Descriptor: Sudden failure
  Comment: The FDPS failure is simulated as a sudden failure
RIF: Number of workstations/sectors affected
  Descriptor: All workstations
  Comment: The FDPS failure is simulated to affect the entire ATC Centre
RIF: Existence of recovery procedure
  Descriptor: Inappropriate
  Comment: The objective of the experimental investigation was to simulate failure without
  recovery procedure
RIF: Duration of failure
  Descriptor: Short period of time
  Comment: The FDPS failure is simulated to last long enough to capture all phases of the
  recovery
RIF: Ambiguity of information in the working environment
  Descriptor: External working environment matches the controller's internal mental model
  Comment: The controllers responded positively to the question on match between external
  environment and internal mental model, although they could not say that this match was
  one hundred percent.
After the calculation of all 46,656 possible contexts it was determined that the mean
value of the Ic is 0.029, ranging from -0.088 to 0.121. The distribution of the recovery
contexts is presented in Figure 10-4. Based on the shape of the Ic distribution, the data
has been fitted with two normal distributions. The result of this fitting is presented in
Appendix XV.
[Figure: histogram. y-axis: Frequency (0 to 800); x-axis: Ic, from -0.088 to 0.112 in
steps of 0.010.]
Figure 10-4 Distribution of the recovery context indicator in the experiment
Using the experimental results, the distribution of the Ic derived in Chapter 8 is
assessed using the verification criteria (Table 10-4). In other words, a set of
expectations was defined before comparing the distribution of Ic for a generic ATC
Centre (Chapter 8) and a particular ATC Centre used in the experiment.
Table 10-4 Verification of the distribution of the recovery context indicator obtained from a
generic approach (Chapter 8) and the experiment
Recovery context indicator (Ic): shape, mean, median, range
  Verification criteria: Potential for difference as a result of the local characteristics of a
  particular ATC Centre as compared to a generic ATC Centre
  Result and comment:
    Shape: the difference is observed with the left tail of the distribution
    Mean: similar²
    Median: similar³
    Range: similar⁴
The main difference observed is the shape of the distribution in the left tail. This cannot
be explained by the difference in the RIF probabilities as the previous section showed
that they differed for only two RIFs, as a result of the characteristics of the experimental
design. Therefore, it is assumed that the shape of the left tail resulted from the local
characteristics of the ATC Centre used in the experiment (Figure 10-4). Although these
characteristics may have existed in the distribution of Ic obtained from a generic ATC
Centre (Chapter 8), they may have been masked by the generic approach.
The cause of the deviation in the left tail may therefore be the incorporation of a single
coefficient of interaction between all RIFs, as discussed in section 8.4.3 of Chapter 8.
Although it is known from operational experience that the RIF interactions do not
have the same level of influence, this thesis had to define a more generic approach to
account for the lack of operational data.
The assumption that the change in the shape of the Ic distribution (in the left tail) is a
result of a single value of the coefficient of interaction, which is no longer capable of
properly accounting for local characteristics, is further assessed using the example of
the RIF Adequacy of HMI and operational support. This RIF is chosen because the
interaction matrix (Table 8-26, Chapter 8) indicates that it impacts on several other RIFs.
Thus a change of its coefficient of interaction may have a significant impact on the Ic
distribution. As a result, the coefficient of interaction relevant to this RIF is increased
by a factor of 10, from the previous value of k=1/(N-1)=1/17=0.059 (section 10.3.1.3) to
the new value of k=10/(N-1)=10/17=0.59. The resulting distribution of Ic, presented in
Figure 10-5, shows a notable change in the shape of the left tail.
² A mean value of Ic for a generic ATC Centre is 0.027, whilst for the ATC Centre used in the
experiment it is 0.029.
³ A median value of Ic for a generic ATC Centre is -0.023, whilst for the ATC Centre used in the
experiment it is -0.026.
⁴ A range of Ic values for a generic ATC Centre is from -0.069 to 0.131, whilst for the ATC
Centre used in the experiment it is from -0.088 to 0.121.
[Figure: histogram. y-axis: Frequency (0 to 800); x-axis: Ic, from -0.088 to 0.092 in
steps of 0.012.]
Figure 10-5 Distribution of the recovery context indicator in the experiment with an increased
value of the coefficient of interaction
In short, the comparison of the distribution of Ic obtained from a generic ATC Centre
and from the particular ATC Centre shows no difference in the mean, median, and
range, but only in the shape of the left tail. This difference in the shape has been
explained by the inadequate definition of the coefficient of interaction. As previously
discussed in Chapter 8, a more accurate definition of this coefficient will be possible once
a detailed database of human performance becomes available in the ATM industry.
While the controllers' responses gave a basis for the definition of the recovery context
indicator (Ic) through each possible recovery context, it was also possible to define
indicators for each controller. In several cases, the participants were not able to select
the corresponding level for several RIFs. For example, in the case of the RIF weather
conditions during the recovery process, several controllers were so preoccupied with
the recovery process that they did not pay any attention to the weather conditions.
Therefore, they were unable to select the appropriate level for this RIF. The missing
responses were informed by those available for this RIF. In other words, the missing
responses were replaced with the answer 'unchanged' (corresponding to Level 2)
reported by the majority of controllers. This is also in line with the actual design of the
experiment, where similar weather conditions were presented to the controllers in the
pre- and post-failure period. A similar approach is applied for other missing answers.
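The replacement of missing responses by the majority answer can be sketched as below. The function name `impute_with_mode`, the use of `None` for a missing response, and the example responses are assumptions made for illustration; they are not taken from the experimental data.

```python
from collections import Counter


def impute_with_mode(responses):
    """Replace missing entries (None) with the most frequent reported level."""
    observed = [r for r in responses if r is not None]
    if not observed:
        raise ValueError("no observed responses to impute from")
    mode_level, _ = Counter(observed).most_common(1)[0]
    return [mode_level if r is None else r for r in responses]


# Hypothetical RIF levels reported by eight controllers; None marks no answer.
weather = [2, 2, None, 1, 2, None, 2, 3]
print(impute_with_mode(weather))  # [2, 2, 2, 1, 2, 2, 2, 3]
```

If two levels were reported equally often, `Counter.most_common` would return the one encountered first, so a tie would need an explicit rule in practice.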
Figure 10-6 shows the distribution of recovery contexts for 30 controllers. All values of
the Ic are positive and range between 0 and 0.1. This reflects an average or tolerable
environment (values of Ic are close to 0) that has a potential for improvement to
facilitate better recovery from equipment failure.
Figure 10-6 Distribution of the recovery context indicator of 30 controllers
After the assessment of recovery contexts surrounding each controller, the next section
reviews the potential solutions to enhance the recovery context (and thus controller
recovery) using the methodology developed in Chapter 8. In other words, the next
section analyses the sensitivity of the Ic to changes in RIFs.
10.3.1.5 Optimal solutions
In searching for areas of potential enhancement to improve the controllers'
recovery process, it is necessary to focus on RIFs which may be affected at the level of
the ATC Centre. Table 10-5 presents the nine RIFs that could be enhanced, based on
the responses of the controllers who participated in the experiment and the
characteristics of the ATC Centre investigated.
Table 10-5 A review of RIFs with the potential for recovery enhancement
RIFs                                     Potential for improvement
Internal RIFs
  Previous experience
  Personal factors
Equipment failure related RIFs
  Number of workstations affected
  Duration of failure
External RIFs
  Adequacy of HMI
Airspace related RIFs
  Traffic complexity
  Weather conditions
  Task complexity
It is important to note that the remaining RIFs are not taken into account for several
reasons. Firstly, in the particular experiment, a number of RIFs attained their most
favourable levels. In such cases, the majority of controllers expressed satisfaction with
the ATC system and expressed no desire for improvement of the particular RIFs.
Furthermore, several RIFs were controlled in the experiment and as such cannot be
changed. These are: complexity of failure type, time course of failure development,
number of workstations affected, and duration of failure. Finally, certain RIFs, such as
weather and experience with a particular type of equipment failure, are simply not
possible to change, whilst traffic complexity cannot be influenced at the level of the ATC
Centre. This resulted in a total of nine RIFs that have the potential to enhance the
recovery context and thus controller recovery performance (Table 10-5). The next
section illustrates how the improvement of one RIF (existence of the recovery
procedure) could influence the recovery context.
10.3.1.5.1 Impact of enhancing recovery procedure on recovery context
As the participating ATC Centre does not have a recovery procedure for FDPS failure
in place, this factor is chosen as the most practical and effective way of supporting
controllers and enhancing their recovery performance⁵. Assuming that the
management at the ATC Centre implements recovery procedures for FDPS failure, the
existence of recovery procedure RIF would be enhanced from Level 3 to Level 1 and
thus defined as suitable to the situation in question (the probability of Level 1 equals
1.00; Table 10-6). This approach also assumes that all other RIFs remain unchanged
and that any potential impact of this change on other RIFs will be reflected through
identified RIF interactions.
The resulting recovery context would take the mean value of 0.091 (SD=0.0398; Table
10-6). The difference in the distribution of the Ic with and without change in the
recovery procedures has been tested using the non-parametric Mann-Whitney test
(presented in Chapter 6, section 6.7.4). Overall, the baseline recovery context differs
significantly from the recovery context which incorporated the proposed enhancement.
This means that the design of an appropriate recovery procedure significantly
enhances the recovery context and thus creates a better environment for controller
recovery.
Table 10-6 A review of the proposed recovery solutions
Potential RIF for change: Existence of recovery procedure
  Initial level: 0, 0, 1
  Ic (M, SD, SE): M=0.029, SD=0.036
  Level after iteration: 1, 0, 0
  Ic (M, SD, SE): M=0.091, SD=0.039
  Statistical significance with 95% confidence interval: p<0.001, Sig (U=3E08, z=-196.2)
It has to be noted that the proposed change in the recovery procedure represents only
one possible form of recovery context enhancement. In reality, an ATC Centre may
undertake several other solutions to enhance controller recovery. Furthermore, the
proposed change assumes the definition of the recovery procedure for a particular
equipment failure. Therefore, the calculated recovery context indicator is valid for this
failure type only and it would have to be recalculated for other failure types.
This approach may be used to rate the significance of each proposed change and
compare it with its related cost. However, the evaluation of the related costs, as
opposed to the benefit, is not so straightforward and would necessitate an input from
the specific ATC Centre. Therefore, another approach presented in Chapter 8 may be
utilised to rate the benefit of implemented changes by the calculation of the recovery
context efficiency. The ratio between the value of the current recovery context (mean
value of 0.04; Figure 10-5) and the value of the most positive recovery context feasible
in the particular ATC Centre (i.e. Ic=0.44) indicates that a tenfold improvement is
needed to achieve the most positive value of Ic.
⁵ The only available procedures in this ATC Centre are those defined by ICAO. As previously
discussed in Chapter 5, ICAO does not define recovery practice for the FDPS failure.
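The arithmetic behind the tenfold figure can be written out as follows, assuming that the comparison is a plain ratio of the two quoted Ic values (the full definition of recovery context efficiency is given in Chapter 8). The superscript labels are introduced here for illustration only.

```latex
\frac{I_c^{\mathrm{current}}}{I_c^{\mathrm{max}}} = \frac{0.04}{0.44} \approx 0.09,
\qquad
\frac{I_c^{\mathrm{max}}}{I_c^{\mathrm{current}}} = \frac{0.44}{0.04} = 11
```

On this reading, the current mean is roughly one eleventh of the most positive feasible value, which corresponds to an improvement of about one order of magnitude.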
The next section analyses the recovery steps taken by the controllers and their overall
recovery effectiveness.
10.3.2 Required recovery steps

The recovery performance of each participant has been compared to the predetermined
set of required recovery steps. Figure 10-7 presents the ratio of recovery
steps performed by each participant to the total number of steps, whilst Figure 10-8
presents the distribution of recovery steps carried out. Only three out of 17 steps were
performed by all participating controllers. These are detection of the problem, location
of traffic, and identification of failure type⁶.
[Figure: stacked bar chart. y-axis: Percentage of recovery steps performed (0 to 100);
x-axis: Participants (1 to 30); legend: Steps performed, Steps not performed.]
Figure 10-7 Recovery steps performed by each participant
⁶ Note that if a controller did not seek failure-related information from the coordinator, the
coordinator was advised to inform the controller but only after the controller detected the failure.
As a result, the occurrence of this step is inevitable.
[Figure: bar chart. y-axis: No. of participants (0 to 30); x-axis: Required recovery
steps (S1 to S17).]
Figure 10-8 Distribution of required recovery steps (S1 to S17)
Further data analysis shows that on average each controller performed 74.2 percent of
the required recovery steps, ranging from as low as 29 percent to 100 percent. The
most neglected steps were the re-identification of all traffic (S14) and confirmation of
Mode C (i.e. confirmation of the accuracy of the post-restoration FDPS data; S15).
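The two summaries used here, the percentage of required steps performed by each controller and the number of controllers performing each step, can be sketched as follows. The function names and the three example records are hypothetical and are not the experimental data; only the total of 17 required steps comes from the text.

```python
N_REQUIRED_STEPS = 17  # steps S1..S17 in the required recovery sequence


def percent_performed(steps_done):
    """Percentage of the required steps that one controller carried out."""
    return 100 * len(set(steps_done)) / N_REQUIRED_STEPS


def participants_per_step(all_steps_done):
    """For each step S1..S17, count the controllers who performed it."""
    return {
        step: sum(step in done for done in all_steps_done)
        for step in range(1, N_REQUIRED_STEPS + 1)
    }


# Hypothetical records: the set of step numbers performed by each controller.
records = [
    {1, 2, 3, 4, 5},                # 5 of 17 steps
    set(range(1, 18)),              # all 17 steps
    set(range(1, 18)) - {14, 15},   # everything except S14 and S15
]
for done in records:
    print(round(percent_performed(done), 1))
print(participants_per_step(records)[14])
```

With 17 required steps, a controller who performs five of them reaches about 29 percent, which matches the lowest value reported above.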
The post-restoration recovery steps of re-identifying traffic and validating Mode C are
important as these steps are considered best practice to ensure system safety in the
aftermath of an FDPS failure. The re-identification process is necessary for two
reasons. Firstly, the identification of traffic is lost whilst aircraft occupy a holding
pattern. Separation in a holding pattern is purely procedural and radar separation does
not apply. Secondly, there is a potential for label swapping and garbling of radar
signals when aircraft are in close lateral proximity (such as in a holding pattern).
Further investigation of the percentage of the steps performed in three sessions
reveals a significant difference between the first and the third session. The percentage
of the steps carried out in the first session is significantly lower than in the third
session. The relevant statistics are presented in Table 10-7. The percentage of the
performed recovery steps in the first experimental session is on average 64 percent,
increasing in the second experimental session to 77 percent, reaching 82 percent in
the third experimental session (Table 10-7).
Table 10-7 Percentage of performed recovery steps in three experimental sessions
Session            Statistics
1                  M=63.98, SD=21.69
2                  M=77.06, SD=17.64
3                  M=81.77, SD=12.84
Paired sessions    Non-parametric Mann-Whitney test results
1 and 2            p>0.05
1 and 3            p=0.044, Sig (U=23.5, z=-2.0)
2 and 3            p>0.05
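As a consistency check on the statistics reported for sessions 1 and 3, the z value can be recovered from U using the usual large-sample normal approximation. The sketch assumes ten controllers in each session (as implied by 30 participants across three sessions and the frequencies in Table 10-1) and ignores tie and continuity corrections, which are not specified here.

```latex
z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}
  = \frac{23.5 - 50}{\sqrt{175}} \approx -2.0,
\qquad n_1 = n_2 = 10
```

Under these assumptions the reported U and z values agree with each other.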
After the last experimental session, it was suspected that certain changes had been
implemented in the training of controllers in the participating ATC Centre. The
debriefing session with controllers participating in the third experimental session and
the input from management revealed the incorporation of a compulsory emergency
training module within every rating conversion and continuation training course. This
change was first incorporated in the SID/STAR training that started in May 2006. As
a result, several controllers participating in the third experimental session (taking place
in June 2006) benefited from this change. It seems that this change in the training
syllabus led to the increased number of recovery steps performed and the significant
difference observed when compared to the first experimental session.
Statistical tests performed to determine the relationship between the percentage of
recovery steps performed and the 18 RIFs showed that only RIF2 (previous experience
with equipment failures) has a statistically significant correlation. More precisely, the
negative correlation identified (r=-0.31) indicates that controllers who have experienced
equipment failures tend to perform more of the required recovery steps compared to
those who have not experienced failure. In other words, experience with equipment
failures enhances the controllers' ability to recover. This finding should be transferred
into the training syllabus of every ATC Centre.

10.3.3 Recovery effectiveness
As explained in the previous Chapter, recovery effectiveness is based on data and
information from three different sources, where each controller is categorised as follows:
very good (VG), good (G), adequate (A), partially adequate (PA), and inadequate (I). The
recovery performance of 43 percent of controllers is rated as partially adequate or
totally inadequate (Figure 10-9). These controllers did not assure ATC system
protection from possible further equipment degradation and did not employ timely and
accurate strip marking and strip board management. Therefore, they had little or no
means of supporting their mental picture of traffic and airspace. The post-restoration
steps were performed only to some basic extent without any proper check of the new
data accuracy. In addition, such a high percentage of inadequate performance
indicates that there is room for improvement throughout the ATC Centre participating in
this experimental investigation. The management of the ATC Centre should implement
solutions to assure a more efficient handling of unusual/emergency situations. Such
solutions could include emergency training on equipment failures, design of recovery
procedures, and regular briefings.
Figure 10-9 Distribution of recovery effectiveness per category (presented via frequencies and
relative percentages)
Comparison of the recovery effectiveness for the three experimental sessions does not
reveal any significant differences (using the non-parametric Mann-Whitney test). In
spite of the implemented change in the participating ATC Centre (i.e. compulsory
emergency training module within the SID/STAR conversion training) and the increase
in the number of recovery steps performed, the effectiveness of the recovery
performance did not differ from one session to the other. This finding confirms that the
rating of recovery effectiveness does not depend on a simple count of recovery steps
performed. Rather, it is an indication of the overall objective achieved with the
execution of those steps, but without account of the time frame (recovery duration)
within which the objective is achieved. This finding further justifies the use of pooled
data from all three experimental sessions. The combined effect of recovery effectiveness
and recovery duration is assessed in section 10.3.5.

10.3.4 Recovery duration
The recovery duration is the time measured from the controller's first action to the end
of the recovery process. During the experiment the first action was identified by the
observation and video recording of each controller's performance, further validated with
the controller (during the post-experiment debriefing session) and the SME. For
example, the time of the first action was the moment when a controller initiated a
search for the uncorrelated track(s), contacted the Area Control Centre (ACC) to check on
the uncorrelated track(s) or contacted aircraft to ask for a transponder check (using the
phraseology 'squawk ident'). The end of the recovery process in this particular
experimental design was influenced by the restoration of the failed system and the
performance of the necessary post-restoration steps.
In general, the recovery duration ranged between 12:08 and 15:49 minutes, with an
average duration of 14:38 minutes (SD=0:55). The distribution of the recovery duration
of all 30 controllers across four duration categories is presented in Figure 10-10. These
categories are: 12-13, 13-14, 14-15, and 15-16 minutes. Figure 10-10 shows that 50
percent of controllers initiated the first recovery action within the first minute of the
failure occurrence (and thus their recovery duration lasted between 15 and 16
minutes). The shortest recovery durations were recorded for two controllers (6.7
percent; Figure 10-10). These two controllers, although initiating recovery later than
the others, implemented an excellent recovery strategy. This finding highlights that
neither recovery duration nor recovery effectiveness alone is an appropriate indicator
of the overall recovery outcome. To enable a safety assessment of the recovery
performance it is necessary to account for both, as presented in section 10.3.5.
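The assignment of a measured duration to one of the four categories can be sketched as
follows. This is a minimal illustration only: the function names and the sample values
are hypothetical and are not the experimental data; only the range limits 12:08 and
15:49 are taken from the text above.

```python
from collections import Counter


def to_seconds(mmss: str) -> int:
    """Convert a duration written as 'mm:ss' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def duration_category(mmss: str) -> str:
    """Return the one-minute category of a duration, e.g. '14:38' -> '14-15'."""
    whole_minutes = to_seconds(mmss) // 60
    return f"{whole_minutes}-{whole_minutes + 1}"


# Hypothetical durations for illustration only (not the experimental data).
sample_durations = ["12:08", "13:40", "14:38", "15:02", "15:49"]

# Number of sample durations falling into each one-minute category.
category_counts = Counter(duration_category(d) for d in sample_durations)
```

Because the failure in this experiment lasted a fixed period, a longer recovery duration
in this scheme corresponds to an earlier first action, which is why the 15-16 category
groups the controllers who reacted within the first minute.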
Figure 10-10 Distribution of recovery duration
Comparison of the recovery duration for the three experimental sessions revealed
significant differences. More precisely, the recovery duration in the third experimental
session is significantly longer than in the first two sessions (Table 10-8). This is
because the controllers in the third session reacted to the identified failure more
promptly than the controllers in the previous two sessions, which may in turn be the
result of the change in training implemented by the management of the participating ATC
Centre prior to the third session. However, it has to be noted that a more prompt
reaction to the identified failure (i.e. longer recovery duration) does not necessarily
entail an effective recovery.
Table 10-8 Comparison of recovery durations between three experimental sessions

Session            1                   2                   3
Statistics         M=14:15, SD=1:02    M=14:25, SD=0:58    M=15:14, SD=0:18

Paired sessions    Non-parametric Mann-Whitney test results
1 and 2            p>0.05
1 and 3            p=0.031 Sig (U=21.5, z=-2.2)
2 and 3            p=0.014 Sig (U=17.5, z=-2.5)
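To make the U and z statistics reported in Table 10-8 more concrete, the following
minimal sketch computes the Mann-Whitney U statistic with average ranks for ties and
converts it to z using the normal approximation. It is an illustration under stated
assumptions and not the software used in the thesis: the function names are
hypothetical, the sample durations are invented, and no tie or continuity correction is
applied to z.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def z_from_u(u, n_a, n_b):
    """Normal approximation of U (no tie or continuity correction)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean_u) / sd_u


def mann_whitney_u(group_a, group_b):
    """Return (U, z), where U is the smaller of the two U statistics."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    return u, z_from_u(u, n_a, n_b)


# Hypothetical recovery durations in seconds (not the experimental data).
session_x = [728, 790, 840, 855, 870]
session_y = [880, 900, 915, 930, 949]
u_stat, z_stat = mann_whitney_u(session_x, session_y)
```

If ten controllers per session are assumed (an even split of the 30 participants, which
is not stated explicitly in this section), z_from_u(21.5, 10, 10) and
z_from_u(17.5, 10, 10) round to -2.2 and -2.5 respectively, which agrees with the z
values in Table 10-8.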
Non-parametric Kendall's tau tests performed between recovery duration and various
RIFs reveal four statistically significant correlations. These are presented in Table 10-9,
while the details of this test are discussed in Chapter 6. Firstly, the analysis shows that
the recovery duration tends to be longer⁷ if the last emergency training had a module
on equipment failures. This finding indicates the benefit that emergency training has on
recovery duration (as it prepares controllers to react rapidly to an emergency situation).
Secondly, a similar effect on recovery duration is seen with enhanced communication
for recovery. In other words, if the controllers initiate recovery sooner, they have more
time to adequately communicate the problem to team members or a supervisor.
Thirdly, the existence of adequate recovery procedures promotes prompt recovery
action. This is in line with the finding of the first test. Finally, recovery duration
increases with a decrease in traffic complexity. This is expected, as a less demanding
traffic situation allows the first recovery action to be initiated sooner rather than
later.
Table 10-9 Statistical tests and results

Variable 1: Recovery duration
Variable 2: Last emergency training (module on equipment failure); Existence of the
recovery procedure⁸; Traffic complexity
Test: The nonparametric correlation (Kendall's tau)
Statistical significance at 95% confidence level: p=0.018 (r=-0.39); p=0.10 (r=-0.39);
p=0.15 (r=-0.41); p=0.004 (r=-0.46)
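The rank correlation behind Table 10-9 can be illustrated with a minimal pure-Python
sketch of Kendall's tau-b (the tie-adjusted variant). This is a sketch under stated
assumptions: the thesis does not say which variant or software was used, the function
name is hypothetical, and the complexity ratings and durations below are invented for
illustration.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, adjusted for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pair tied on both variables carries no information
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Hypothetical data: traffic complexity rating and recovery duration in seconds.
complexity = [1, 2, 3, 4, 5]
duration_s = [930, 900, 905, 860, 790]
tau = kendall_tau_b(complexity, duration_s)  # negative: duration falls as complexity rises
```

A negative coefficient of this kind corresponds to the reported direction for traffic
complexity, where recovery duration increases as complexity decreases.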
After assessing both recovery effectiveness and recovery duration, it is clear that
neither is, on its own, an appropriate indicator of the recovery outcome, as
discussed in Chapter 5. Therefore, a safety assessment of the overall recovery
performance necessitates the use of both variables combined into the outcome of the
recovery process presented in the following section.
10.3.5 Outcome of the recovery process

The outcome of the recovery process represents the final stage in technical and
controller recovery, as previously discussed in section 5.3 of Chapter 5. Since no
technical recovery was taken into account in this experiment, the outcome of the
recovery process focuses solely on the outcome of controller recovery. This is defined
as a combination of two recovery variables. Firstly, recovery effectiveness accounts
for the recovery steps carried out by a controller and the achievement of the three
key objectives (i.e. ATC system protection, maintenance of situational awareness, and
adequate post-restoration steps). Secondly, recovery duration accounts for the time
frame in which these steps were performed. In line with the discussion in Chapter 5,
the outcome of the recovery process accounts for successful and unsuccessful
recovery. An additional category for tolerable recovery outcome is also defined in this
thesis (Table 10-10).

⁷ More prompt first recovery action by a controller is representative of the longer
recovery duration.
⁸ There is no recovery procedure for the simulated equipment failure in the participating
ATC Centre, but some controllers stated that they had experienced similar failures as part
of their initial simulator training. Discussion with the subject matter expert revealed that
this particular equipment failure is not simulated in any training syllabus.
Table 10-10 The outcome of the recovery process matrix applicable to the experimental set up
presented in this thesis (S stands for successful, T for tolerable, and U for unsuccessful
recovery)

Recovery               Recovery duration (minutes)
effectiveness          12-13    13-14    14-15    15-16
Very good              T        T        S        S
Good                   T        T        T        S
Adequate               U        T        T        T
Partially adequate     U        U        T        T
Totally inadequate     U        U        U        T
The recovery outcome matrix highlights that successful recovery requires the initiation
of the recovery process within the first two minutes from the instant of the failure
occurrence and the performance of the majority of the recovery steps (assuring
achievement of all three objectives). An unsuccessful recovery is a result of a controller
failing to achieve two or more key objectives while initiating the recovery after more
than one minute from the instant of the failure occurrence. The delayed first recovery
action leaves the ATC system completely unprotected. Therefore, the temporal
requirements for the unsuccessful recovery account for three categories of the
recovery duration variable (Table 10-10). Everything outside the scope of successful
and unsuccessful recovery is considered tolerable. The above categorisation is
applicable only to this experimental time frame and setting; it was derived from
operational experience and further validated by the SME.
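The outcome matrix can be read as a simple lookup. The sketch below encodes Table 10-10
with recovery effectiveness as rows and recovery duration categories as columns; the
names used are hypothetical, and the matrix applies only to this experimental setting,
as stated above.

```python
EFFECTIVENESS_LEVELS = [
    "Very good",
    "Good",
    "Adequate",
    "Partially adequate",
    "Totally inadequate",
]

# One string per effectiveness level; characters follow the duration
# categories 12-13, 13-14, 14-15, and 15-16 minutes (Table 10-10).
OUTCOME_ROWS = ["TTSS", "TTTS", "UTTT", "UUTT", "UUUT"]

OUTCOME_LABELS = {"S": "successful", "T": "tolerable", "U": "unsuccessful"}


def recovery_outcome(effectiveness: str, duration_minutes: float) -> str:
    """Look up the recovery outcome for an effectiveness rating and a duration."""
    if not 12 <= duration_minutes < 16:
        raise ValueError("duration outside the 12-16 minute range of the matrix")
    row = OUTCOME_ROWS[EFFECTIVENESS_LEVELS.index(effectiveness)]
    column = int(duration_minutes) - 12  # 0 for 12-13, ..., 3 for 15-16
    return OUTCOME_LABELS[row[column]]
```

For example, a "Very good" rating with a duration of 15.5 minutes maps to a successful
recovery, whereas an "Adequate" rating with a duration of 12.1 minutes maps to an
unsuccessful one.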
Based on the presented categorisation, the outcome of the recovery process for
controllers who participated in the experiment is mostly tolerable (Figure 10-11). This
finding again confirms that there is room for improvement of the recovery performance
in the ATC Centre used in this experiment.
Figure 10-11 Distribution of the recovery outcome
After assessing all recovery variables, the next section identifies any relevant
interactions between them.
10.3.6 Interactions
This section investigates the level of interactions between the recovery variables using
statistical testing (previously discussed in Chapter 6). Table 10-11 presents the results.
Table 10-11 Statistical tests and results

Test (all rows): Nonparametric test (Kendall's tau)

Variable 1                    Variable 2                         Statistical significance at 95
                                                                 percent confidence interval
Recovery context indicator    Recovery effectiveness             p=0.06, r=0.32⁹
Recovery context indicator    Outcome of the recovery process    p=0.017, r=-0.36
Recovery effectiveness        Outcome of the recovery process    p=0.01, r=0.57
Recovery duration             Outcome of the recovery process    p>0.05
Non-parametric Kendall's tau statistical tests indicated three significant relationships
(Table 10-11). Firstly, a statistical test indicates a relationship between recovery
effectiveness and recovery context indicator at the 90 percent confidence level
(p=0.06, r=0.32). Furthermore, the Mann-Whitney non-parametric test shows a
difference in the recovery context indicator between the combined category of very
good and good recovery effectiveness on one side and partially adequate and totally
inadequate on the other (at the 90 percent confidence interval, p=0.065). Secondly, a
statistical test indicates a significant relationship between the recovery context indicator
and the outcome of the recovery process at the 95 percent confidence level (p=0.017,
r=-0.36). In other words, higher values of the recovery context indicator enhance
the outcome of the recovery process or the recovery success. Finally, a statistical test
indicates a significant relationship between recovery effectiveness and the outcome of
the recovery process. In other words, the greater the controller's recovery effectiveness,
the more successful the overall recovery. All findings are in line with operational
experience.

⁹ Statistical significance at the 90 percent confidence interval
10.3.7 Other findings

In addition to the findings above, the following points are worthy of note. These are
presented firstly by considering the phases of recovery and the corresponding
influencing factors, and secondly by considering the behaviour and attitude of the
controllers, as the simulated failure was unexpected. Finally, additional findings related
to controller recovery that are of relevance to the management of the particular ATC
Centre and the wider aviation community are presented.
10.3.7.1 The recovery phases
The following paragraphs provide a review of the three distinct recovery phases as
explained in Chapter 5, section 5.2. This review focuses on the factors that influenced
controller recovery performance in each phase.
10.3.7.1.1 Detection
In the simulated runs, detection, or recognition that there is something unusual in the
ATC system, was determined by several factors. The most prominent factor was the
pilot's first contact with ATC. There were two flights entering the approach sector
simultaneously following failure injection. Depending on the pseudo-pilots' workload,
either of these aircraft could contact the controller first. At the moment of the first
contact the flights were still outside the controllers' area of responsibility (some
40Nm away from the airport¹⁰) and controllers were sufficiently busy in the vicinity of
the airport providing approach control service. As a result, the aircraft were usually
asked to stand by for radar identification. In the case of late contact by the first
uncorrelated track (once the track is almost visible on the radar screen or at about
35Nm from the airport), controllers searched for the track and detection of the problem
was then immediate. The common factors that influenced the detection phase of the
recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
• The first radio contact (RT) of uncorrelated track;
• Traffic complexity and related level of controller workload at the moment of contact;
• Display range (set at 30Nm for this experiment);
• Type of the equipment failure (uncorrelated tracks were immediately visible on the
  screen once within radar range); and
• Complexity of failure type (affecting single or multiple equipment simultaneously).
It should be noted that the same set of factors also affected the instant of the first
recovery action. The reason is that detection is a prerequisite for the first recovery
action.

¹⁰ Note that the display range in this experiment was set to 30Nm for each controller.
10.3.7.1.2 Diagnosis
In this experiment, after the detection of one uncorrelated track, the controllers' first
assumption was usually aircraft transponder failure. This prompted a request to the
pilot to squawk identification on the secondary transponder (i.e. to operate the
designated Mode A code on the primary/secondary transponder). When this check did
not produce a correlated track on the radar screen, further checks were necessary. At
this stage, the second aircraft was usually well inside the radar display range, also in an
uncorrelated state. At this point, it became obvious to the controllers that they were
experiencing some form of equipment failure and they sought information from the ATC
Centre coordinator as to the nature of the failure. The possible options were failure of
secondary surveillance radar or FDPS failure. SSR failure was discounted as soon as
the mix of correlated and uncorrelated tracks was visible. The final option was FDPS.
The coordinator was instructed to announce that it was FDPS failure affecting the
entire ATC Centre. Moreover, he also emphasised that flight plan tracks would remain
correlated only for tracks already displayed, while all other tracks entering the system
would appear uncorrelated. The common factors that influenced the diagnosis stage of
the recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
• The number of uncorrelated tracks observed on the radar display;
• Input by the coordinator;
• Type of equipment failure; and
• Complexity of failure type.
10.3.7.1.3 Correction
In the exercised traffic scenario, the correction phase consisted of the identification of
all traffic using an appropriate primary radar technique. There are a number of
available techniques to identify traffic. Those chosen by the controllers in this
experiment were confirmation of bearing/distance of the aircraft from a fix and the turn
method (turning a single aircraft by 30 degrees or more to ascertain positive radar
identification). Operationally, the bearing/range technique is considered to be more
effective and expeditious, as it avoids misidentification due to simultaneous turning of
more than one aircraft. The next step in this process would be to inform all traffic of the
exact nature of the equipment failure and to advise them of possible consequences
(i.e. restrictions and delays). This would be followed by restricting any sport/training or
non-commercial aircraft, refusing permission for departures, and utilising the
holding pattern for all arrivals. If the failure was persistent (in this experiment it lasted
15 minutes), the controllers had to think of the steps to assure system safety in the
case of further deterioration of the equipment reliability. Thus, they had to provide
vertical separation and preserve the highest level of situational awareness. This should
be achieved by maintaining accurate and timely strip marking and strip board
management¹¹. The common factors that influenced the correction stage of the
recovery process were determined based on observations, video recordings, and
debriefings. These are as follows:
• Traffic complexity;
• Existence and familiarity with the recovery procedure(s);
• Duration of failure;
• Type of equipment failure; and
• Complexity of failure type.
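The factor lists given for the three recovery phases can be compared directly. The
following minimal sketch stores the factors as sets, using the labels from the lists in
this section, and extracts those common to all phases; the variable names are
hypothetical.

```python
PHASE_FACTORS = {
    "detection": {
        "First radio contact (RT) of uncorrelated track",
        "Traffic complexity and controller workload at the moment of contact",
        "Display range",
        "Type of equipment failure",
        "Complexity of failure type",
    },
    "diagnosis": {
        "Number of uncorrelated tracks observed on the radar display",
        "Input by the coordinator",
        "Type of equipment failure",
        "Complexity of failure type",
    },
    "correction": {
        "Traffic complexity",
        "Existence and familiarity with the recovery procedure(s)",
        "Duration of failure",
        "Type of equipment failure",
        "Complexity of failure type",
    },
}

# Factors that appear in the list of every recovery phase.
shared_factors = set.intersection(*PHASE_FACTORS.values())
```

With the labels as listed, the type of equipment failure and the complexity of failure
type are the two factors shared by detection, diagnosis, and correction.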
Figure 10-12 links the key characteristics of each recovery phase in this particular
experiment with the recovery steps relevant for each phase.
¹¹ The debriefing sessions investigated the overall quality of strip management and annotation
without going into a more detailed analysis. In future, the structure of the debriefing session may
place more emphasis on this segment of the recovery process.
Figure 10-12 Recovery phases, their corresponding influencing factors and required recovery
steps
10.3.7.2 Observed behaviour and attitude

As discussed in Chapter 9, all the observations of the controllers' attitude and
behaviour were captured by the assistant. A check-list using the SHAPE list of
attitudes was used as an initial tool and guidance to the assistant in performing this
task (see EUROCONTROL, 2004f). In addition, some of the observations were
captured during the debriefing sessions.
In general, the observations in the first two experimental sessions show a difference in
overt behaviour in the pre- and post-failure segment of the experimental investigation.
In line with the results obtained with other recovery variables, the analysis of the
relevant data on controllers participating in the third session did not reveal significant
changes in overt behaviour in the pre- and post-failure segment of the experiment.
Furthermore, the findings from the first two sessions are in line with the previous
findings on the consequences of stress on individual controllers (Costa, 1995). Whilst
for some controllers the overall posture remained the same throughout the exercise,
others displayed the complete opposite. The deviations from the pre-failure behaviour
involved the following:
• increased movement (i.e. overall posture, hands, feet, or head);
• forceful displacement of the strip holders;
• deviations from standard RT phraseology;
• hesitation in RT communication; and
• change in pitch or tone of voice.
The subject matter expert involved confirmed that most of these behavioural gestures
depict a typical reaction to a reduced mental picture of either the traffic or overall
situational awareness. Even during the debriefing stage of the experiment, the change
in the controllers' behaviour was noticeable for the first two experimental sessions.
Examples include shaky voice, overall unease, high alertness, and seriousness. The
controllers who performed the recovery process at either tolerable or good levels were
noticeably more relaxed and talkative. On the other hand, the controllers who
performed at either partially adequate or inadequate levels were without exception
more nervous and reluctant to answer questions in detail or to carry out an objective
review of their own performance. The overall conclusion is that the equipment failure
was an unexpected event and contributed to a significant increase in the controllers'
workload (as reported subjectively by the participating controllers).
10.3.7.3 Additional findings
It is important to present all acquired findings as they represent important issues for the
management of the participating ATC Centre as well as the wider aviation community.
These are presented in the following paragraphs.
Although 73 percent of the controllers reported that their training was suitable to the
equipment (i.e. FDPS) failure and traffic scenario in question, analysis of data collected
in the experiment showed that 43 percent (of the 73 percent) had received their last
emergency training more than a year prior to the experiment¹². Of the controllers
who were able to recall, 50 percent stated that the emergency training session they
participated in had a module on equipment failures, predominantly on radar failures.
However, it was also noted that 40 percent of the controllers did not have any type of
equipment failure in their last emergency training. As a result, 93 percent of controllers
who participated in the experiment reported they would like to have more frequent
training for unusual situations. The most desired frequency of emergency training
sessions was every six months. This is in line with the findings obtained in the
questionnaire survey (Chapter 6), where 45 percent of controllers believed that recurrent
training once a year is not enough to develop and maintain the level of proficiency
required for recovery from equipment failures.

¹² Note that 27 percent of controllers had their last emergency training in the month prior to
this experiment, as a part of the approach rating course.
Interesting results were obtained from the question on the existence of a recovery
procedure for the simulated FDPS failure. Although the procedure for this kind of failure
does not exist in the Manual of Air Traffic Services (MATS), 20 percent of controllers
believed that this particular procedure does exist. Some of the controllers who had
participated in the approach control course quoted their training manual as the
reference for this procedure. However, no evidence was found to support their
statement. The best explanation for this is that these controllers equated Secondary
Surveillance Radar (SSR) failure with FDPS failure and relied on their recent radar
fallback training, without fully understanding the implications of the loss of FDPS.
The outcome of FDPS failure is significantly different from simple SSR failure, as it
represents a more serious failure that requires immediate attention from the controllers
with the required skills.
On the issue of Human Machine Interface (HMI) and operational support (e.g. auxiliary
display, communication panel), 46.7 percent of controllers found the Beginning to End
Skills Trainer (BEST) simulator platform suitable to the equipment failure and traffic
scenario in question, 36.7 percent found it tolerable, and ten percent found it
counterproductive, while 6.7 percent of the controllers did not respond to this question.
However, most of the controllers stated that the BEST platform's HMI is not as good as
the HMI used in the operational centre. There are two reasons for this. Firstly,
meteorological data needs better positioning (i.e. closer to the screen) to avoid head
turn and change of visual field. Secondly, there is no alert or warning that a failure has
occurred (i.e. colour change to yellow or red in the general information window).
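The percentages quoted for the HMI question can be converted back to numbers of
controllers as a consistency check. The sketch below assumes the sample of 30
controllers reported for the experiment; the variable names are hypothetical.

```python
N_CONTROLLERS = 30  # sample size reported for the experiment

# Percentages as quoted in the text for the BEST platform HMI question.
hmi_ratings_percent = {
    "suitable": 46.7,
    "tolerable": 36.7,
    "counterproductive": 10.0,
    "no response": 6.7,
}

# Convert each percentage to the nearest whole number of controllers.
hmi_counts = {
    rating: round(percent / 100 * N_CONTROLLERS)
    for rating, percent in hmi_ratings_percent.items()
}

total_controllers = sum(hmi_counts.values())
```

Under this assumption the four response categories correspond to 14, 11, 3, and 2
controllers, which together account for all 30 participants.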
Several organisational issues were raised during the debrief sessions. The most
frequent issues raised were that controllers:
• felt that supervisors should receive more dedicated training in the handling of
  unusual occurrences and system failures. Their role in coordinating recovery
  actions should be more proactive. In addition, it was highlighted that coordination
  with technical services and adjacent ATC Centres should be the primary
  responsibility of the supervisor during a Centre crisis;
• felt that more emphasis could be placed on developing an understanding of the
  separate roles of both controllers and engineers. This perceived lack of
  understanding of each peer group's function and tasks can create communication
  difficulties in the operational environment;
• identified a need for an update of the MATS with regard to the on-suite task
  allocation between the executive and planning controller. Additionally, controllers
  stated that the last three incidents involving a loss of standard separation involved
  team related issues that contributed to the events. Therefore, it is necessary to
  strengthen the relationship between executive and planning controllers and to
  define their precise roles and responsibilities;
• stated that their roles as currently defined in MATS are ideal but in reality are
  difficult to adhere to, especially in a busy operational environment. They further
  stated that in the event of an unusual occurrence, there are no guidelines available
  for the handling of such situations;
• stated that competency checking, conducted once per year for only one hour, is not
  sufficient. They also stated that the availability of refresher training in unusual
  occurrences is also limited to once per year. Once again, this finding is in line with
  the questionnaire survey results presented in Chapter 6.
In general, the participating controllers rated their own performance between efficient
and tolerable (47 percent rated their own performance as efficient and 50 percent as
tolerable). This is not in accordance with the overall assessment of their performance
(recovery effectiveness) where 43 percent of the controllers performed at the partially
adequate and inadequate levels. This should pose some concern especially
considering that 46.7 percent of controllers stated that their performance in this study
was no different from any other day. In addition, 45 percent of them marked their
performance as highly representative of their overall ability to recover from an
equipment failure in ATC. Finally, 70 percent of controllers stated that the task they
experienced in the experiment was highly realistic.
Furthermore, 33 percent of the controllers stated that they were not aware of the
complete impacts/implications of a particular failure or equipment failures in general. As
a result, 87 percent of the controllers stated that they would like to have some form of
aide memoire available at each CWP to assist them in recognising the effects of a
particular equipment failure and steps to be taken to recover. As a consequence, this
thesis proposes a framework for the establishment of an aide-memoire (in Appendix
III). A summary of all additional findings is presented in Table 10-12.
Table 10-12 Summary of additional findings

Variable: Training
Finding: 73 percent reported that their training was suitable. 93 percent of controllers
would like more frequent training for unusual situations.
Comment: Majority of these controllers had the last training on unusual situations more
than a year ago. Only half of the respondents had an equipment failure.

Variable: Trust in ATC technology
Finding: 93 percent of controllers have an objective attitude toward ATC equipment.

Variable: Recovery procedure
Finding: 20 percent of controllers believe that the procedure for FDPS failure exists.
Comment: The procedure does not exist in the ATC Centre.

Variable: HMI
Finding: 46.7 percent of controllers found the BEST platform suitable to their needs and
only 10 percent found it counterproductive.
Comment: Negative comments are mostly related to the differences between the BEST
platform and the system used in the operations room.

Variable: Overall recovery performance
Finding: 47 percent of controllers rated efficient; 50 percent of controllers rated
tolerable.
Comment: Not in accordance with their overall performance. 43 percent of controllers
were rated partially adequate or inadequate.

Variable: Awareness of the impact of a particular failure
Finding: 33 percent of controllers are not completely aware.

Variable: Availability of aide memoire
Finding: 87 percent of controllers are in favour.
Comment: A framework of aide memoire is provided in Appendix III.
10.4 Summary
The Chapter set out to achieve several objectives. Firstly, it set out to verify a
methodology for the quantitative assessment of the recovery context (defined in
Chapter 8) and its operational benefits. Secondly, it set out to verify a framework for an
in-depth analysis of controller recovery using recovery variables previously identified in
Chapter 5. The final objective was to assess the outcome of the recovery process.
All these objectives have been achieved by the experiment and several interesting
findings have been produced. These are as follows:
• The majority of controllers tend to omit some critical recovery steps related to the
  post-restoration phase. These are re-identification of traffic and confirmation of
  the accuracy of information provided by the restored equipment. The sampled
  controllers seemed to rely on the information provided without questioning its
  accuracy following the occurrence of a failure.
• Controllers with prior experience of equipment failures tend to carry out more
  recovery steps compared to those without prior experience. In other words,
  experience with any equipment failure tends to enhance the controllers' ability to
  deal with equipment failures. Moreover, this type of stress-exposure training
  enhances the stress-coping skills of controllers and as such should be
  incorporated into the training syllabus of every ATC Centre.
• A high percentage of inadequate recovery performance indicates that there is
  room for improvement throughout the ATC Centre participating in the experiment.
  Hence, the ATC Centre management should implement solutions to assure
  efficient handling of unusual/emergency situations. Note, however, that the
  management of the ATC Centre where the experiment took place implemented
  an initial process to train controllers to deal with unusual/emergency situations.
  This was in the form of a compulsory emergency training module within every
  rating conversion and continuation training course.
• The first recovery action tends to occur more promptly if a controller has had
  training for unusual/emergency situations.
• If the controllers initiate recovery sooner, they communicate better with team
  members and the supervisor.
• The existence of adequate recovery procedures tends to promote prompt
  recovery action.
• Recovery duration tends to increase with a decrease in traffic complexity. This is
  expected as the less demanding traffic situation allows the controllers to initiate
  recovery action sooner rather than later.
• The outcome of the recovery process variable has been defined as an overall
  safety indicator of the recovery process. It represents a combination of the
  recovery effectiveness and duration.
• The recovery context indicator represents a good indicator of both recovery
  effectiveness and the outcome of the recovery process.
• Recovery duration itself is not a good indicator of the outcome of the recovery
  process, whilst recovery effectiveness is.
• The framework for the analysis of controller recovery proposed in this thesis and
  verified in the operational environment shows potential for an in-depth analysis
  of controller recovery from equipment failures in ATC.
11 Conclusions
This Chapter presents the main findings of the research on controller recovery from
equipment failures in Air Traffic Control (ATC) and suggests avenues for future work.
The approach taken for the former is to address each of the research objectives
formulated in Chapter 1 (repeated below for ease of reference) and to present the
corresponding findings. The Chapter concludes with the identification of research
questions and ideas to be explored in future research.
11.1 Revisiting the research objectives

Chapter 1 defined a set of four research objectives for this thesis. These are to:
• Provide a systematic literature review to connect disparate but related topics of
  ATC equipment failures and controller recovery, previously lacking in the area of
  ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive a
  methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery. This framework should
  be further verified with specific reference to a particular equipment failure type.
11.2 Conclusions
11.2.1 Literature review
The review of relevant literature aimed to connect ATC equipment failures with both
technical and air traffic controller recovery. With respect to the literature review, the
following conclusions are relevant:
1. The assessment of controller recovery from equipment failures in ATC has to
address technical and controller recovery together and not in isolation as has
been the case in the past. This holistic approach enables a complete
understanding of controller recovery and all of its influencing factors.
2. Because of the variety of equipment, components, and tools in both current and
future ATC system architectures, ATC equipment should be classified based on
the type of ATC functionality it supports. Such a functional classification is
flexible to changes in ATM/ATC and can capture both current and future
equipment failure types.
3. Recovery procedures, recovery training, and past experience with equipment
failures are the main drivers of controller recovery performance. However, the
provision of both recovery procedures and training is inconsistent across ATC
Centres.
4. The context in which controller performance takes place has an important role
in controller recovery.
11.2.2 Equipment failure types and their characteristics

Equipment failure characteristics were determined from past research and operational
experience through the analysis of operational failure reports and responses from a
questionnaire survey of air traffic controllers. With respect to equipment failure
characteristics, the following conclusions are relevant:
5. The key characteristics of ATC equipment failure are: ATC functionality
affected, complexity of failure type, time course of failure development, duration
of failure, potential causes of equipment failure, and the consequences of
equipment failure.
6. Information on equipment failure characteristics has been used to develop a
novel qualitative equipment failure impact assessment tool. This tool enables
the identification of equipment failures that are most challenging to ATC
operations.
7. Communication, surveillance, and data processing ATC functionalities are
affected most by equipment failures and have the most severe impact on ATC
operations. This finding has been verified by operational failure reports and the
results of the questionnaire survey.
8. According to operational failure reports, further verified by the results of the
questionnaire survey, equipment failures that have a major impact on ATC
operations mostly affect air-ground communication, radar surveillance
coverage, and the Flight Data Processing System (FDPS).
9. According to operational failure reports, the most frequent equipment failures
last up to 15 minutes. Furthermore, analysis of the reports has shown that the
longer the failure, the less severe it is. This finding is expected as more severe
failures are attended to immediately.
The conclusions listed above, resulting from the investigation of equipment failure
types and their characteristics in the operational ATC environment, have the potential
to impact policy formulation and the operational aspects of ATC/ATM. The thesis
findings have highlighted, for the first time, the ATC functionalities that are most
affected by equipment failures as well as those which have the most severe impact on
ATC operations. The uses of these findings are twofold. Firstly, they identify the
equipment failure types that should be mandatory in the recovery training and
procedures designed for an ATC Centre. Secondly, the qualitative equipment failure
impact assessment tool can be used as part of the incident investigation process as
well as a design tool, supporting the design of recovery training scenarios.
11.2.3 Controller recovery performance, recovery context, and influencing factors
The main findings related to controller recovery performance and the recovery context
are drawn from two sources of information. Firstly, the questionnaire survey results
provided an initial insight into controller recovery and relevant factors. Secondly, a
review of several Human Reliability Assessment (HRA) techniques identified a set of
relevant contextual factors, the so-called Recovery Influencing Factors (RIFs). With
respect to controller recovery and the overall recovery context, the following
conclusions are relevant:
10. This thesis presents, for the first time, a comprehensive investigation of the
factors that influence controller recovery. This has been done through a
rigorous process that started with a review of relevant past research and
continued with a questionnaire survey, targeted experiments, and statistical
analyses to develop a functional relationship between controller recovery and
its influencing factors.
11. The questionnaire survey showed that the majority of controllers experience
equipment failures annually.
12. Improvement in ATC Centre management is required to facilitate effective
recovery. This can be achieved through, for example, organised exchange of
experience within ATC Centres, not only with respect to equipment failures but
also with respect to all types of emergency/unusual situations. Statistical tests
showed that controllers regard the exchange of information on equipment
failures as a type of past experience.
13. The questionnaire survey showed that the vast majority of ATC Centres
surveyed have some form of recovery procedure. The most neglected
procedures are for ATC functionalities which are most challenging to controller
recovery (data processing, surveillance, and communication functionalities). In
addition, controllers highlighted the need for an abbreviated version of the
contingency manual which should be made available at each controller working
position (i.e. aide-memoire).
14. Recovery procedures should be up-to-date, complete, and follow a logical
sequence of steps that the controllers should perform. In addition, recovery
procedures need to be compatible with other procedures within the ATC Centre.
In short, procedures should be seen as guidance to the controller; they should
be adaptable to any given situation and should take account of a variety of
contextual factors.
15. Half of the ATC Centres surveyed in the questionnaire survey have
programmes for training in recovery from equipment failures. However, this
recurrent training is usually provided once a year. The controllers believe that
the frequency of recurrent training is inadequate and are in favour of receiving
as much training as possible on emergency/unusual situations, including
equipment failures.
16. Recurrent training must be up-to-date and compatible with other training
programmes. Moreover, the recurrent training exercises should be varied and
realistic, covering both outages and less severe failures. The ATC Centre should
adopt a custom of periodically reverting to backup systems, perhaps during less
busy traffic periods, in order to maintain controllers' proficiency in their
use.
17. Regular training on system functionalities, upgrades, and degradation modes
could be a useful method to ensure consistent knowledge and familiarity with
the ATC system architecture.
18. The majority of controllers surveyed confirmed the importance of context
surrounding an equipment failure occurrence. This confirmed the earlier finding
from existing research literature.
19. The context surrounding controller recovery from equipment failure in ATC is
defined via 20 contextual factors, known as Recovery Influencing Factors
(RIFs). Each RIF can be further defined via its qualitative descriptor. This
establishes the relationship between each RIF and its influence on controller
performance.
20. An aggregated indicator of the entire recovery context has been proposed,
referred to as the recovery context indicator (Ic). This quantitative indicator of
the recovery context is sensitive to changes in the individual RIFs.
This thesis presents, for the first time, a comprehensive set of the factors that influence
controller recovery (RIFs). These factors can be used as part of an incident
investigation process, enabling a detailed investigation of the impact of context on
controller recovery performance. The identification and assessment of RIFs can also
be used to identify and refine recommendations on various aspects of ATC operation.
However, the final decision on the optimal recommendation should
be based on the degree of positive shift in the value of the recovery context indicator
(as the quantitative indicator of the recovery context). Within the future ATM system,
this methodology could be easily modified to account for the shared responsibility for
separation of aircraft and collaborative decision-making between airborne and
ground-based ATM system components.
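The selection of a recommendation by its shift in the recovery context indicator can be illustrated with a minimal sketch. The sketch does not reproduce the definition of Ic developed in this thesis. It assumes, purely for illustration, that each RIF is scored on a numeric scale where higher values are more favourable to recovery and that the indicator is a weighted mean of those scores. The function names, RIF labels, scores, and candidate recommendations are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def context_indicator(rif_scores: Dict[str, float],
                      weights: Optional[Dict[str, float]] = None) -> float:
    """Aggregate RIF scores into a single value.

    ASSUMPTION: a weighted mean stands in for the thesis definition of Ic,
    which is not reproduced here.
    """
    if not rif_scores:
        raise ValueError("at least one RIF score is required")
    if weights is None:
        # Equal weighting is an illustrative default only.
        weights = {name: 1.0 for name in rif_scores}
    total_weight = sum(weights[name] for name in rif_scores)
    weighted_sum = sum(weights[name] * score
                       for name, score in rif_scores.items())
    return weighted_sum / total_weight


def rank_recommendations(baseline: Dict[str, float],
                         recommendations: Dict[str, Dict[str, float]]
                         ) -> List[Tuple[str, float]]:
    """Rank candidate recommendations by the shift they produce in the indicator."""
    base_value = context_indicator(baseline)
    shifts = {}
    for label, changed_rifs in recommendations.items():
        # Apply the recommendation by overriding the affected RIF scores.
        revised = {**baseline, **changed_rifs}
        shifts[label] = context_indicator(revised) - base_value
    # Largest positive shift first.
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical RIF scores on a scale from -1 (adverse) to +1 (favourable).
baseline = {"training": 0.0, "procedures": 0.0,
            "traffic_complexity": -1.0, "hmi": 1.0}
candidates = {
    "add_aide_memoire": {"procedures": 1.0},
    "refresher_training": {"training": 0.5},
}
ranked = rank_recommendations(baseline, candidates)
```

Under these assumed scores, the recommendation listed first in `ranked` is the one with the largest positive shift in the indicator. An operational implementation would replace the weighted mean with the indicator as defined in the thesis.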
11.2.4 Framework for the analysis of controller recovery

The framework for the analysis of controller recovery proposed in this thesis was
verified in an experimental investigation with specific reference to a particular
equipment failure type (i.e. FDPS) and a particular ATC Centre. With respect to the
framework for the analysis of controller recovery, the following conclusions are
relevant:
21. Recovery variables relevant to controller recovery from equipment failures in
ATC are the recovery context, effectiveness, and duration. This set of recovery
variables showed a potential for the rigorous analysis of controller recovery.
22. The experiment showed that controllers with previous experience of
equipment failures executed more of the required recovery steps. Overall,
experience with equipment failures enhances a controller's ability to deal with
any type of equipment failure.
23. A further finding from the experiment is that recovery duration tends to be
longer, the closer the emergency training with a module on equipment failures
is to the occurrence of the actual failure.
24. Communication with team members or the supervisor is enhanced when
controllers initiate recovery action sooner (i.e. as close as possible to the instant
of the occurrence of the failure).
25. Furthermore, the experiment showed that the existence of recovery procedures
(or any type of reference material, such as training manuals) promotes prompt
recovery action.
26. The experiment also showed that recovery duration increases with a decrease
in traffic complexity.
27. The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process (represented as a
combination of the recovery effectiveness and duration).
28. The thesis has identified a statistically significant correlation between the
recovery context indicator and the outcome of the recovery process. Hence, the
outcome of the recovery process represents a good safety indicator of the overall
recovery process.
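The kind of correlation check referred to in conclusion 28 can be sketched as follows. The sketch assumes a Spearman rank correlation; the statistic actually used in the thesis is not restated in this chapter, so this choice is an assumption, as are the function names. The paired values are invented for illustration and are not experimental data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: one context indicator and one outcome score per participant.
context_values = [0.10, 0.40, 0.20, 0.90]
outcome_scores = [1.0, 3.0, 2.0, 4.0]
rho = spearman(context_values, outcome_scores)
```

A coefficient near +1 or -1 indicates a strong monotonic relationship between the two variables. Establishing statistical significance would additionally require a test appropriate to the sample size, which this sketch does not include.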
The relevance of recovery training (either as an alternative or an addition to past
experience) and recovery procedures has been confirmed by experiment. Recovery
training and awareness of recovery procedures lead to more prompt recovery action,
better awareness of required recovery steps, and enhanced team communication.
These findings should directly inform the required policy on training and procedures for
handling unusual/emergency situations, highlighting required content, frequency, and
format. Furthermore, the recovery variables identified (recovery context, effectiveness,
and duration) have the potential to facilitate a rigorous analysis of controller recovery
from equipment failures in ATC and thus can be used in incident investigation
processes. Finally, the recovery context indicator represents a good indicator of the
outcome of the recovery process (represented as a combination of the recovery
effectiveness and duration). As such, the overall framework for the analysis of
controller recovery based on identified recovery variables can be used to assess the
outcome of the recovery process in both current and future ATM environments.
11.3 Future work

The research presented in this thesis demonstrates the capability to assess ATC
equipment failures and subsequent controller recovery performance. However, these
findings also suggest a number of directions for further research. These include:
It is hard to find safety-related research in the aviation industry that does not rely
upon some type of occurrence data. However, such research seldom questions
the reliability of the data available. To date, no measure of the
reliability of occurrence databases has been produced. Automatic tools exist in
certain countries, for example the Safety Monitoring Function (SMF), which
captures all loss-of-separation incidents in the controlled airspace of that country.
Data from such a tool may provide an indication of the reliability of the occurrence
data.
Future research should investigate ways to overcome the logistical difficulties with
capturing operational data and corresponding qualitative and quantitative aspects of
validation (e.g. in terms of questionnaire survey sample, number and characteristics
of ATM specialists, and subject matter experts).
Further development of the qualitative equipment failure impact assessment
tool (Chapter 4) is required to enable assessment of the impact of several
independent failures on ATC operations and thus controller performance. The
output of this more advanced approach would indicate the most severe
combinations of independent failures. However, to achieve this, the tool would
have to be adapted to a specific ATC Centre to integrate the complexity of its
ATC architecture and the flow of data between various ATC systems.
The questionnaire survey used in any future research should apply rigorous design
methods to avoid ambiguities and facilitate interpretation or perception of key terms
(e.g. equipment failure).
The relationship between a particular RIF level and its impact on controller
recovery (i.e. defined via the qualitative descriptor in Chapter 7 and the correlation
coefficient in Chapter 8) could be defined as a function of RIF level. This approach
would be more sensitive to the changes resulting from the incorporation of RIF
interactions.
It would be necessary to simulate the impact of ATC equipment failures in a future
gate-to-gate ATM system where the roles for planning and executive control will be
reorganised and distributed between controllers and pilots. Additionally, this future
environment will be characterised by dynamic real-time exchange and distribution
of flight-related information. Thus, the safety assessments would have to consider
the exchange and distribution of corrupted data and its impact on both air and
ground services.
The thesis has identified a statistically significant correlation between the recovery
context indicator and the outcome of the recovery process. Future research should
translate this finding into a model that could be used operationally in an ATC Centre.
11.4 Publications relating to this work

The following publications have been produced in support of the research on controller
recovery from equipment failures in ATC. The publications consist of journal
publications and published conference proceedings, each with a comment on the
precise contribution of the listed co-authors.
11.4.1 Publication format: journal - accepted subject to revision

Subotic, B., Majumdar, A., and Ochieng, W.Y. (2007). Recovery from Equipment
Failures in Air Traffic Control (ATC): The findings from an international survey of
controllers. Accepted subject to revision to the International Journal of Engineering and
Operations: Air Traffic Control Quarterly. Air Traffic Control Association Institute, Inc.
11.4.2 Publication format: journal - published

Subotic, B., Ochieng, W.Y., and Straeter, O. (2007). Recovery from equipment failures
in ATC: An overview of contextual factors. The Reliability Engineering and System
Safety Journal, Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic
Control: Finding an Appropriate Safety Target. The Aeronautical Journal of the Royal
Aeronautical Society, Vol 109 (1096), pp. 277-284.
11.4.3 Publication format: conference proceedings - published

Subotic, B., Ochieng, W. and Straeter, O. (2006). Recovery from Equipment Failures in
Air Traffic Control: A Probabilistic Assessment of Context. Proceedings of the
Probabilistic Safety Assessment (PSAM 08) conference, May 14-19, 2006, New
Orleans, USA.
Subotic, B., and Ochieng, W.Y. (2005). Recovery from Equipment Failures in Air Traffic
Control. In Contemporary Ergonomics 2005 (Eds. P.D. Bust and P. T. McCabe). Taylor
& Francis. Presented at the Ergonomics Society Annual Conference, De Havilland
Campus, University of Hertfordshire, Hatfield.
12 List of References
10News (2006). Power Outage Momentarily Interrupts Air Traffic Control. From
http://www.10news.com/news/8831526/detail.html
Air Transport Action Group (2005). The economic & social benefits of air transport.
From http://www.atag.org/files/Soceconomic-124721A.pdf
Air Transport Association (2006). Cost of ATC Delays. From
http://www.airlines.org/economics/specialtopics/ATC+Delay+Cost.htm
Airbus (2004). Global Market Forecast 2004-2023. From
http://www.airbus.com/en/myairbus/global_market_forcast.html
Airways New Zealand (2006a). Manual of Air Traffic Services (amendment 113).
Airways New Zealand.
Airways New Zealand (2006b). Domestic and International Aircraft Movements by
Calendar Year. From http://www.airways.co.nz/documents/avimove_stats.pdf
Aviation International News (2001). Europeans embracing MLS with a vengeance.
From http://www.ainonline.com/issues/04_01/Apr_2001_europeanmlspg75.html
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19, 775-779. From
http://www.bainbrdg.demon.co.uk/Papers/Ironies.html
Bainbridge, L. (1984). Diagnostic Skill in Process Operation. Department of
Psychology, University College London. From
http://www.bainbrdg.demon.co.uk/Papers/DiagnosticSkill.html
Baker, S., and Weston, I. (2001). Mayday, mayday, mayday. From
http://www.isasi.org/working_groups/ats/atsmayday.pdf
Berenson, M.L., Levine, D.M., Krehbiel, T.C. (2006). Basic Business Statistics:
Concepts and Applications. Prentice Hall: Upper Saddle River, NJ.
Billings, C.E. (1996). Aviation Automation: The Search for a Human-Centred Approach.
Hillsdale, N.J.: Lawrence Erlbaum Associates.
Boehm-Davis, D., Curry, R.E., Wiener, E.L., and Harrison, R.L. (1983). Human factors
of flight-deck automation: Report on a NASA industry workshop. Ergonomics, 26,
953-961.
Boeing (2004). Statistical Summary of Commercial Jet Airplane Accidents: Worldwide
Operations 1959-2003. From
http://www.boeing.com/news/techissues/pdf/statsum.pdf.
Bove, T. (2002). Development and Validation of a Human Error Management
Taxonomy in Air Traffic Control. PhD dissertation. Risø National Laboratory,
Roskilde. From http://www.risoe.dk/rispubl/SYS/syspdf/ris-r-1378.pdf
British Airways (2006). Flight Training Safety and Emergency Procedures (SEP)
Training. From http://www.britishairwaysjobs.com/baweb1/?newms=info150
Brooker, P. (2004). Consistent and up-to-date aviation safety targets. Draft version.
Cranfield University.
Brooker, P. (2006). Air Traffic Control Safety Indicators: What is Achievable?
Eurocontrol: Safety R&D Seminar, 25-27 October 2006, Spain. From
https://dspace.lib.cranfield.ac.uk/bitstream/1826/1372/1/Eurocontrol+2006+ATCBrooker.pdf
Bureau of Transport and Regional Economics (2006). Aviation. Australian Government.
From http://www.btre.gov.au/statistics/aviation.aspx
Bureau of Transportation Statistics (2004). Airline On-Time Statistics and Delay
Causes. From http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
Bureau of Transportation Statistics (2006). Dictionary. From
http://www.bts.gov/dictionary/list.xml?letter=A
CASA (2006). ADS-B: Automatic Dependent Surveillance Broadcast. Civil Aviation
Safety Authority Australia. From http://casa.gov.au/pilots/download/ADS-B.pdf
Christensen, W.C., and Manuele, F.A. (1999). Safety through Design: Best Practices.
National Safety Council Press.
Cox, K. (2005). Teamwork and Trust: A Pilot's Perspective. From
http://safecopter.arc.nasa.gov/Pages/Columns/SBrief/SafeBrf1Articles/6Teamwork.html
Damidau, A., Kirwan, B., and Scrivani, P. (2006). Safety Getting Real: Safety Insights
from Real Time Simulations. Proceedings from the EUROCONTROL Safety R&D
Seminar, Barcelona 25-27 October 2006, Spain.
Daniels, J.J., Regli, S.H., and Franke, J.L. (2002). Support for Intelligent Interruption
and Augmented Context Recovery. Proceedings from 7th IEEE Human Factors
Meeting. Scottsdale, Arizona.
Dekker, S., Fields, B., and Wright, P. (2004). Human Error Recontextualised. From
http://www.cs.mdx.ac.uk/staffpages/bobf/papers/glasgow.pdf
Department of Defense (2001). Global Positioning System: Standard Positioning
Service Performance Standard. Command, Control, Communication, and
Intelligence. Washington DC.
Endsley, M. (1997). Situation Awareness, Automation & Free Flight. From http://atmseminar-97.eurocontrol.fr/endsley.htm
Endsley, M. R., and Kaber, D. B. (1999). Level of automation effects on performance,
situation awareness and workload in a dynamic control task. Ergonomics, 42(3),
pp. 462-492.
Endsley, M., and Kiris, E. (1995). The out-of-the-loop performance problem and level of
control in automation. Human Factors, 37(2), pp. 381-394.
EUROCONTROL (1997). EUROCONTROL Standard Document for Radar Surveillance
in En-Route Airspace and Major Terminal Areas. From
http://www.eurocontrol.int/surveillance/gallery/content/public/documents/SURVSTD.pdf
EUROCONTROL (1999). CD-ROM: An introduction to ATM. EUROCONTROL Institute
of Air Navigation Services.
EUROCONTROL (2000a). Safety Minima Study: Review Of Existing Standards And
Practices. From
http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc1ri.pdf
EUROCONTROL (2000b). Conflict Resolution Assistant Level 2 (CORA2): Controller
Assessments (ASA.01.CORA.2.DEL02-b.RS).
EUROCONTROL (2000c). ESARR 2: Reporting and Assessment of Safety
Occurrences in ATM. From
http://www.atceuc.org/site/Eurocontrol/pdf02/esarr2%20v2.0%20en.pdf
EUROCONTROL (2001a). ECAC Safety Minima for ATM. EUROCONTROL Safety
Regulation Commission.
EUROCONTROL (2001b). ESARR 4: Risk Assessment and Mitigation in ATM.
EUROCONTROL Safety Regulation Commission. From
http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr4v1.pdf
EUROCONTROL (2001c). Safety assessment of the free route airspace concept:
Feasibility phase. Working Draft 0.3. European Organisation for the Safety of Air
Navigation, EUROCONTROL. From
http://www.eurocontrol.int/airspace/gallery/content/public/documents/frap/safety_
assessment_report_integrated
EUROCONTROL (2001d). European Manual of Personnel Licensing - Air Traffic
Controllers: Guidance on Implementation. From
http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERAB
LES/L2%20(HUM.ET1.ST08.10000-GUI-01)%20Released-withsig.pdf
EUROCONTROL (2001e). Harmonisation of European Incident Definitions Initiative for
ATM HEIDI Viewer Instructions for Use. Safety, Quality and Standardisation
Unit (SQS).
EUROCONTROL (2001f). EUROCONTROL Airspace Strategy for the ECAC States.
From http://www.eurocontrol.int/eatm/gallery/content/public/library/airspace.pdf
EUROCONTROL (2002b). Technical Review of Human Performance Models and
Taxonomies of Human Error in ATM (HERA). From
LES/HF26 (HRS-HSP-002-REP-01) Released.pdf
EUROCONTROL (2002c). Glossary of Terms and Definitions & List of Acronyms (SRC
DOC 4). From
c4e2.pdf
EUROCONTROL (2002d). Short Report on Human Performance Models and
Taxonomies of Human Error in ATM (HERA). From
http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABL
ES/HF27%20(HRS-HSP-002-REP-02)%20Released.pdf
EUROCONTROL (2003a). MADAP in a Nutshell. Maastricht Upper Area Control
Centre, Netherlands.
EUROCONTROL (2003b). Summer: ATFM summary report. From
http://www.cfmu.eurocontrol.int/ATFM/public/docs/publicreport_2003year.pdf
EUROCONTROL (2003c). EUROCONTROL ATM Strategy for the Years 2000+,
Volume 1. From
http://www.eurocontrol.int/eatm/gallery/content/public/library/ATM2000-EN-V12003.pdf
EUROCONTROL (2003d). HERA-JANUS training: Analysing Human Error in Incident
Investigation. 18-20 November 2003. EUROCONTROL Institute of Air Navigation
Service, Luxembourg.
EUROCONTROL (2003e). The Human Error in ATM Technique (HERA-JANUS). From
LES/HF30 (HRS-HSP-002-REP-03) Released-withsig.pdf
EUROCONTROL (2003f). Guidelines for Controller Training in the Handling of
Unusual/Emergency Situations. From
LES/T11%20(Edition%202.0)%20HRS-TSP-004-GUI-05withsig.pdf
EUROCONTROL (2003g). Radio and Navigation Aids Course (IANS_ATC_RADNAV).
EUROCONTROL Institute of Air Navigation Service, Luxembourg.
EUROCONTROL (2003h). Area Navigation Applications in Europe. From
http://elearning.eurocontrol.int/ATMTraining/precourse/nav/rnav/index.html
EUROCONTROL (2003i). ESARR 6: Software in ATM Systems. Safety Regulatory
Commission. From
http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr6_e10_ri.pdf
EUROCONTROL (2004a). Evaluating the True Cost to Airlines of One Minute of
Airborne or Ground Delay. Prepared by the University of Westminster for
Performance Review Unit. From
www.eurocontrol.int/prc/gallery/content/public/Docs/cost_of_delay.pdf
EUROCONTROL (2004b). MANTAS Basic Operational Concept, Version: Draft 0.2.
EUROCONTROL.
EUROCONTROL (2004c). CORA 2 Safety Analysis: Exploratory Preliminary System
Safety Assessment (PSSA). European Air Traffic Management Programme.
EUROCONTROL (2004d). Review of Techniques to Support the EATMP Safety
Assessment Methodology. From
http://www.eurocontrol.int/eec/gallery/content/public/documents/EEC_notes/2004
/EEC_note_2004_01_1.pdf
EUROCONTROL (2004e). Managing System Disturbances in ATM: Background and
Contextual Framework. From
LES/HF47%20(HRS-HSP-005-REP-06)%20Released-withsig.pdf
EUROCONTROL (2004f). The Impact of Automation on Future Controller Skill
Requirements and a Framework for SHAPE (HRS/HSP-005-REP-04). Human
Factors Management Business Division (DAS/HUM).
EUROCONTROL (2004g). Model Based Simulation of the Turkish En-Route Airspace
(EEC Report No. 396). From http://www.ans.dhmi.gov.tr/TR/ATCTR/proje/fts.pdf
EUROCONTROL (2005). ATM Contribution to Aircraft Accidents/Incidents: Review and
Analysis of Historical Data. From
c2_e40_ri_web.pdf
EUROCONTROL (2006a). Air Traffic Control (ATC). From
http://www.eurocontrol.int/corporate/public/standard_page/cb_airtraffic_controller.html
EUROCONTROL (2006b). What is PRNAV? From
http://www.ecacnav.com/content.asp?PageID=82
EUROCONTROL (2006c). Performance Review Report covering the calendar year
2005. Performance Review Commission.
EUROCONTROL (2006d). The impact of fragmentation in European ATM/CNS.
Performance Review Commission. From
http://www.eurocontrol.int/prc/gallery/content/public/Docs/fragmentation.pdf
EUROCONTROL (2007a). Safety Nets. From http://www.eurocontrol.int/safetynets/public/subsite_homepage/homepage.html
EUROCONTROL (2007b). Single European Sky. From
http://www.eurocontrol.int/ses/public/subsite_homepage/homepage.html
European Commission (2001). Meeting society's needs and winning global leadership.
Report of the group of personalities. From
http://ec.europa.eu/research/growth/aeronautics2020/pdf/aeronautics2020_en.pdf
European Commission (2006a). GNSS Autonomous Navigation Algorithms Critical
Study (D3.2.2.1). Draft report. Sixth Framework Programme (2002-2006).
European Commission (2006b). Critical Analysis of Space-Based Navigation
Technologies Usable for Civil Aviation (D3.1P). Draft report. Sixth Framework
Programme (2002-2006).
European Space Agency (2002). Space Product Assurance: Safety (ESA Q-40-B).
Requirements & Standards Division. Noordwijk, The Netherlands.
Federal Aviation Administration (1995). Approach Station Keeping (Ask) Experiment
Plan and Final Report (DOT/FAA/CT-TN95/58). Department of Transportation:
Federal Aviation Administration. From
http://www.tc.faa.gov/acb300/techreports/TN9558.pdf
Federal Aviation Administration (1997). Hardware Product Specification Document for
the Voice Switching and Control System (VSCS) (DTFA0192D00004).
Department of Transportation: Federal Aviation Administration.
Federal Aviation Administration (1998). Voice Switching and Control System:
Attachment J-3 - Product Specification (FAA-E-2731G). Department of
Transportation: Federal Aviation Administration.
Federal Aviation Administration (2000). System Safety Handbook, Chapter 3.
Department of Transportation: Federal Aviation Administration. From
http://www.asy.faa.gov/RISK/SSHandbook/contents.htm.
Federal Aviation Administration (2003). The Human Factors Design Standard
(HF-STD-001). Compact disk, William J. Hughes Technical Center, Atlantic City
International Airport, NJ.
Federal Aviation Administration (2005). Air Transportation Operations Inspector's
Handbook (Order 8400), Vol 1. Department of Transportation: Federal Aviation
Administration. From
http://www.faa.gov/library/manuals/examiners_inspectors/8400/
Feng, S., Ochieng, W., Walsh, D., and Ioannides, R. (2005). A Measurement Domain
Receiver Autonomous Integrity Monitoring Algorithm. GPS Solutions. Springer
Berlin/Heidelberg.
Frese, M. (1991). Error Management or Error Prevention: Two Strategies to Deal with
Errors in Software Design. In H. J. Bullinger (Ed.) Human aspects in Computing:
Design and Use of Interactive Systems and Work with Terminals. Amsterdam:
Elsevier Science Publishers.
Frese, M., Brodbeck, F.C., Zapf, D., and Prumper, J. (1990). The Effects of Task
Structure and Social Support on Users' Errors and Error Handling. In D. Diaper et
al. (Eds.) Human Computer Interaction - INTERACT'90 (pp. 35-41). Amsterdam:
Elsevier Science Publishers.
Fujita, Y., and Hollnagel, E. (2004). Failures without errors: quantification of context in
HRA. Reliability Engineering and System Safety, 83, pp. 145-151.
Funk, K., Lyall, B., and Riley, V. (1996). Perceived Human Factors Problems of
Flightdeck Automation: Phase 1 Final Report. Federal Aviation Administration
Grant 93-G-039. From
http://www.flightdeckautomation.com/phase1/phase1report.aspx
General Accounting Office (1982). Computer Outages at Terminal Facilities and Their
Correlation to Near mid-air Collisions (AFMD-82-43). US GAO, Washington DC.
General Accounting Office (1991). Air Traffic Control: FAA Can Better Forecast and
Prevent Equipment Failures. US GAO, Washington DC.
General Accounting Office (1996). Air Traffic Control: Good Progress on Interim
Replacement for Outage-Plagued System, but Risks Can Be Further Reduced.
US GAO, Washington DC.
General Accounting Office (1998). Air Traffic Control: Information Concerning
Equipment Outages at Two Kansas City Area Facilities. US GAO, Washington
DC.
Gordon, R., and Makings, N. (2003). Gate 2 Gate: Stakeholder Safety Survey.
EUROCONTROL Experimental Centre, France.
Graham, G.M., Kinnersly, S., and Joyce, A. (2002). Safety Reporting and Aviation
Target Levels of Safety. In C.W. Johnson, Investigation and Reporting of
Incidents and Accidents (IRIA 2002). Department of Computing Science,
University of Glasgow, Scotland.
Hai, L. (2004). Civil Aviation Safety Outline (2001-2020). From
http://www.seaskyad.com/ad@cca_english/content/content_0206_special_article
s/article16.htm.
Hallbert, B.P., and Meyer, P. (1995). Summary of lessons learned at the OECD Halden
reactor project for the evaluation of human-machine systems. Institutt for
Energiteknikk, Halden, Norway.
Heinrich, H.W. (1941). Industrial Accident Prevention: A Scientific Approach. McGraw-Hill:
New York and Wiley: London.
Hilburn, B. (2004). Cognitive Complexity in Air Traffic Control - A Literature Review.
EUROCONTROL Experimental Centre, EEC Note 04/04.
Hilburn, B., and Flynn, M. (2001). Air Traffic Controller and Management Attitudes
Toward Automation: An Empirical Investigation. 4th USA/EUROPE Air Traffic
Management R&D Seminar, Santa Fe, USA.
Hollnagel, E. (1993). Human Reliability Analysis: Context and Control. Academic
Press, London.
Hollnagel, E. (1998). Cognitive Reliability and Error Analysis Method (CREAM).
Elsevier Science Ltd., London, UK.
IEEE (1998). IEEE Guide for Microwave Communications System Development:
Design, Procurement, Construction, Maintenance, and Operation. IEEE-SA
Standards Board. From
http://ieeexplore.ieee.org/iel4/5643/15123/00690973.pdf?arnumber=690973
IFALPA (2005). Interpilot: 60th Annual Conference: Boeing 787 programme update.
From
http://216.239.59.104/search?q=cache:oJuuByAkeqEJ:www.ifalpa.org/Interpilot/2
005/06inp01.pdf+Interpilot:+60th+Annual+Conference:+Boeing+787+programme
+update&hl=en&ct=clnk&cd=1&gl=uk
IFATCA (2004). Produce Definition of Controller Tools (Agenda Item B.5.2).
Proceedings from 43rd Annual Conference, Hong Kong, 22-26 March 2004.
IFATCA (2005). A Positive Step to Improve Aviation Safety. From
http://www.ifatca.org/press/141105.pdf
International Civil Aviation Organization (1979). Annex 5: Units of Measurement to be
Used in Air and Ground Operations. Montreal, Canada.
International Civil Aviation Organization (1985). Manual of Air Traffic Forecasting (Doc
8991-AT/722/2). Montreal, Canada.
International Civil Aviation Organization (1994). All-Weather Operations Panel.
Fifteenth meeting. Montreal, Canada.
International Civil Aviation Organization (1995). Review of the General Concept of
Separation panel (RGCSP). Working Group A: A Review of Work on Deriving a
Target Level of Safety (TLS) for En-route Collision Risk. Montreal, Canada.
International Civil Aviation Organization (1997). Outlook for Air Transport to the Year
2005 (ICAO Circular 270-AT/111). Montreal, Canada.
International Civil Aviation Organization (1998). Human Factors Training Manual Doc
9683 (First Edition). Montreal, Canada.
International Civil Aviation Organization (2001a). Air Traffic Management Doc 4444.
Montreal, Canada.
International Civil Aviation Organization (2001b). Annex 6: Operation of Aircraft.
Montreal, Canada.
International Civil Aviation Organization (2001c). Annex 11: Air Traffic Services.
Montreal, Canada.
International Civil Aviation Organization (2001d). Annex 13: Aircraft Accident and
Incident Investigation. Montreal, Canada.
International Civil Aviation Organization (2001e). Annex 1: Personnel Licensing.
Montreal, Canada.
International Civil Aviation Organization (2003). Review the latest developments in the
ATN Panel and the Aeronautical Mobile Communication Panel. From
http://www.icao.int/icao/en/ro/apac/atn_2003/ip02.pdf
International Civil Aviation Organization (2005). Report of the Ninth Meeting of
Communications, Navigation And Surveillance/Meteorology Sub-Group
(CNS/MET/SG/9), Bangkok, Thailand, 11-15 July 2005. From
http://www.icao.int/icao/en/ro/apac/2005/CNS_MET_SG9/CNSMET_SG9.pdf
International Civil Aviation Organization (2006a). Review Developments Relating to
CNS/ATM Implementation: Review the Work by RNP Special Operational
Requirements Study Group on the Implementation of RNP Operations. From
http://www.icao.int/icao/en/ro/apac/2006/ATM_AIS_SAR_SG16/wp22.pdf
International Civil Aviation Organization (2006b). Contracting States. From
http://www.icao.int/cgi/goto_m.pl?/cgi/statesDB4.pl?en
International Civil Aviation Organization (2007). CNS/ATM Systems. From
http://www.icao.int/icao/en/ro/rio/execsum.pdf
Jeppesen (2001). Required Navigation Performance (RNP). Jeppesen Briefing Bulletin.
From http://www.jeppesen.com/download/briefbull/den01-j.pdf
Johnson, C. W. and Holloway, C.M. (2004). On the Over-Emphasis of Human Error
As A Cause of Aviation Accidents: Systemic Failures and Human Error in US
NTSB and Canadian TSB Aviation Reports 1996-2003. From
http://www.dcs.gla.ac.uk/~johnson/papers/Cause_comparisons/Error_and_accide
nts.PDF
Joint Aviation Administration (1994). Joint Aviation Requirements for Large Aeroplanes
(JAR25).
Kaarstad, M., and Ludvigsen, J.T. (2002). Background study for further research in
performance recovery. Presented at Enlarged Halden Programme Group
Meeting, Storefjell, C2/5/116.
Kaber, D.B. (1997). The Effect of Level of Automation and Adaptive Automation on
Performance in Dynamic Control Environments (ANRCP-NG-ITWD-97-01).
Amarillo, TX: Amarillo National Resource Center for Plutonium.
Kaber, D. B. and Riley, J. (1999). Adaptive automation of a dynamic control task based
on secondary task workload measurement. International Journal of Cognitive
Ergonomics, 3(3), 169-187.
Kaber, D.B., Prinzel, L.J., Wright, M.C., and Clamann, M.P. (2002). Workload-Matched
Adaptive Automation Support of Air Traffic Controller Information Processing
Stages (NASA/TP-2002-211932). National Aeronautics and Space
Administration. From
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20020080640_2002133430.
pdf
Kanse, L. (2004). Recovery uncovered: How people in the chemical process industry
recover from failures. PhD dissertation. Eindhoven University of Technology.
Kanse, L. and van der Schaaf, T. (2000). Recovery from failures - understanding the
positive role of human operators during incidents. In D. de Waard, C. Weikert,
J. Hoonhout and J. Ramaekers (Eds.), Human System Interaction: Education,
Research and Application in the 21st Century. Maastricht, Netherlands: Shaker
Publishing.
Kennedy, R., Kirwan, B., and Summersgill, R. (2000). Making HRA a more consistent
science. In Foresight & Precaution, Eds. Cottam, M., Pape, R.P., Harvey, D.W.,
and Tait,J. Balkema, Rotterdam.
Kim, M.C., Seong, P.H., and Hollnagel, E. (2005). A probabilistic approach for
determining the control mode in CREAM. Reliability Engineering and System
Safety, pp. 1-9.
Kirwan, B. (1994). A Guide to Practical Human Reliability Assessment. Taylor &
Francis, London, UK.
Kirwan, B. (1997). The development of a nuclear chemical plant human reliability
management approach: HRMS and JHEDI. Reliability Engineering and System
Safety, Vol 56, pp. 107-133.
Kirwan, B., Gibson, H., Edmunds, J., Cooksley, G., Kennedy, R., and Umbers, I.
(1994). Nuclear Action Reliability Assessment (NARA): A Data-Based HRA Tool.
Kirwan, B., Basra, G., and Taylor-Adam, S.E. (1997). CORE-DATA: A Computerised
Human Error Database for Human Reliability Support. Proceedings from the
Sixth Annual Human Factors Meeting, Orlando, US.
Kontogiannis, T. (1999). User strategies in recovering from system failures in
man-machine systems. Safety Science 32(1), pp. 49-68.
Kopardekar, P., and Magryratis, S. (2003). The measurement and prediction of
dynamic density. Presented at the FAA-EUROCONTROL ATM 2003 Seminar,
Budapest.
Lanzi, P., and Marti, P. (2001). Innovate or preserve: when technology questions
cooperative processes. From
http://www.dblue.it/pdf/ECCE11_Lanzi_Marti_v3.pdf
Layton, C., Smith, P. J., and McCoy, E. (1994). Design of a cooperative
problem-solving system for en-route flight planning: An empirical evaluation. Human
Factors, 36, pp. 94-119.
Leveson, N.G. (1995). Safeware: System Safety and Computers. Addison-Wesley
Publishing Company, New York.
Littlewood, B., Strigini, L., Wright, D., and Courtois, P.J. (1998). Examination of
Bayesian Belief Network for Safety Assessment of Nuclear Computer-Based
Systems (ESPRIT DeVa Project 20072). From
http://www.csr.city.ac.uk/people/lorenzo.strigini/ls.papers/DeVa_BBN_reports/De
VaTR70_year3.5a/DeVaTR70.pdf
Low, I. and Donohoe, L. (2001). Methods for assessing ATC controllers' recovery from
automation failures. In D. Harris (Ed.), Engineering Psychology and Cognitive
Ergonomics Volume 5: Aerospace and Transportation Systems. National Air Traffic
Service (NATS), UK.
Majumdar, A., and Ochieng, W.Y. (2002). Estimation of European Airspace Capacity
from a Model of Controller Workload. Journal of Navigation, Vol 55(3), pp. 381-403.
Majumdar, A., Ochieng, W.Y., McAuley, G., Lenzi, J.M., and Lepadatu, C. (2004). The
Factors Affecting Airspace Capacity in Europe: A Cross-Sectional Time-Series
Analysis Using Simulated Controller Workload. Journal of Navigation, Vol 57(3),
pp. 385-405.
Massaiu, S., Haugset, H., and Bjorlo, T.J. (2003). Human Reliability Issues in Traffic
Control Centres. Norwegian Research Council.
Mauri, G. (2000). Integrating Safety Analysis Techniques, Supporting Identification of
Common Cause Failures. PhD thesis, The University of York.
Metzger, U., and Parasuraman, R. (2005). Automation in future air traffic management:
Effects of decision aid reliability on controller performance and mental workload.
Human Factors, 47(1), 35-49.
Ministry of Land, Infrastructure, and Transport (2006). Statistics. Air Traffic Activity at
Cab Facilities: Area Control Center. From
http://www.mlit.go.jp/koku/04_hoan/e/statistics/image/00_00.gif
Mohleji, S.C., Lacher, A. R., and Ostwald, P.A. (2003). CNS/ATM System Architecture
Concepts and Future Vision of NAS Operations in 2020 Timeframe. Center for
Advanced Aviation System Development (CAASD), The MITRE Corporation.
From
http://www.mitre.org/work/tech_papers/tech_papers_03/mohleji_2020/mohleji_20
20.pdf
National Aeronautics and Space Administration (2000). Required Communication
Performance (RCP). From http://as.nasa.gov/aatt/wspdfs/Oishi.pdf
National Aeronautics and Space Administration (2002). NASA Safety Manual
w/Changes through Change 1 (NPR 8715.3). NASA QS / Safety & Risk
Management Division.
National Air Traffic Services (1999). Testing Operational Scenarios for Concepts in
ATM (Phase II). WP2: Airspace Sectorisation Optimisation. European
Commission.
National Air Traffic Services (2002). Manual of Air Traffic Services Part II. London Area
Control Centre, edition 2/02.
National Air Traffic Services (2004). NATS apologises for delays experienced today.
From http://www.nats.co.uk/news/news_stories/2004_06_03_2.html
National Transportation Library (1997). Potential Cost Savings Ideas for FAA and
Users. From http://ntl.bts.gov/lib/000/500/511/costsav.pdf
National Transportation Safety Board (1973). Aircraft Accident Report (AAR-73-14).
From http://amelia.db.erau.edu/reports/ntsb/aar/AAR73-14.pdf
National Transportation Safety Board (1983). Aircraft Accident Report (AAR-83-02).
From http://amelia.db.erau.edu/reports/ntsb/aar/AAR83-02.pdf
National Transportation Safety Board (1996). Special Investigation Report: Air Traffic
Control Equipment Outages. Washington, D.C.
Nolan, M. S. (1998). Fundamentals of Air Traffic Control. Belmont, USA: Wadsworth
Publishing Company.
Nuclear Regulatory Commission (1998). Technical Basis and Implementation
Guidelines for a Technique for Human Event Analysis (ATHEANA). NUREG-1624.
U.S. Nuclear Regulatory Commission, Washington, DC.
Ochieng, W.Y. (2006). Future Air Traffic Management. Course presentation for Air
Traffic Management Module (T23). Imperial College London.
Orasanu, J., and Fischer, P. (1997). Finding decisions in natural environments: the
view from the cockpit. In Zsambok, C.E. & Klein, G. (Eds.), Naturalistic
decision-making. Mahwah, New Jersey: Lawrence Erlbaum Associates Publishers.
Oren, T., and Ghasem-Aghaee, N. (2003). Personality Representation Processable in
Fuzzy Logic for Human Behavior Simulation. Summer Computer Simulation
Conference, July 20-24, 2003. Montreal, Canada. From
http://www.site.uottawa.ca/~oren/pres/pres-of-2003-01-SCSC-personality.pdf
Parasuraman, R., and Riley, V. (1997). Humans and automation: use, misuse, disuse,
abuse. Human Factors Vol 39, 230-253.
Parasuraman, R., Bahri, T., Deaton, J., Morrison, J., and Barnes, M. (1990). Theory
and Design of Adaptive Automation in Aviation Systems. Technical Report No.
CSL-N90-1, Cognitive Science Laboratory. Catholic University of America,
Washington, DC.
Parasuraman, R., Mouloua, M., and Molloy, R. (1996). Effects of adaptive task
allocation on monitoring of automated systems. Human Factors. 38. pp. 665-679.
Parasuraman, R., Wickens, C. D., and Sheridan, T. (2000). A model for types and
levels of human interaction with automation. IEEE Transactions on Systems,
Man, and Cybernetics, 30(3), 286-297.
Park, J., Jung, W., Ha, J., and Shin, Y. (2004). Analysis of operators' performance
under emergencies using a training simulator of the nuclear power plant.
Reliability Engineering and System Safety, 83, pp. 179-186.
Perrow, C. (1999). Normal Accidents. Princeton University Press.
Piantek, T.W. (1999). Influence in contracting and purchasing. In Safety Through
Design: Best Practices (Eds. Christensen, W.C., Manuele, F.A.). National Safety
Council Press.
PPrune Forums (2006). ATC Issues. From
http://www.pprune.org/forums/forumdisplay.php?s=ac64e2a0afd13472a93e7df2b
ba4b826&f=18
Rail Safety and Standards Board (2004). Rail-Specific HRA Tool for Driving Tasks
Phase 1 Report. From http://www.rssb.co.uk/pdf/reports/research/T270 Railspecific HRA tool for driving tasks Phase 1 report.pdf
Rasmussen, J. (1982). Human errors: A taxonomy for describing human malfunction in
industrial installations. Journal of Occupational Accidents, 4, 311-335.
Reason, J.T. (1997). Managing the risks of organizational accidents. Aldershot,
England: Ashgate Publishing.
Reid, J.W. (1996). Safety by Design. Lecture 4: Cost and acceptability of risk.
Hazardous forum: London.
Rigas, G. and Elg, F. (1997). Mental models, confidence, and performance in a
complex dynamic decision making environment. Department of Psychology,
Uppsala University, Sweden. From
http://www.ie.boun.edu.tr/labs/sesdyn/isdc97/TURKIA.doc
RISKS (2000). U.K. ATC System Failure. The RISKS Digest, Vol 20, issue 94. From
http://catless.ncl.ac.uk/Risks/20.94.html
Rizzo, A., Ferante, D., and Bagnara, S. (1995). Handling human error. In J.M. Hoc,
P.C. Cacciabue, & E. Hollnagel (Eds.), Expertise and Technology: Cognition &
Human-Computer Cooperation (pp. 195-212). Hillsdale, NJ: Lawrence Erlbaum.
Saldana, M. A. M., Herrero, S. G., del Campo, M. A. M. and Ritzel, D. O. (2002).
Assessing Definitions and Concepts within the Safety Profession. From
http://www.aahperd.org/iejhe/2003_first/ritzel.pdf.
Sampaio, J. J. M., and Guerra, A. A. (2004). The day god failed or overtrust in
automation: The Portuguese case study. In Proceedings from the 2nd
Conference on Human Performance Situation Awareness and Automation
(HPSAA 2). Daytona Beach, FL.
Scerbo, M.W. (2005). Adaptive Automation. Department of Psychology, Old Dominion
University. From
http://www.cs.colorado.edu/~mozer/courses/6622/papers/aachpt05-12-15.htm
Sellen, A. J. (1994). Detection of everyday errors. Applied psychology: An International
Review 43(4), pp. 475-498.
Shappell, S.A. (2000). The Human Factors Analysis and Classification System-HFACS
(DOT/FAA/AM-00/7). Federal Aviation Administration. US Department of
Transportation. From
http://www.nifc.gov/safety_study/accident_invest/humanfactors_class&anly.pdf
Sheridan, T.B. (1980). Computer control and human alienation. Technology Review Vol
10, pp. 61-73.
Shier, R. (2004). The Mann-Whitney U Test. Mathematics Learning Support Centre.
From http://mlsc.lboro.ac.uk/documents/Mannwhitney.pdf
Shorrock, S. (1992). Error Classification for Safety Management: Finding the Right
approach. In C.W. Johnson (Ed.), Investigation and Reporting of Incidents and
Accidents IRIA 2002 (pp. 57-67). From
http://www.dcs.gla.ac.uk/~johnson/iria2002/IRIA_2002.pdf
Shorrock, S. T., and Kirwan, B. (2002). Development and application of a human error
identification tool for air traffic control. Applied Ergonomics, Vol 33, pp. 319-336.
Smith, S.P., Harrison, M.D. and Schupp, B.A. (2004). How explicit are the barriers to
failure in safety arguments? Computer Safety, Reliability, and Security
(SAFECOMP'04). In M. Heisel, P. Liggesmeyer and S. Wittmann (Eds), Lecture
Notes in Computer Science Vol 3219, pp. 325-337, Springer.
Sorensen, J.N. (2002). Safety culture: a survey of the state-of-the-art. Reliability
Engineering and System Safety, Vol 76, pp. 189-204.
Straeter, O. (2000). Evaluation of human reliability on the basis of operational
experience. Dissertation at Munich Technical University.
Straeter, O. (2001). The quantification process for human interventions. In: Kafka, P.
(ed.) PSA RID Probabilistic Safety Assessment in Risk Informed Decision
making. EURO-Course. 4.- 9.3.2001. GRS. Germany.
Straeter, O. (2005). Cognition and Safety: An Integrated Approach to Systems Design
and Performance Assessment. Ashgate: Aldershot.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic
Control: Finding an appropriate safety target. The Aeronautical Journal of the
Royal Aeronautical Society, Vol 109(1096), p. 277-284.
Subotic, B., Ochieng, W.Y., and Straeter, O. (2006a). Recovery from equipment
failures in ATC: An overview of contextual factors. Reliability Engineering and
System Safety Journal Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W. and Straeter, O. (2006b). Recovery from Equipment Failures
in Air Traffic Control: A Probabilistic Assessment of Context. Probabilistic Safety
Assessment (PSAM 08) Conference, May 14-19, 2006, New Orleans, US.
Swain, A. D., and Guttman, H. E. (1983). Handbook of human reliability analysis with
emphasis on nuclear power plant applications (NUREG/CR-1278). Washington
D.C.
Theis, I. and Straeter, O. (2001). By-Wire Systems in Automotive Industry. Reliability
Analysis of the Driver-Vehicle-Interface Proceedings. ESREL 2001, Turin.
THEMES (2001). Thematic Network for Safety Assessment of Waterborne Transport.
Deliverable No. D5.1. Report on Safety and Environmental Assessment Method.
From http://projects.dnv.com/themes/Deliverables/D5.1Final.pdf
Theureau, J., Jeffroy, F., and Vermersch, P. (2000). Controlling a nuclear reactor in
accidental situations with symptom-based computerized procedures: a
semiological & phenomenological analysis. Proceedings from CSEPC 2000.
Taejon, Korea, 22-25 November.
UK Civil Aviation Authority (2000). Aviation safety review 1990-1999 (CAP 701). Civil
Aviation Authority, London.
UK Civil Aviation Authority (2003). United Kingdom Manual of Personnel Licensing - Air
Traffic Controllers (CAP 744). Civil Aviation Authority. London.
UK Civil Aviation Authority (2004). Fact Sheet - SSR Mode S, Edition 1.2. From
http://www.caa.co.uk/docs/810/DAP_SSM_Mode_S_SSR_Factsheet.pdf
UK Civil Aviation Authority (2005). Mandatory Occurrence Reporting Scheme. CAP
382. Civil Aviation Authority, London. From
http://www.caa.co.uk/docs/33/CAP382.PDF
UK Civil Aviation Authority (2006). Manual of Air Traffic Services - Part 1 (CAP 493).
Civil Aviation Authority, London. From
http://www.caa.co.uk/docs/33/CAP493Part1.pdf
United Nations (2006). UN in Brief. From
http://www.un.org/Overview/brief1.html#footnote
van der Schaaf, T. W. (1992). Near miss reporting in the chemical process industry.
PhD thesis. Eindhoven University of Technology.
van der Schaaf, T.W. (1995). Human recovery of errors in man-machine systems.
Proceedings of the Sixth IFAC/IFIP/IFORS/IEA Symposium on the Analysis,
Design and Evaluation of Man-Machine Systems. Cambridge, MA.
van Es, G.W.H. (2003). Review of Air Traffic Management-related accidents worldwide:
1980-2001. National Aerospace Laboratory (NLR).
Ward, M., Grupen, L., Regehr, G. (2002). Measuring Self-assessment: Current State of
the Art. Advances in Health Sciences Education, 7, pp. 63-80.
Weisberg, H.F., Krosnick, J.A., and Bowen, B.D. (1996). An Introduction to Survey
Research, Polling, and Data Analysis. SAGE Publications: London.
Wickens, C.D. (1992). Engineering psychology and human performance, 2nd Ed. New
York: Harper Collins.
Wickens, C.D. (2001). Attention to Safety and the Psychology of Surprise. From
http://www.aviation.uiuc.edu/UnitsHFD/conference/Osukeynote01.pdf
Wickens, C.D., Lee, J.D., Liu, Y., and Gordon Becker, S.E. (2004). An Introduction to
Human Factors Engineering. New Jersey: Pearson Prentice Hall.
Wickens, C.D., Mavor, A., and McGee, J.P. (Eds.) (1997). Flight to the Future: Human
Factors in Air Traffic Control. Washington, DC: National Academy Press.
Wickens, C.D., Mavor, A. S., Parasuraman, R., and McGee, J.P. (1998). The Future of
Air Traffic Control: Human Operators and Automation. National Academy Press:
Washington, DC.
Wiener, E.L. and Curry, R.E. (1980). Flight deck automation: promises and problems.
Ergonomics, Vol 23, pp. 995-1011.
Williams, J.C. (1986). HEART - A Proposed Method for Assessing and Reducing
Human Error. In 9th Advances in Reliability Technology Symposium. University of
Bradford, 1986.
Wood, A. (1996). Software Reliability Growth Models. From
http://www.hpl.hp.com/techreports/tandem/TR-96.1.pdf
Zapf, D., and Reason, J.T. (1994). Introduction: Human Error and Error Handling.
Applied psychology: An international review, Vol 43(4), pp. 427-432.
Appendices
Appendix I The cost of delays induced by ATC equipment failures
Appendix II Interviews with ATM staff
Appendix III Checklist for the Equipment Failure Scenarios in a specific European
Appendix IV The questionnaire design
Appendix V Example of one questionnaire response
Appendix VI Results extracted from the question 5 of the questionnaire survey
Appendix VII Overview of contextual factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX Questions for the ATM Specialist
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI Validation of the RIFs interaction matrix
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII Experimental material
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
Appendix XV Distribution of the recovery context indicator captured in the experiment
Appendix I The cost of delays induced by ATC equipment failures
The impact of an equipment failure on ATM can be analysed from several different
perspectives. From a financial perspective, it is necessary to consider the costs
identified in ATC and the cost of delays in a wider region. A small exercise has been
conducted on the cost of delays induced by ATC equipment failures in the European
Civil Aviation Conference (ECAC) and US airspace.
From EUROCONTROL's Central Flow Management Unit (CFMU) data for the period
from 1999 to 2003 (Table 1), delays induced by ATC equipment failures are split between
en route and airport delays. Given that the cost of one minute of delay in Europe in
the year 2002 was estimated to be EUR 72 (EUROCONTROL, 2004a), the last column of
Table 1 presents the total costs incurred by airlines as a result of airborne and ground
delays. It is important to highlight that the estimate for the cost of one minute of delay
(EUR 72) is based on primary delay costs, reactionary delay costs (e.g. the knock-on
effect on other aircraft), as well as fuel, maintenance, ground handling of aircraft
and passengers, passenger costs of delay to the airline, and future loss of market
share due to lack of punctuality (EUROCONTROL, 2004a). As a result, the calculated
annual cost of delays caused by ATC equipment failures accounts for all relevant costs
and thus demonstrates the high cost of technical failures.
Table 1 ATC equipment as a cause of airport and en route delays (personal correspondence, see footnote 1)
Year | En route Delay (min) | Airport Delay (min) | Total Delay (min) | Annual cost for the airlines (million EUR), based on the year 2002
1999 | 609265 | 461290 | 1070555 | 77.08
2000 | 598660 | 265055 | 863715 | 62.19
2001 | 614534 | 406760 | 1021294 | 73.53
2002 | 425627 | 138045 | 563672 | 40.58
2003 | 149476 | 147528 | 297004 | 21.38
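The last column of Table 1 can be reproduced from the delay minutes and the EUR 72 per-minute estimate. The short Python sketch below is only an illustration of that arithmetic; the variable and function names are assumptions introduced here and are not part of the original analysis.

```python
# Cost of one minute of delay in Europe (2002 estimate; EUROCONTROL, 2004a).
COST_PER_MINUTE_EUR = 72

# (en route delay, airport delay) in minutes per year, copied from Table 1.
DELAY_MINUTES = {
    1999: (609265, 461290),
    2000: (598660, 265055),
    2001: (614534, 406760),
    2002: (425627, 138045),
    2003: (149476, 147528),
}


def annual_cost_million_eur(en_route_min, airport_min):
    """Total delay minutes times the per-minute cost, in million EUR."""
    total_min = en_route_min + airport_min
    return total_min * COST_PER_MINUTE_EUR / 1_000_000


# Hypothetical helper dictionary: year -> cost rounded to two decimals.
annual_costs = {
    year: round(annual_cost_million_eur(*minutes), 2)
    for year, minutes in DELAY_MINUTES.items()
}

for year, cost in annual_costs.items():
    print(f"{year}: EUR {cost:.2f} million")
```

Because the same 2002 unit cost is applied to every year, the cost column simply scales with the total delay minutes.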
There are a number of reasons for the differences in the delays reported by the CFMU
(Table 1) across the period. Some global factors explaining the delay reductions in the
decade beginning in 2000 are the general reduction of air traffic (as a result of the post
September 11th 2001 crisis in the aviation industry), the presence of severe factors
(e.g. closure of Yugoslav airspace in 1999), the introduction of new route structures in
1999, the influence of European ATM network programs (e.g. Reduced Vertical
Separation Minima (RVSM), improved capacity management), and staffing issues that
peaked in 2002 (EUROCONTROL, 2003b).
1 Personal correspondence with EUROCONTROL CFMU.
Similar calculations have been carried out for the impact of ATC equipment failures on
the overall US National Aviation System (NAS). The US NAS consists of aircraft,
pilots, facilities, controllers, airports, maintenance personnel, together with computers,
communications equipment, satellite navigation aids, and radars. The direct aircraft
operating cost per minute of delay is taken from the Air Transport
Association (ATA) estimate for the year 2005, which is $62.33 (Air Transport
Association, 2006). This cost comprises fuel burn, extra crew time, maintenance,
aircraft ownership costs, and additional costs. These additional costs account for the costs
of extra gates and manpower on the ground and the costs imposed on airline customers
(passengers and cargo shippers) in the form of lost productivity, wages, and customer
satisfaction. The FAA estimates the average cost of delay to air travellers to be $30.26 per
hour or $0.50 per minute (Air Transport Association, 2006). As a result, the average
costs of delays induced by ATC equipment failures for the years 2004 and 2005 are given
in Table 2.
Table 2 ATC equipment as a cause of the US National Aviation System delays. From Bureau of
Transportation Statistics (2004), summaries available only for the whole 2004 and 2005
Year | ATC equipment (min) | Average cost (millions $)
2004 | 402644 | 25.10
2005 | 274126 | 17.09
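As a rough check, the Table 2 costs can be recomputed from the delay minutes. The Python sketch below assumes that only the ATA direct aircraft operating cost of $62.33 per minute is applied; this is inferred from the arithmetic rather than stated in the text, and all names are illustrative.

```python
# Direct aircraft operating cost per minute of delay (ATA estimate for 2005).
AIRCRAFT_COST_PER_MINUTE_USD = 62.33

# FAA estimate of the cost of delay to air travellers, per hour.
PASSENGER_COST_PER_HOUR_USD = 30.26

# Delay minutes attributed to ATC equipment, copied from Table 2.
EQUIPMENT_DELAY_MINUTES = {2004: 402644, 2005: 274126}

# The quoted $0.50 per minute is the hourly figure divided by 60, rounded.
passenger_cost_per_minute = round(PASSENGER_COST_PER_HOUR_USD / 60, 2)

# Assumption: the table applies the aircraft operating cost only.
average_costs = {
    year: round(minutes * AIRCRAFT_COST_PER_MINUTE_USD / 1_000_000, 2)
    for year, minutes in EQUIPMENT_DELAY_MINUTES.items()
}

for year, cost in average_costs.items():
    print(f"{year}: ${cost:.2f} million")
```

Under this assumption the passenger delay cost is not added to the aircraft operating cost, so the table may understate the cost borne by travellers.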
In general, these high-level analyses illustrate that equipment failures can significantly
affect operational, safety, and financial aspects of both ATC and ATM systems. The
methods employed for Europe and the US to calculate the cost of delay are largely
similar; the only difference is the financial value assigned to each
minute of delay. In addition, the true cost of equipment failure
induced delay should also incorporate technical repair, unscheduled maintenance,
training, and additional staffing. However, it is assumed that these costs represent only
a small fraction of the cost of delay. Therefore, it can be
concluded that these estimates are a reasonable representation of the total cost
induced by ATC equipment failures in both the European and the US aviation markets.
Appendix II Interviews with ATM staff

Interviews with relevant Air Traffic Management (ATM) staff, as a method of data
collection, have been conducted to support the research presented in this thesis and to
augment available theoretical findings. They aimed to extract operational experience of
ATM specialists and experienced system control and monitoring engineers. The focus
of these interviews has been on four research areas. These are:
• classification of ambiguous operational failure reports;
• characteristics of air traffic controller training;
• characteristics of equipment failures in Air Traffic Control (ATC); and
• contextual factors relevant to controller recovery from equipment failures in ATC.
Interviews with ATM specialists focused on the air traffic controller training (ab initio,
recurrent, and emergency training) and contextual factors relevant to controller
recovery. Interviews with system control and monitoring engineers revealed their
experiences related to the characteristics of ATC equipment failures.
The sample of ATM staff interviewed is as follows:
• system control and monitoring engineers from four countries:
  o National Air Traffic Services (NATS), Corporate and Technical Centre (CTC)
    and Swanwick Centre, UK;
  o EUROCONTROL Maastricht Upper Area Control Centre (MUAC), Netherlands;
  o Irish Aviation Authority (IAA);
  o Airports Authority of India (AAI);
• ATM specialists from two countries:
  o EUROCONTROL Institute of Air Navigation Services (IANS), Luxembourg;
  o Irish Aviation Authority (IAA).
Findings related to each research area are presented below.
Table A-1 Findings related to the clarification of ambiguous operational data
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Ambiguous operational failure reports | Proper classification of all operational failure reports | Yes, clarified all ambiguities
EUROCONTROL MUAC | two experienced engineers | Ambiguous operational failure reports | Proper classification of all operational failure reports | Yes, clarified all ambiguities
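Table A-2 Findings related to air traffic controller training
Location | Number of participants interviewed | Research question | Findings | Agreement between study participants
EUROCONTROL IANS | one ATM specialist | Usefulness of announcing the training for unusual/emergency situations | Although controllers may anticipate an unusual occurrence within their emergency training, this does not facilitate better performance as long as they do not know the nature of that unusual occurrence | Yes, both agreed
IAA | one ATM specialist | Usefulness of announcing the training for unusual/emergency situations | Although controllers may anticipate an unusual occurrence within their emergency training, this does not facilitate better performance as long as they do not know the nature of that unusual occurrence | Yes, both agreed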
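Table A-3 Findings related to the characteristics of equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Existence of latent failures | Latent failures tend to go unnoticed until some other event or failure reveals their existence. | Yes, experienced latent software failures
EUROCONTROL MUAC | one experienced engineer | Existence of latent failures | Latent failures tend to go unnoticed until some other event or failure reveals their existence. | Yes, experienced latent software failures
IAA | one experienced engineer | Existence of latent failures | Latent failures tend to go unnoticed until some other event or failure reveals their existence. | Yes, experienced latent software failures
UK NATS (CTC) | one experienced engineer | Complexity of failure type | Majority of ATC equipment failures affect single system. | Yes
EUROCONTROL MUAC | two experienced engineers | Complexity of failure type | Majority of ATC equipment failures affect single system. | Yes
IAA | one experienced engineer | Complexity of failure type | Majority of ATC equipment failures affect single system. | Yes
UK NATS (CTC) | one experienced engineer | Time course of failure development | Majority of failures tend to manifest themselves suddenly | Yes
EUROCONTROL MUAC | two experienced engineers | Time course of failure development | Majority of failures tend to manifest themselves suddenly | Yes
IAA | one experienced engineer | Time course of failure development | Majority of failures tend to manifest themselves suddenly | Yes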
Table A-4 Findings related to the contextual factors relevant to controller recovery from
equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
IAA | two ATM specialists | Contextual factors relevant to controller recovery from equipment failures in ATC | Validation of the candidate contextual factors | Agreed on selected contextual factors and aided the definition of each factor
IAA | three ATM specialists | Interactions between contextual factors | Validation of interactions between contextual factors identified using operational experience and the past research | Their feedback was similar. Identified inconsistencies were further clarified during the interview and were the result of the misperception of some factors. All inconsistencies were clarified.
Appendix III Checklist for the Equipment Failure Scenarios in

This section provides a framework for the design of Aide-Memoire or checklist-type
procedures for recovery from equipment failures in a particular ATC Centre. The
proposed framework is adapted to an ATC Centre that participated in the experimental
investigation segment of the research presented in this thesis. This Aide-Memoire
provides a potential framework, which needs to be further discussed and developed in
accordance with the in-house expertise of the system control and monitoring staff and
ATM specialists of the respective ATC Centre. However, the concept and the design
solution presented here are transferable across ATC Centres.
Contents
Once all equipment failures to be included in the Aide-Memoire have been defined,
they could be categorised into four distinct groups based upon their impact on ATC
operations (as discussed in Chapter 4). These four categories are as follows:
• Major impact to operations room (all sectors/all workstations) - severe flow
  restrictions possible. Relevant failures are:
  o ONL LAN failure
  o Failure of the Surveillance Network
  o Failure of COMPAD
  o Loss of Flight Server
  o Loss of Track Server
  o Loss of SSR and PSR
  o Loss of FDPS
  o Loss of MRP
  Potential colour coding in Aide-Memoire: RED
• Moderate impact to operations room - impact to one or several workstations in
  different suites, possible need to combine/move positions immediately and
  possible flow restrictions. Relevant failures are:
  o Reduced radar data mode
  o Reduced alert mode
  o Reduced communication mode
  o Loss of ARTAS
  o Loss of VCS panel
  o Loss of a single CWP
  o Loss of entire sector suite
  o Loss of SRP
  o Loss of adjacent sector
  Potential colour coding in Aide-Memoire: YELLOW
• Minimal impact - not immediately critical but may have greater operational
  impact over time. Relevant failures are:
  o Radar Data Function failure
  o Loss of single frequency
  o Overload of SRP
  o Overload of MRP
  o Loss of external feeds to AIS
  o Loss of STCA
  o Loss of APW
  o Loss of MSAW
  o Loss of OLDI
  o Loss of paper strip printer
  Potential colour coding in Aide-Memoire: GREEN
Note that the categorisation above lists some but not all possible failures. Those
marked in italics are designed in the Aide-Memoire format and are presented below.
Further input from system control and monitoring staff and ATM specialists may yield
more accurate and precise types of failures and recovery steps to be taken.
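If the Aide-Memoire were also held electronically, the categorisation above could be stored as a simple lookup from failure to colour code. The Python sketch below is one possible representation, not a description of any existing ATC tool; the names Impact, FAILURES_BY_IMPACT and colour_for are hypothetical, and only the failure names and colours come from the list above.

```python
from enum import Enum


class Impact(Enum):
    """Impact categories with the potential colour coding proposed above."""
    MAJOR = "RED"
    MODERATE = "YELLOW"
    MINIMAL = "GREEN"


# Failure names copied from the categorisation; the list is not exhaustive.
FAILURES_BY_IMPACT = {
    Impact.MAJOR: [
        "ONL LAN failure", "Failure of the Surveillance Network",
        "Failure of COMPAD", "Loss of Flight Server", "Loss of Track Server",
        "Loss of SSR and PSR", "Loss of FDPS", "Loss of MRP",
    ],
    Impact.MODERATE: [
        "Reduced radar data mode", "Reduced alert mode",
        "Reduced communication mode", "Loss of ARTAS", "Loss of VCS panel",
        "Loss of a single CWP", "Loss of entire sector suite", "Loss of SRP",
        "Loss of adjacent sector",
    ],
    Impact.MINIMAL: [
        "Radar Data Function failure", "Loss of single frequency",
        "Overload of SRP", "Overload of MRP", "Loss of external feeds to AIS",
        "Loss of STCA", "Loss of APW", "Loss of MSAW", "Loss of OLDI",
        "Loss of paper strip printer",
    ],
}


def colour_for(failure):
    """Return the colour code of the category that lists the given failure."""
    for impact, failures in FAILURES_BY_IMPACT.items():
        if failure in failures:
            return impact.value
    raise KeyError(f"Failure not listed in the Aide-Memoire: {failure}")


print(colour_for("Loss of FDPS"))
```

Keeping the failures grouped by category, rather than tagging each failure individually, mirrors the layout of the list and makes it easy to add failures suggested by system control and monitoring staff.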
Design
At the top of each procedure, it would be useful to show the pictorial Human Machine
Interface (HMI) warning as it appears to the controller, if applicable (e.g. the
highlighted labels on the General Information Window). This would be followed by
two types of information. Firstly, the required recovery steps, i.e. those that a controller
must perform to recover effectively and ensure safe air traffic control service. Secondly,
the key effects of the equipment failure on the ATC system (i.e. the ATC system
feedback). The rationale for this design solution is that the top part of the checklist
should be reserved for the items that controllers should be aware of first, i.e. recovery
steps.
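The page layout described above (HMI warning, then recovery steps, then expected system feedback) can be sketched as a small data structure. The class and field names below are hypothetical, and the example content is copied from the Reduced Radar Data Mode page presented later in this appendix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AideMemoirePage:
    """One checklist page; class and field names are illustrative only."""
    failure: str
    atco_actions: List[str]            # required recovery steps, shown first
    expect: List[str]                  # ATC system feedback, shown second
    hmi_warning: Optional[str] = None  # pictorial HMI warning, if applicable

    def render(self) -> str:
        """Lay the page out in the order proposed in the text."""
        lines = [self.failure]
        if self.hmi_warning:
            lines.append(self.hmi_warning)
        lines.append("ATCO actions:")
        lines.extend(self.atco_actions)
        lines.append("Expect:")
        lines.extend(self.expect)
        return "\n".join(lines)


page = AideMemoirePage(
    failure="Reduced Radar Data Mode",
    hmi_warning="GIW will show MRTS",
    atco_actions=["Inform Coordinator", "Report failure", "Operate as normal"],
    expect=["All functions are available"],
)
print(page.render())
```

Keeping the recovery steps ahead of the feedback items in render() mirrors the design rationale that controllers should see the required actions first.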
In addition, it is necessary to define procedures for different personnel working in the
operational environment, namely controllers (i.e. different roles for executive, planner,
and assistant controller), supervisors, and managers to assure a seamless recovery
process. If, for example, radar services fail on all workstations, personnel should have
a readily available guide to help them recover from the failure. These guidelines may
vary according to the type of user, because different roles may require different
information on equipment failures and recovery procedures.
Note that the colour-coded categorisation could be used in a slightly different manner
as well. If this Aide-Memoire becomes a part of the generic procedures for handling
emergency/unusual situations, then the use of colour should be restricted to categories
such as Aircraft Emergencies, Equipment Failures, Fire and Building Evacuation.
The Aide-Memoire, as a laminated hard-copy flip chart, should be readily available on
each Controller Working Position (CWP). A more detailed version, providing local or
ATC Centre specific data, should be at the supervisor's position. For simplicity and
efficiency, it is better to present each relevant failure on a single page highlighting the
two main areas: what recovery steps to perform and what feedback to expect from the
ATC system. This approach assures the most efficient usage of the tool.
The final version of the Aide-Memoire should not be considered an exhaustive list
but rather a living document. In other words, it will be necessary to update this tool on
an annual basis to reflect the local expertise and to compile all changes (i.e. changes in
the ATC system, both software and hardware).
ONL LAN Failure
ATCO actions:
Inform Coordinator
Inform all traffic
Check spare ODS
Maintain timely & accurate strip marking
Restrict traffic
Utilise holding patterns
Use only verbal coordination channels
Reaffirm traffic identification using the code on the FPS
Identify any new tracks using the "Confirm Squawk" method
Seek SAS assistance and print screen if possible
Ground all sport/non-commercial traffic ASAP
Utilise strategic ATC techniques when possible
Conduct regular checks of aircraft identification
Monitor Mode C closely
Be aware of the absence of Safety Nets and Monitoring
Aids
Cross check that exit conditions are achieved
Expedite reduction in traffic load
ONL LAN Failure (Cont'd)
Expect:
The radar data is distributed via the RFS LAN
The following functions are NOT AVAILABLE:
Safety Nets and Monitoring Aids (existing alarms maintained)
Flight Plan function (no coupling, no RAM & CLAM)
Radar Data function replaced by Radar Fallback function
Flight plan commands (i.e. mod)
Flight plan lists frozen with data at time of failure
Reception Queues
Message transmission
Coordination messaging
Mail box management
Resectorisation
SSR code management
AIS (only data available at the time of failure)
All correlation will be lost
Failure of the Surveillance Network

ATCO actions:
Inform Coordinator
Inform all traffic
Employ procedural control techniques (if necessary
utilise emergency vertical separation of 500 feet)
Deny departures
Instruct aircraft to maintain VMC, if in VMC
Reduce traffic load ASAP
Seek assistance
Relocate to contingency site if required
Expect:
All ODS frozen or blanked throughout the Centre
Failure of COMPAD
ATCO actions:
Inform Coordinator
Transmit on second sector COMPAD
Access RBS and inform traffic of failure
Reset COMPAD
Seek assistance and relocate to spare CWP
Inform traffic of restoration of normal service when
service is restored
Expect:
Complete or Partial failure
Inability to transmit on RTF
Inability to access alternate RTF
Inability to use intercoms
Inability to access telephone network
Reduced Radar Data Mode

GIW will show MRTS
ATCO actions:
Inform Coordinator
Report failure
Operate as normal
Expect:
All functions are available
The switch to RFS (MRTS) from ARTAS is automatic
Any position in by-pass before ARTAS failure will remain
in by-pass
Reduced Alert Mode

GIW will show SNMAP
ATCO actions:
Inform Coordinator
Be aware of restricted, danger and prohibited
airspace inc. TSAs
Check MSAs at regional airports

Double and cross check Oceanic Entry COPs and levels
Utilise strategic traffic plans
Ensure tactical ATCO action is accurate
Employ TRM best practice
Continuously scan Mode C
Seek SAS assistance if necessary
Expect:
Any alert displayed prior to the reduced alert mode will
remain displayed regardless of whether or not the alert is
still valid.
Safety Net Function (STCA)

ATC Tools (MSAW and APW)
Monitoring Aids (RAM and CLAM)
Coupling
No APR sent to Flight Data function (no profile updates)
Reduced Flight Plan Mode

GIW will show FDP
ATCO actions:
Inform Coordinator
Check availability of FDP function on spare ODS
Inform traffic of failure
Use verbal coordination channels inter sector/ centre
Identify all new tracks using the "Confirm Squawk" technique
Maintain identification by regular checks
Restrict traffic flow where necessary
Be aware of unreliable Safety Nets and Monitoring Aids
Seek SAS assistance where necessary
Expect:
Flight Plan tracks

Tracks already displayed will remain displayed
Flight Plan commands (i.e. mod, terminate)
Message queues
Message transmission
Coordination messages
Mailbox management
Resectorisation
Limited Safety Net and Monitoring Aids due to no update
of the flight plans
Reduced Communication Mode

GIW will show FDX
ATCO actions:
Inform Coordinator
Use only verbal inter-centre coordination channels
Inform all traffic on RTF
Seek FDA assistance for AFTN or AIS information
Seek SAS assistance where necessary
Expect:
Inter centre communications

AFTN
Coordination messages (except inter sector)
Flight plans are not updated by external messaging
AIS
Radar Data Function failure
ATCO actions:
Inform Coordinator
Select radar by-pass services
Expect:
No radar data function (neither ARTAS nor MRTS nor RFS)
Appendix IV The questionnaire design
Air Traffic Controller Questionnaire
Dear Sir/Madam,
This questionnaire is created for the purpose of obtaining information on equipment
failures and recovery in Air Traffic Control (ATC) System(s) from various standpoints.
The information you provide will be used in a research project jointly supported by
EUROCONTROL Experimental Centre and Imperial College London.
We would greatly appreciate your completing the attached questionnaire. It will only
take a few minutes of your time to answer the questions, which will contribute to our
joint effort to introduce more real experience into ATC safety analysis. The data collection
is intended to support recovery strategies for future ATM and to analyse the current status of
this issue. The information that you provide will be used as an additional data source for
the PhD dissertation being developed in this area.
The questionnaire is created in Microsoft Word 2000. It is our intention to enable you
to fill it out electronically and send it directly to the following e-mail address
(branka.subotic@ic.ac.uk). However, if it is more convenient you can use the fax
number provided below.
Generally there are two formats of question, which require different ways of
answering. For some questions you will have to choose the most appropriate answer
by highlighting or marking it (e.g. yes/no answers), while for the others you will have to
type in your full answer.
Please fill out your questionnaire and try to answer the questions in as much detail as
possible. Your answers will be strictly confidential and de-identified; thus your personal
details will not appear in any document connected to this research.
Thank you in advance for your time and effort.
Sincerely,
Branka Subotic
Research PhD student
Imperial College London
Centre for Transport Studies
London SW7 2AZ
Phone +44 (0)2075946 022
Fax +44 (0) 2075946 102
branka.subotic@ic.ac.uk
Air Traffic Controller Questionnaire
1. Total number of years active as a controller ____________
2. Please list the types of facilities that you have worked in, beginning with the most recent.
[Table columns: ATC Facility Name (beginning with the most recent); Location; Number of years worked in particular Unit; Country; Type (Civilian/Military); Position/Rating (ACC/RDR, ACC/PROC, APP/RDR, APP/PROC, TWR or ARTCC, TRACON, ATCT (USA))]
3. Have you ever experienced ATC equipment failure during your work? Mark the corresponding letter.
(If No go to question 10)
4. What is the average number of ATC equipment failures during one year that you experience? _________________________
5. Please fill in any previous experience with equipment failures which seriously impacted your work:
[Table columns: Type of equipment failure; System affected? (See Note below); Frequency of the failure per year (in your own experience)?; Did you detect it and how? If not, who detected it?; Duration of the failure in min, h, days (if you can recall)?; Was the context* of the failure an important factor? If yes, did it have a positive or negative impact?; Recovery/contingency procedure existed or not?; Recovery/contingency training existed or not?; Who initiated the recovery?; How was the recovery initiated?; Any additional comment]
* Context is defined as any aspect of the operating context that influenced the failure or recovery aspect (e.g. workload, HMI, personal factors, team factors).
Note:
The typical CWP (controller working position) contains one or more of the following systems (systems will vary from one center and country to another):
Radar (SSR, PSR, Mode S, radar data processing (RDP), multi-radar processing (MRP), single radar processing (SRP))
Ancillary screens (meteorological information, strip bay, traffic flow information, etc.)
o Flight Plan Processing (FPP)
o Flight Progress Strips (FPS)
Pointing devices (mouse & trackball)
Secondary input devices (keyboard or touch input device (TID))
Communication panel
R/T, telephone, headset, intercom
Strip printer
Ground based Safety Nets (SNET): STCA, MSAW, APW, or any other SNET available
Other (e.g. power supply)
6. How much do you generally rely upon the written procedures in case of equipment failure and how much on situation-specific
problem solving (i.e. improvisation)? Fill in the corresponding number for Procedures, Problem solving, AND Other.
Scale: 1 (very much), 3 (moderately), 5 (not at all)
Written procedures
Situation-specific problem solving
Other (e.g. past experience)
7. Is there any organized exchange of the past experience in solving the equipment failures with your fellow
colleagues?
8. If yes, is it supported by your management as a good work practice?
9. According to your experience, what are the three most unreliable ATC systems/subsystems? Please use the device listing
from the Note above to state those systems starting with the most unreliable one:
(Note: Reliability is defined in this questionnaire as the probability that a piece of equipment or component will perform its intended
function without failure over the given time period and under specific or assumed conditions)
The following questions should be answered in relation to your current job, position, and level of experience (the first one cited in
question 2).
Procedures
10. Are recovery/contingency procedures available? Mark the corresponding letter.
11. Which types of equipment failures (outages) are covered by procedures in your Center?
12. Are recovery/contingency procedures up-to-date?
13. Are recovery/contingency procedures comprehensive?
14. Are recovery/contingency procedures complete?
15. If not, which procedure(s) would you add?
16. Are recovery/contingency procedures understandable?
17. Are recovery/contingency procedures easily accessible?
18. Are recovery/contingency procedures realistic/feasible?
19. Are recovery/contingency procedures compatible with other procedures?
20. Describe the situation when you had a problem applying the recovery/contingency procedure and why?
Training
21. Is training provided in recovery from equipment failures?
22. Is there separate refresher training every year?
23. If provided, how many times per year?
24. Is it enough?
25. Does the training cover all important equipment failures?
26. If not, what should be added?
27. Are training methods suitable (realistic, varied, etc)?
28. Is recovery/contingency training compatible with and linked to other training?
Conclusion
29. Please write down any other comments or suggestions based on your past experience or professional opinion that you might
have on the issue of equipment failures, recovery/contingency procedures, or training.
Thank you for taking the time to answer these questions. Your time and participation are greatly appreciated.
--End--
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Question 5 aimed to give controllers an opportunity to discuss their past
experience with equipment failures that seriously impacted their work. In order to
provide a structured description of each example and extract all relevant information,
question 5 was presented in the form of a table. The rows dealt with different failure
types while the columns dealt with various failure characteristics. These failure
characteristics were as follows:
1. Type of equipment failure and system affected (assessed in section 6.7.3.3
of Chapter 6);
2. Frequency of failure per year;
3. Individual who detected the failure;
4. Duration of the equipment failure;
5. Importance of the recovery context;
6. Existence of recovery procedure for a particular failure (assessed in Table
6-3, Chapter 6);
7. Existence of training for recovery for a particular failure;
8. Individual who initiated the recovery and method applied; and
9. Concluding remarks.
1. Frequency of failure per year
It was not possible to extract the frequency of failure experienced by controllers in 27.20
percent of cases. This was partially due to missing responses but mostly due to vague
and unclear responses (e.g. "very often", "rare"). The available and pre-processed data
show that the frequency of failures per year is on average more than 14, ranging
from less than once per year to as many as 730 annually (or twice per day). The
great dispersion of the data confirms the differing interpretations of equipment failures (as
discussed in section 6.7.3.1 of Chapter 6).
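The pre-processing step mentioned above, in which reported frequencies are brought to a common per-year basis, might look like the following minimal sketch. The function name and the conversion table are assumptions made for illustration; the thesis does not describe the actual procedure. A 365-day year is assumed because it reproduces the figure of 730 failures annually for twice per day.

```python
from typing import Optional

# Assumed conversion factors (periods per year); not taken from the thesis.
PERIODS_PER_YEAR = {"day": 365, "week": 52, "month": 12, "year": 1}


def failures_per_year(count: Optional[float], period: Optional[str]) -> Optional[float]:
    """Convert a 'count per period' answer to an annual frequency.

    Returns None for missing or vague answers (e.g. "very often"), which
    cannot be quantified and are therefore left out of any average.
    """
    if count is None or period not in PERIODS_PER_YEAR:
        return None
    return count * PERIODS_PER_YEAR[period]
```

Under these assumptions, an answer of twice per day maps to 730 failures per year, while a purely qualitative answer is returned as None and excluded.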
2. Individual who detected the failure
The failures were detected most frequently by controllers (in 79.4 percent of examples)
and with the assistance of the system-generated failure alert (in 7.1 percent of
examples). Other cases include failure detection by watch supervisors, engineers,
pilots, or controllers from other ATC Centres (in the case of a failure affecting national
or regional airspace, such as failure of satellite communication, flight data processing
system, or radar). These findings are expected, as NATS (2002) reports that most
failures do not affect the controllers because they are prevented or recovered by the system
control and monitoring unit. Moreover, the results obtained from this questionnaire
survey emphasise that the prompt detection of any ATC system deficiency depends
mostly on the controller, as a direct result of the controller's situational awareness.
Furthermore, the results show that failure detection may be aided by system-generated
failure alerts. This is an example of the synergy that exists between technical and
controller recovery achieved through the technical built-in defences for transmitting
information on failure (discussed in Chapter 4, section 4.3.2). These technical systems
will demonstrate more potential in the future, highly integrated ATC environment.
3. Duration of the equipment failure
Similar to the frequency variable, it was not possible to extract the duration of failures in
27.20 percent of examples. This was expected due to the difficulties with recalling the
duration of past failures. Additional problems were encountered with vague qualitative
responses (e.g. "several days", "a couple of hours", "a few minutes"). The available and pre-processed data show that the average duration of the reported failures was close to
one day, ranging from five minutes to one month. The large dispersion indicates
different durations for different types of failures.
The same categorisation of duration variables is applied as previously with the
operational failure reports (see Chapter 4, section 4.4.6). More precisely, the
categorisation focused on failures up to 15 minutes, between 15 minutes and one hour,
between one hour and one day, and those lasting more than one day. It is interesting to
note that the distributions of duration from operational failure reports and from past
experience captured in this survey show similarities (Figure 1). The difference is
observed in the third category (duration from one hour to one day). It seems that in the
operational environment, equipment failures of this duration tend to occur more
frequently compared to the experience of controllers worldwide.
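The four duration categories described above can be expressed as a short binning function. This is a sketch only: the function name is hypothetical, durations are assumed to be expressed in hours, and the treatment of values falling exactly on a boundary (assigned here to the lower category) is an assumption.

```python
def duration_category(hours: float) -> str:
    """Assign a failure duration in hours to one of the four categories."""
    if hours < 0:
        raise ValueError("duration cannot be negative")
    if hours <= 0.25:
        return "up to 15 minutes"
    if hours <= 1.0:
        return "15 minutes to one hour"
    if hours <= 24.0:
        return "one hour to one day"
    return "more than one day"
```

For example, the shortest reported failure of five minutes (5/60 hours) falls into the first category, and the longest of about one month falls into the last.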
[Figure 1: two bar charts of frequency against failure duration category in hours ([0.00-0.25], [0.26-1.00], [1.01-24.00], [>24.01]). Percentage labels shown in panel a): 42.55%, 31.06%, 19.15%, 7.23%; in panel b): 34.51%, 31.6%, 25.85%, 8.04%.]
Figure 1 Distribution of the duration variable a) from the questionnaire survey; b) from the
Country D operational failure reports (see Chapter 4)
4. Importance of the recovery context

When asked about the context surrounding the occurrence of an equipment failure, the
controllers acknowledged its importance in the majority of examples (73 percent of
examples). Furthermore, these controllers rated its impact mostly as negative (63.9
percent of examples). The negative issues mentioned regarding the context of the
equipment failures were reduction of capacity, increased workload, increased stress,
increased communication with aircraft, increased coordination with adjacent sectors,
and in some cases additional workload due to deterioration in the weather. However,
there were several instances in which controllers rated context as positive mostly
through efficient teamwork, availability of an efficient assistant, low traffic levels at the
time of occurrence (i.e. no significant increase in workload), and ability to work with
fallback systems. As a result, the importance of context identified in past research is
confirmed in this questionnaire survey. The following Chapters are dedicated to further
assessment of recovery context.
5. Existence of training for recovery for a particular failure
Question 5 allowed mapping between ATC functionalities and available recovery
training for the sampled equipment failures¹. The analysis showed that in 48 percent of
examples provided, the controllers had some type of recovery training. This training
was mostly provided for the communication, navigation, surveillance, and data
processing functions. Lack of training is identified for power outages and loss of safety
nets.
6. Individual who initiated the recovery and method applied
The individuals that initiated and applied recovery processes came predominantly from
the controller population when compared with watch managers and engineers. This is
understandable as section 2 pointed out that most equipment failures are detected by
controllers. Having detected a problem with equipment, the controllers have to inform
engineers, indirectly through the watch manager, which constitutes the initiation of the
recovery. In some simple cases (e.g. loss of microphone and loss of screen), the
controller tries to replace the failed equipment either by using the spare one or by
changing to another working position (if there are any spare ones). In more complex
situations, when a change of position is not possible, the controller has to continue
working with the remaining tools and equipment and potentially revert to procedural
control, assure vertical separation, use fallback systems, and/or transfer all flights to an
adjacent sector or flight information region. Engineers initiate the recovery process in
the case of failures of aeronautical data exchange with adjacent ATC Centres,
runway/taxiway lighting systems, and data processing system. However, the controller
still remains responsible for safe separation of all traffic in the affected airspace.
¹ Question 26, although intended to capture the type of recovery training missing in each
sampled ATC Centre, yielded mostly high-level comments on the impossibility of training for every
potential equipment failure.
7. Concluding remarks
In general, the controllers perceive equipment failures as stressful and distracting
events that pose a major safety problem due to increased workload and difficulties with
maintaining identification of aircraft (e.g. in case of radar failure and data processing
failure). In one particular instance a controller commented that an equipment failure led
to a near miss. Another example pointed out the problems with equipment failures
occurring during night shift, as technical staff are not always available during that
period.
Appendix VII Overview of contextual factors
Factor
HERA
Eurocontrol
HERA [12]
Pilot-controller
comm.
Pilot actions
Traffic and
airspace
Weather
TRACEr
Shorrock and
Kirwan [19]
RAFT
Eurocontrol
[20]
External PSF
Pilot-controller
comm.
Pilot-controller
comm.
Written and verbal

communication
Task load and

system
complexity
Complexity;
Requirements for
perception;
requirements for
motor speed
Documentation
and procedures
Procedures
Procedures and
documentation
Required
procedures; Work methods; Plant
policy
Training and
experience
Training and
experience
Training and
experience
Workplace design
and HMI
Workplace design,
HMI, and equipment
factors
Environment
Prior training,
experience
Ambient
environment
Quality of
environment; T; Air
quality; Situational
factors
Detractors; Extreme
T; radiation;
Pressure;
Inadequate oxygen
supply; Vibration;
Restricted
movements
Perception; Motor
system; Memory;
Decision-making;
Short-term and long-term memory
Duration of stress;
Pain; Thirst; Fatigue;
Threats; Monotony;
Work performance;
Circadian rhythm
Personal factors
Personal factors
10
Team factors
Social and team

factors
Social and team

factors
Organisational
factors
Other
organisational
factors, Logistical
factors
Internal
PSF
Organisational
structure; Working
hours; Actions by
shift leader,
manager;
Remuneration
structure
12
Suddenness of
occurrence
13
14
COCOM
Hollnagel
[27]
CREAM
Hollnagel [11]
Task speed; Task

load
Inconsistent
labelling
Human machine
interaction
Personal factors
Organisational
factors
Stressors
Design features;
Factors in task and
work resources;
Warnings and
danger signs; Man-machine factors;
Interface
11
THERP
Swain and Guttman [24]
Plans
Availability of
procedures/ plans
Normal/familiar
process state
Adequacy of training
and experience
MMI and
support
Adequacy of MMI
and operational
support
Working conditions
State of
momentarily
abilities
personality and
intelligence;
motivation and
attitudes;
emotional state;
stress; gender
Attitudes
deriving from
family or
groups; group
dynamic
processes
Time of the day

(circadian rhythm)
Crew collaboration
quality
Adequate
organisation
Adequacy of
organisation
Few
simultaneous
goals
Number of
simultaneous goals
Available time
Available time
Factor
HRMS
Kirwan [28]
Recovery from
Failures
Kanse and van der
Schaaf [21]
CORE-DATA
Eurocontrol
[13]
ATHEANA
U.S. NRC
[29]
CAHR
Straeter [16]
NARA
Kirwan et al.
[30]
HPDB
Park et al.
[32]
Communication
Task organisation &

Task complexity
Task complexity &

Task criticality & Task
novelty
Task preparation; Task

simplicity; Complexity of
the task; Precision;
Monotony of activity
Procedures
Clarity/Precision of
procedures; Design of
procedures; Content;
Completeness; Presence
Dependencies
of the different
tasks/steps/actions
Procedures
Training/expertise/experience/competence
Quality of information/
interface
10
Task organisation
11
Person related factors
Refresher training &

Training
Inexperience
Shortfalls in the
quality of
information
conveyed by
procedures; use of
more dangerous
procedures
Operator
inexperience;
Unfamiliarity
(situation occurs
infrequently)
Unfamiliar plant
conditions
Usability of control;
Usability of equipment;
Positioning; Equivocation
of equipment ;
arrangement of
equipment; display
range; accuracy of
display; Labelling;
Marking; Reliability;
Technical layout;
Construction;
Redundancy; Coupled
equipment
Low signal to noise

ratio; Overriding
information easily
accessible; no
means to reverse an
unintended action;
Poor system
feedback; Poor
system feedback on
activity progress
Technical/workplace/situational factors
Environmental
factors and
ergonomics
External event
Poor environment
Person related factors
Human
performance
capabilities at low
point; Excessive
workload
Technical/workplace/situational factors
Ergonomic design &

HMI ambiguous & HMI
feedback; Alarms;
Labels
Stress; Workload
Processing; Information;
Goal reduction
Social factors
Lack of
supervision/checks
Non-optimal use
of human
resources
Operator under
load/boredom; A
conflict between
intermediate and
long-term
objectives; Stress
and ill-health;
Information overload
Poor handovers and
team coordination
problems
Low workforce
morale or adverse
organisational
environment
Available
procedure &
description of
all steps and
tasks
Level of
experience
Person issues;
Demand of
perception,
cognition, etc.
Team issues
12
13
14
Time
Factors relevant for

prioritisation of recovery-related factors
Time pressure
Time constraints
Occurrence-related factors
Time pressure
Time pressure
The time
needed to
correctly
perform tasks,
steps, and
actions
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
The relevant Recovery Influencing Factors (RIFs) are discussed in four main
groups: internal factors (i.e. related to the controller), equipment failure related factors,
external factors (i.e. factors related to working conditions), and airspace related factors.
The following paragraphs present the underlying considerations in developing the
probability values for each predefined RIF.
A.1 Internal factors

Internal factors represent a group of RIFs closely related to the air traffic controller.
These include quality of training, controller experience with equipment failures in
his/her professional career, experience with (or trust in) the ATC system, generic
assessment of personal factors (e.g. personality, fatigue, stress), and communication
for recovery as a result of detected equipment failure.
A.1.1 Training for recovery from ATC equipment failure

This factor describes the adequacy of training provided in recovery tasks based on the
existing recovery procedures and/or other ATC Centre specific equipment failures,
frequency of refresher training (e.g. once per year), and familiarity with ATC system
operational modes (ranging from full, through reduced/emergency, to failed operation).
The qualitative descriptor and the corresponding probabilities are determined from the
questionnaire survey responses based on percentages of ATC Centres that provide
training for recovery, those that provide this training but not consistently, and those that
do not provide any training for recovery (see Chapter 6, section 6.7.3.6 and Chapter 8,
section 8.3.1.2). The qualitative descriptor and the corresponding probabilities for this
RIF are presented in Table 1.
Table 1 Summary of the RIF "Training for recovery from ATC equipment failure"
RIF: Training for recovery from ATC equipment failure
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation:
Qualitative descriptor    Percentage of responses    RIF probability
suitable                  52                         0.52
tolerable                 17                         0.17
counter productive        31                         0.31
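The step from percentages of responses to RIF probabilities in Table 1 is a direct division by 100. A minimal sketch follows; the function name is hypothetical, and the check that the levels sum to 100 percent reflects the assumption that the descriptor levels of one RIF are mutually exclusive and exhaustive.

```python
from typing import Dict


def rif_probabilities(percentages: Dict[str, float]) -> Dict[str, float]:
    """Convert percentages of responses per descriptor level to probabilities.

    Raises ValueError if the percentages do not sum to 100 (assumed here
    to be a requirement for the levels of a single RIF).
    """
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return {level: pct / 100 for level, pct in percentages.items()}


# Values from Table 1 (134 questionnaire responses).
training = rif_probabilities(
    {"suitable": 52, "tolerable": 17, "counter productive": 31}
)
```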
A.1.2 Previous experience with equipment failures

This factor describes the overall level of controller experience with equipment failures,
as well as the level of experience with a particular type of failure under assessment.
The qualitative descriptor is set at two levels (controllers can either have experience
with equipment failures or not), while the probabilities are determined from the
questionnaire survey, further validated by the responses from the ATM specialists
surveyed (Table 2).
Table 2 Summary of the RIF "Previous experience with equipment failures"
RIF: Previous experience with equipment failures
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation: ATM specialists surveyed
Qualitative descriptor                          Percentage of responses    RIF probability
experienced any type of equipment failure       95                         0.95
no experience with equipment failures           5                          0.05
A.1.3 Experience with system performance (reliance or trust in the system)
This dynamic factor describes the overall level of experience of the controller with the
ATC system including the tools and subsystems on the ATC console. The use of
automated tools depends upon the controller's trust in their reliability. The extreme
situations of undertrust or overtrust may lead to problems. The former may result in the
tool not being used and the latter in over-reliance of the controller on the tool
available. The probabilities are determined from the findings of the study by Hilburn
and Flynn (2001), also reported in EUROCONTROL (2000b), which involved a total of
79 controllers from seven European ATC Centres. This study used both focus group
discussions and survey data collection to extract controllers' attitudes to future
automation needs, system development issues, and operational requirements. The
results showed that 18 percent of controllers sampled mistrust technology. On the
other hand, the responses from the ATM specialists surveyed in this thesis reveal that
10 percent of controllers have excessive trust in the system. Taking mistrust and
excessive trust together, the qualitative descriptor for this RIF is set at two levels and
the corresponding probabilities are shown below (Table 3).
Table 3 Summary of the RIF "Experience with system performance"
RIF: Experience with system performance (reliance or trust in the system)
Data source for probabilistic assessment: Past research and ATM specialists
Number of responses: 79/8
Nature of the validation:
Qualitative descriptor                          Percentage of responses    RIF probability
objective attitude toward the ATC system        72                         0.72
excessive trust and mistrust                    28                         0.28
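The two probabilities for this RIF follow from adding the two reported shares and taking the complement. The sketch below assumes, as the simple addition in the text implies, that the mistrusting and the excessively trusting controllers are disjoint groups; the variable names are illustrative.

```python
# Shares reported in the text above.
mistrust = 0.18         # Hilburn and Flynn (2001), 79 controllers
excessive_trust = 0.10  # ATM specialists surveyed in this thesis

# Assumption: the two groups do not overlap, so the shares can be added.
p_excessive_or_mistrust = round(mistrust + excessive_trust, 2)
p_objective = round(1 - p_excessive_or_mistrust, 2)

print(p_excessive_or_mistrust, p_objective)
```

The rounding to two decimals only removes floating-point noise; the resulting values match the probabilities listed for this RIF.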
A.1.4 Personal factors

These are controller-related factors, which can be determined in a post-failure analysis
or predicted in the case of predictive analysis. This factor includes, but is not limited
to, the following: time of the day (i.e. relevance of circadian rhythm), time into the shift
(i.e. level of situational awareness as well as fatigue), and age. Although other factors
are important, for example, the level of confidence, complacency, self-esteem (i.e. trust
in own ability), personality, motivation, attitudes deriving from family or close social
groups, and ability to cope with stress, they require the application of various sets of
psychological tests. Current definition of the personal factors accounts for all the above
mentioned factors and sets the qualitative descriptor at three levels. The respective
probabilities are determined from the average of the responses from the ATM
specialists surveyed (Table 4).
Table 4 Summary of the RIF Personal factors
RIF: Personal factors
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
suitable                  65                        0.65
tolerable                 26                        0.26
counter productive        9                         0.09
A.1.5 Communication for recovery within team/ATC Centre

This factor includes only the communication that takes place between controllers for
the purpose of recovery from equipment failure. Therefore, it assesses the quality of
communication as well as the decision-making process, quality of Team Resource
Management (TRM, see footnote 2), familiarity of team members or the level of synergy between
them, the level of mutual understanding and the knowledge of different working
strategies, team efficacy, intent recognition (i.e. overt communication), and other items.
In the case of a single-controller position this factor should be understood as
communication with a supervisor or any other relevant personnel. The qualitative
descriptor is proposed at three levels while the corresponding probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 5).
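Averaging the specialists' percentage estimates and rounding each level separately can be sketched as below. The five response triples are hypothetical, chosen only for illustration; the thesis does not list the individual responses, and the exact averaging procedure is an assumption. The sketch also shows that independently rounded levels need not sum to exactly 1.00, as is the case for the values in Table 5.

```python
# Hypothetical responses (percent): (efficient, tolerable, inefficient).
# Each respondent's three estimates total 100, as the questionnaire requires.
responses = [
    (70, 25, 5),
    (75, 22, 3),
    (73, 23, 4),
    (72, 24, 4),
    (73, 24, 3),
]

levels = ("efficient", "tolerable", "inefficient")

def average_probabilities(rows):
    """Average each level across respondents and convert percent to probability."""
    n = len(rows)
    means = [sum(row[i] for row in rows) / n for i in range(len(rows[0]))]
    return [round(m / 100, 2) for m in means]

probs = average_probabilities(responses)
for name, p in zip(levels, probs):
    print(f"{name}: {p:.2f}")

# Rounding each level on its own can leave a total slightly off 1.00.
print("total:", round(sum(probs), 2))
```

If an exact total of 1.00 were required, the smallest level could be set to the remainder instead of being rounded on its own; whether the thesis applies such a correction is not stated.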
Table 5 Summary of the RIF Communication for recovery within team/ATC Centre
RIF: Communication for recovery within team/ATC Centre
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
efficient                 73                        0.73
tolerable                 24                        0.24
inefficient               4                         0.04
A.2 Equipment failure related factors

Equipment failure related factors represent a group of RIFs defining the characteristics
of failures relevant to the controller recovery process. These are complexity of failure
type, time course of failure development, number of workstations/sectors affected, time
necessary to recover, existence of recovery procedure, and duration of failure. Details
on failure characteristics can be found in Chapter 4.
A.2.1 Complexity of failure type

This factor identifies single versus multiple component failures (as discussed in
Chapter 4) and thus the qualitative descriptor is proposed at two levels. The
probabilities of each level are determined using the operational failure reports from
available Civil Aviation Authorities (Table 6). Due to the relatively low level of
confidence in the use of CAA occurrence databases (see Chapter 8, section 8.3.1.5),
these probabilities were validated by the responses from the ATM specialists surveyed
which did not show a significant difference. Additionally, these results are in line with
the experience of system control and monitoring engineers interviewed for this study,
who stated that the majority of ATC equipment failures are single as opposed to
multiple failure occurrences (for evidence see Appendix II).
Footnote 2: TRM represents an effective use of all available resources for ATC personnel to assure safe
and efficient operation, to reduce error, avoid stress, and increase efficiency.
Table 6 Summary of the RIF Complexity of failure type
RIF: Complexity of failure type
Data source for probabilistic assessment: Operational failure reports
Number of responses: 22,808 reports
Nature of the validation: ATM specialists responses and system control and monitoring engineers
Qualitative descriptor    Percentage of responses   RIF probability
a single failure          92                        0.92
multiple failure          8                         0.08
A.2.2 Time course of failure development

This factor defines the temporal characteristics of failure occurrence. These are
sudden, gradual, and latent/persistent failures. As a result, the qualitative descriptor is
set at three levels: sudden failure, gradual degradation of system, and persistent or latent
failure. The corresponding probabilities, based on the averaged responses from the ATM
specialists surveyed, are presented in Table 7. These probabilities were
validated by the interviews with system control and monitoring staff from several ATC
Centres, which did not show a significant difference (for evidence see Appendix II).
Table 7 Summary of the RIF Time course of failure development
RIF: Time course of failure development
Data source for probabilistic assessment: ATM specialists responses
Number of responses: -
Nature of the validation: System control and monitoring engineers
Qualitative descriptor    Percentage of responses   RIF probability
sudden                    55                        0.55
gradual                   39                        0.39
latent                    7                         0.07
A.2.3 Number of workstations/sectors affected

This factor describes the immediate impact of a particular type of failure in terms of the
number of positions/sectors affected. It is closely linked to the overall ATC Centre
architecture, since exposure to failure varies greatly with the level of interconnectivity of
different systems, the level of availability of separate channels (redundancy/variability),
and complexity of failure (single vs. multiple failure). The qualitative descriptor is
proposed at two levels, differentiating between a failure affecting a single Controller
Working Position (CWP) or sector and a failure affecting multiple CWPs and sectors.
Due to the lack of operational data,
a conservative approach is taken and probabilities are assigned equally between the two
levels. Note that this RIF has no Level 1, i.e. the most favourable level, simply because
the number of workstations/sectors affected cannot have any positive or favourable
effect on controller performance (Table 8).
Table 8 Summary of the RIF Number of workstations/sectors affected
RIF: Number of workstations/sectors affected
Data source for probabilistic assessment: -
Number of responses: -
Nature of the validation: N/A
Qualitative descriptor                                     Percentage of responses   RIF probability
one CWP or several CWPs in a sector                        50                        0.5
several CWPs in several sectors/all CWPs in all sectors    50                        0.5
A.2.4 Time necessary to recover

This factor describes the time necessary for a controller to recover from the effect(s) of
equipment failure. This time should be measured from the moment of failure
occurrence until the establishment of a normal or stable system state (i.e. assurance of
safe but not necessarily efficient control of air traffic). The qualitative descriptor is set at
two levels, differentiating between availability and lack of time to recover, while the
corresponding probabilities are determined from the average of the responses from the
ATM specialists surveyed (Table 9).
Table 9 Summary of the RIF Time necessary to recover
RIF: Time necessary to recover
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor                        Percentage of responses   RIF probability
less than time available (see footnote 3)     94                        0.94
in excess of time available                   6                         0.06
Footnote 3: Time available to controller to react before the development of less than adequate separation.
A.2.5 Existence of recovery procedure

This factor takes into account the availability of a written procedure, rules, or guidelines
for a particular type of equipment failure, the level of its comprehensiveness and
completeness. In the future this RIF may even include the existence of some sort of a
dynamically adaptable procedure. The qualitative descriptor is set at three levels to
capture the quality of the existing procedure (Table 10). Probabilities are calculated
from the questionnaire survey responses, which showed that 13.8
percent of ATC Centres do not have any recovery procedures. The distinction between
suitable and tolerable procedures was derived by taking into account that 45 percent of
existing procedures are not complete, and therefore only tolerable. It should be noted
that this approach is limited as it associates incomplete procedures with tolerable
procedures. A more accurate approach is achievable when the proposed methodology
is applied to a specific equipment failure and its context.
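One reading of this calculation that reproduces the values in Table 10 is sketched below. It assumes that the 45 percent of incomplete procedures applies only to the Centres that have a procedure (the remaining 86.2 percent); that interpretation, and the variable names, are assumptions made for illustration.

```python
# Figures quoted in the text.
no_procedure_share = 0.138   # ATC Centres without any recovery procedure
incomplete_share = 0.45      # existing procedures that are not complete

# Assumption: incompleteness is measured among Centres that have a procedure.
with_procedure_share = 1 - no_procedure_share

raw = {
    "suitable": with_procedure_share * (1 - incomplete_share),
    "tolerable": with_procedure_share * incomplete_share,
    "inappropriate": no_procedure_share,
}

# Round to two decimals, as in the RIF probability column.
probabilities = {level: round(p, 2) for level, p in raw.items()}

for level, p in probabilities.items():
    print(f"{level}: {p:.2f}")
```

The rounded results are 0.47, 0.39, and 0.14 for the suitable, tolerable, and inappropriate levels respectively.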
Table 10 Summary of the RIF Existence of recovery procedure
RIF: Existence of recovery procedure
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
suitable                  47                        0.47
tolerable                 39                        0.39
inappropriate             14                        0.14
A.2.6 Duration of failure

This particular factor represents the amount of time during which a failure persists.
Applied to a specific system, it can carry important information on recovery and the
impact of a particular failure on ATC and overall aviation safety. The results of the
operational failure report analysis informed the qualitative descriptor, which is
proposed at two levels. The corresponding
probabilities are determined from the operational failure reports (Chapter 4), further
validated by the responses from the ATM specialists surveyed, which did not show a
significant difference (Table 11).
Footnote to Table 10: If procedures are not available, Inappropriate would be used.
Table 11 Summary of the RIF Duration of failure
RIF: Duration of failure
Data source for probabilistic assessment: Operational failure reports
Number of responses: 22,808 (reports)
Nature of the validation: ATM specialists surveyed
Qualitative descriptor                                                          Percentage of responses   RIF probability
short period of time (up to 15 minutes)                                         56                        0.56
moderate to substantial period of time (failures longer than 15 minutes)        44                        0.44
A.3 External factors

External factors or factors related to working conditions represent the group of RIFs
related to the working conditions surrounding a controller at the moment of failure.
These are adequacy of HMI, operational support, quality of alarms/alerts and the
moment when they are triggered in the system, and the overall adequacy of the
organisational characteristics in an ATC Centre from the safety and operational
perspectives.
A.3.1 Adequacy of HMI and operational support

This factor includes the HMI and all available control panels (e.g. mode of operation,
radars in use, frequencies in use and dynamic flight information), situational display, as
well as the operational support provided by specifically designed decision aids. It is
important to highlight that a controller receives all feedback on the ATM system
performance through the HMI. The qualitative descriptor is set at three levels to capture
the quality of the HMI, while the probabilities are determined from the average of the
responses from the ATM specialists surveyed (Table 12).
Table 12 Summary of the RIF Adequacy of HMI and operational support
RIF: Adequacy of HMI and operational support
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
suitable                  53                        0.53
tolerable                 45                        0.45
counter productive        3                         0.03
A.3.2 Ambiguity of information in the working environment

This dynamic factor describes the transparency of the system, the level of system
interaction and redundancy, and the existence of symptoms that can be interpreted in more
than one way. In general, it is observed that a lack of transparency of an ATC system
leads people to make hypotheses on the causes of failures based on incomplete
information or best guess (see Straeter, 2005). ATC subsystems are highly dependent
on each other. Information from one tool can be distributed to several different
subsystems at the same time. For example, information on aircraft position is sent
directly to the radar data processing system, air traffic flow management, ATC tools
(including the monitoring aid and the medium term conflict detection tool), safety nets
(e.g. the short term conflict alert tool), and flight data processing system. In other
words, ATC systems are closely coupled and dependent upon dynamic information
exchange. For this reason the architecture of any ATC Centre takes into account
existing interactions by building a net of redundancies. In addition, any symptoms that
can be interpreted in more than one way will be interpreted wrongly in some instances.
Based on the above discussion, the qualitative descriptor is set at two levels whilst
the corresponding probabilities are determined from the average of the responses from
the ATM specialists surveyed (Table 13).
Table 13 Summary of the RIF Ambiguity of information in the working environment
RIF: Ambiguity of information in the working environment
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor                                                                                   Percentage of responses   RIF probability
the match between the external working environment and the controller's internal mental model           86                        0.86
the mismatch between the external working environment and the controller's internal mental model        14                        0.14
A.3.3 Adequacy of alarms/alerts

As explained in Chapter 4, the function of alarms/alerts is to alert operators (visually
and/or aurally) to potential non-nominal system states. The role of the human
operator is then to confirm the existence of a failure and take appropriate actions.
Because of the complexity of current ATC consoles, it is believed that the availability,
adequacy of alerts, and other relevant characteristics should be considered separately
from HMI. Therefore, this factor describes the availability and adequacy of
alarms/alerts which permit detection, diagnosis, and/or correction of failures, the
reliability of given information, the number of alerts presented to the controller, and the
appropriate location and format of alert information (e.g. signal, colour coding,
warning/message). The qualitative descriptor is set at three levels, to account for
suitable, tolerable, and inadequate design solutions, while the probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 14).
Table 14 Summary of the RIF Adequacy of alarms/alerts
RIF: Adequacy of alarms/alerts
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
suitable                  75                        0.75
tolerable                 20                        0.2
counter productive        5                         0.05
A.3.4 Adequacy of alarm/alert onset

This dynamic factor describes one important characteristic of the available
alerts/alarms, namely the cognitive convenience of alert onset. In other words, alert
onset has a high impact on the overall recovery performance depending on the
moment of its onset. In addition, a misleading sequence of alerts can lead the controller
towards wrong assumptions through cognitive tunnelling on the initial alert,
thereby disregarding a later, possibly more relevant alert (Straeter, 2005). Since the
adequacy of alert onset depends directly on the complexity of traffic in the dedicated
airspace (dynamically changing every second), this RIF is given two levels.
Furthermore, due to the lack of ATC operational data on this advanced and futuristic
concept, a conservative approach is taken and probabilities are equally assigned
between two levels (Table 15).
Table 15 Summary of the RIF Adequacy of alarm/alert onset
RIF: Adequacy of alarm/alert onset
Data source for probabilistic assessment: -
Number of responses: N/A
Nature of the validation: N/A
Qualitative descriptor                                                                                                                Percentage of responses   RIF probability
information from the external world enters the processing loop at the right time                                                     50                        0.50
information from the external world enters the processing loop at the wrong time, i.e. misleading alarm or sequence of alarms        50                        0.50
A.3.5 Adequacy of organisation

This factor describes several organisational characteristics of the ATC Centre. These
include but are not limited to the quality of roles and responsibilities, the availability of
team members, the availability and adequacy of supervision, the availability of
additional support (e.g. assistant), the personnel selection process, shift patterns and
personnel planning, attitude to teamwork, safety culture, existence of stress
management programs, support for the organised exchange of past experience on
equipment failures, and adequacy of communication with management and technicians
(e.g. briefings, exchange of knowledge, bulletins, safety panels). Three qualitative
descriptors can be distinguished with probabilities determined from the average of the
responses from the ATM specialists surveyed (Table 16).
Table 16 Summary of the RIF Adequacy of organisation
RIF: Adequacy of organisation
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
efficient                 67                        0.67
tolerable                 31                        0.31
inefficient               3                         0.03
A.4 Airspace related factors

Airspace related factors relate to the characteristics of the airspace affected by the
degraded system performance, traffic complexity at the moment of failure and during
the recovery process, and weather conditions. In addition, this group includes the
overall task complexity of the situation. For example, an equipment failure occurrence
coupled with a sudden increase in the amount of traffic, a sudden deterioration of weather, or
the existence of priority aircraft greatly increases the complexity of the overall situation.
A.4.1 Traffic complexity during the recovery process

This dynamic factor includes but is not limited to the following: the level and
characteristics of the traffic load, the mix of aircraft flying on instrument flight rules (IFR)
and visual flight rules (VFR), military aircraft (because of different performance
characteristics and speed differentials), and the existence of priority aircraft (e.g. low fuel,
government flights, and medical emergency). There have been various studies into
traffic complexity (Hilburn, 2004) and various attempts to provide a quantitative
indicator of traffic complexity, for example using dynamic density (Kopardekar and
Magyarits, 2003), cross-sectional time-series analysis methods (Majumdar et al., 2004),
and the use of a traffic complexity indicator (EUROCONTROL, 2006c). Any of these
approaches may be used to inform the probabilities for the qualitative descriptor of this
particular RIF. Taking into account only the impact that traffic complexity may have on
the controller performance, this qualitative descriptor is proposed at two levels. One
level accounts for average traffic complexity whilst the other accounts for high and low
traffic complexity, as both negatively impact controller performance. The probabilities
are determined from the average of the responses from the ATM specialists surveyed
(Table 17).
Table 17 Summary of the RIF Traffic complexity during the recovery process
RIF: Traffic complexity during the recovery process
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor               Percentage of responses   RIF probability
High and low traffic complexity      19                        0.19
Average traffic complexity           81                        0.81
A.4.2 Airspace characteristics during the recovery process

This dynamic factor incorporates the characteristics and complexity of airspace (i.e. its
component sectors), based upon the sector design characteristics (for details see
NATS, 1999). These characteristics include the number of crossing points and their
position in relation to sector boundaries, number of flight levels, number of entry and
exit points, special use airspace (SUAs) including zones of military activity,
characteristics of upper vs. lower airspace, airways configuration, and the number of
neighbouring sectors. It is important to highlight the difference between en-route and
terminal airspace in relation to recovery from equipment failures. Terminal airspace
is characterised by traffic in constant level change (i.e. ascending or descending) and
frequent changes in heading compared to en-route airspace, especially its higher
levels. Due to differences in controller tasks, en-route airspace in general provides
more time to recover compared to terminal airspace. In addition, interviews with ATM
specialists revealed that terminal airspaces have radar coverage provided from one
radar source, whereas en-route airspace is usually covered by multi-radar
tracking (i.e. integration of data from several radar sites). The qualitative descriptor is
set at three levels whilst the corresponding probabilities are determined from the
average of the responses from the ATM specialists surveyed (Table 18).
Table 18 Summary of the RIF Airspace characteristics during the recovery process
RIF: Airspace characteristics during the recovery process
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
Adequate                  64                        0.64
Tolerable                 33                        0.33
Inappropriate             3                         0.03
A.4.3 Weather conditions during the recovery process

This dynamic factor takes into account any change in weather conditions during the
recovery process. The qualitative descriptor is proposed at two levels whilst the
corresponding probabilities are determined from the responses from the ATM
specialists surveyed (Table 19).
Table 19 Summary of the RIF Weather conditions during the recovery process
RIF: Weather conditions during the recovery process
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor    Percentage of responses   RIF probability
Improved                  89                        0.89
Deteriorated              11                        0.11
A.4.4 Conflicting issues during the recovery process (task complexity)

This dynamic factor describes the level of overall task complexity at the moment of
equipment failure. In the case of multiple conflicting tasks, the operator has to prioritise
between them (Straeter, 2005). In the case of any type of conflict alert (i.e. two or more
aircraft having a conflicting intent), the controller has to give full attention to the
resolution of the conflict using the equipment which is still operational, but assuming
that some other subsystem might fail. In ATC, overall safety is the first priority. Due to
the dynamic nature of ATC, this qualitative descriptor is proposed at two levels: the
average complexity of the situation, and both high and low complexity of the situation
(as both have a negative effect on controller performance: increased workload and
boredom or monotony, respectively). The corresponding probabilities are determined
from the responses from the ATM specialists surveyed (Table 20).
Table 20 Summary of the RIF Conflicting issues during the recovery process (overall task complexity)
RIF: Conflicting issues during the recovery process
Data source for probabilistic assessment: ATM specialists
Number of responses: -
Nature of the validation: -
Qualitative descriptor               Percentage of responses   RIF probability
The average complexity               72                        0.72
Multiple tasks and low complexity    28                        0.28
Appendix IX Questions for ATM Specialist
Note: The set of questions presented below is investigating controller recovery from
equipment failures in ATC. All questions should be answered based upon your
operational experience and knowledge. Whilst some of them are very specific, and
therefore pose a challenge to answer, please try to respond to all the questions giving
the appropriate percentages.
How often has training (initial & refreshment) in your ATC Centre been:
    Suitable for potential equipment failures
    Tolerable for potential equipment failures
    Counter productive for potential equipment failures
    100%

What is the percentage of ATCOs that have never experienced equipment failure in
their career? Please think of novice ATCOs as well and try to make the best estimation.

According to your best judgement, what percentage of ATCOs have:
    Over-trust the automation/systems they are using
    Objective attitude toward ATC automation (ATCOs do trust automation but are aware of possible failures)
    Under-trust the automation/systems they are using
    100%

In the event of equipment failure, how often have personal factors (stress, fatigue, self esteem) been:
    Suitable to the equipment failure in question
    Tolerable to the equipment failure in question
    Counter productive to the equipment failure in question
    100%

How often has team-related communication for recovery been:
    Efficient
    Tolerable
    Inefficient
    100%

What is the percentage of equipment failures affecting:
    One system only
    Multiple systems at the same time
    100%

What is the percentage of:
    Sudden equipment failures
    Gradual equipment failures
    Latent equipment failures in your ATC Centre
    100%

How often has the time necessary to recover (time before the development of any inadequate separation) been:
    Adequate
    Inadequate
    100%

How often (in your overall experience) have existing recovery procedures been:
    Suitable to the equipment failure in question
    Tolerable to the equipment failure in question
    Counter productive to the equipment failure in question
    100%

What is the percentage of equipment failures lasting:
    Up to 15min
    More than 15min
    100%

When there is a failure, how often has information presented on your HMI (i.e. radar screen) been:
    Suitable to the recovery from equipment failure (e.g. provides appropriate cues, visual/auditory alerts)
    Tolerable to the recovery from equipment failure
    Counter productive to the recovery from equipment failure (e.g. provides wrong cues, mislead you)
    100%

When there is a failure, how often have existing alarms/alerts on radar screen been:
    Suitable to the recovery from equipment failure
    Tolerable to the recovery from equipment failure
    Counter productive to the recovery from equipment failure
    100%

According to your opinion, what is the percentage of match between the controller's
situational awareness and the dynamic airspace and traffic configuration (traffic mix,
speed differentials, FL utilized, airways configuration) during the recovery process?

What percentage of time the organisational features in your ATC centre are:
    Efficient
    Tolerable
    Inefficient regarding the support for better recovery from equipment failures.
    100%

In the event of an equipment failure, how often has the traffic complexity been:
    Too high
    Tolerable
    Too low
    100%

In the event of an equipment failure, how often has airspace design and configuration been:
    Adequate
    Tolerable
    Inappropriate
    100%

In the event of an equipment failure, how often have the weather conditions been:
    Improved
    Deteriorated or worsen
    Unchanged
    100%

In the event of equipment failure, how often has the total complexity of the recovery situation been:
    High
    Average
    Low
    100%
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities

Each RIF is listed by ID and name. Each level row gives: Descriptor; Probability (p); Expected effect of controller recovery performance; Level Designator (R); Probability of overall situation occurring (p*R).

Internal factors
1 Training for recovery from ATC equipment failure
    Suitable to the situation in question; 0.52; Most favourable; 1; 0.52
    Tolerable to the situation in question; 0.17; Non significant; 0; 0.00
    Counter productive to the situation in question; 0.31; Least favourable; -1; -0.31
2 Previous experience with equipment failures
    Experienced with a particular type of failure or Experienced with any other type of ATC equipment failure; 0.95; Most favourable; 1; 0.95
    No experience with ATC equipment failures; 0.05; Non significant; 0; 0.00
3 Experience with the system performance (reliance)
    Objective attitude toward the system; 0.72; Non significant; 0; 0.00
    Positive experience with the system (excessive trust) or Negative experience with the system (undertrust); 0.28; Least favourable; -1; -0.28
4 Personal factors
    Suitable for the recovery process; 0.65; Most favourable; 1; 0.65
    Tolerable for the recovery process; 0.26; Non significant; 0; 0.00
    Counter productive for the recovery process; 0.09; Least favourable; -1; -0.09
5 Communication for recovery within team/ATC Centre
    Efficient; 0.73; Most favourable; 1; 0.73
    Tolerable; 0.24; Non significant; 0; 0.00
    Inefficient; 0.04; Least favourable; -1; -0.04

Equipment failure related factors
6 Complexity of failure type
    Single system affected; 0.92; Non significant; 0; 0.00
    Multiple systems affected; 0.08; Least favourable; -1; -0.08
7 Time course of failure development
    Sudden failure; 0.55; Improve; 1; 0.55
    Persistent or latent failure; 0.07; Non significant; 0; 0.00
    Gradual degradation of system; 0.39; Least favourable; -1; -0.39
8 Number of workstations/sectors affected
    One workstation/one sector or All workstations in one sector; 0.50; Non significant; 0; 0.00
    Several workstations/couple of sectors or All workstations/all sectors; 0.50; Least favourable; -1; -0.50
9 Time necessary to recover
    Adequate - less than available time; 0.94; Most favourable; 1; 0.94
    Inadequate - in excess of available time; 0.06; Least favourable; -1; -0.06
10 Existence of recovery procedure
    Suitable to the situation in question; 0.47; Most favourable; 1; 0.47
    Tolerable to the situation in question; 0.39; Non significant; 0; 0.00
    Inappropriate; 0.14; Least favourable; -1; -0.14
11 Duration of failure
    Short period of time; 0.56; Non significant; 0; 0.00
    Moderate period of time or Substantial period of time; 0.44; Least favourable; -1; -0.44

External or factors related to working conditions
12 Adequacy of HMI and operational support
    Suitable to the situation in question; 0.53; Most favourable; 1; 0.53
    Tolerable to the situation in question; 0.45; Non significant; 0; 0.00
    Counter productive to the situation in question; 0.03; Least favourable; -1; -0.03
13 Ambiguity of information in the working environment
    External working environment matches the controller's internal mental model; 0.86; Most favourable; 1; 0.86
    External working environment mismatches the mental model; 0.14; Least favourable; -1; -0.14
14 Adequacy of alarms/alerts
    Suitable to the situation in question; 0.75; Most favourable; 1; 0.75
    Tolerable to the situation in question; 0.20; Non significant; 0; 0.00
    Counter productive to the situation in question; 0.05; Least favourable; -1; -0.05
15 Adequacy of alarm/alert onset
    Information from the external world enters the processing loop at the right time; 0.50; Most favourable; 1; 0.50
    Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms); 0.50; Least favourable; -1; -0.50
16 Adequacy of organisation
    Efficient; 0.67; Most favourable; 1; 0.67
    Tolerable; 0.31; Non significant; 0; 0.00
    Inefficient; 0.03; Least favourable; -1; -0.03

Airspace related factors
17 Traffic complexity
    Average traffic complexity; 0.81; Non significant; 0; 0.00
    Extremely high or extremely low traffic complexity; 0.19; Least favourable; -1; -0.19
18 Airspace characteristics
    Adequate (e.g. enroute higher levels); 0.64; Most favourable; 1; 0.64
    Tolerable; 0.33; Non significant; 0; 0.00
    Inappropriate (e.g. enroute lower levels or terminal); 0.03; Least favourable; -1; -0.03
19 Weather conditions during the recovery process
    Improved; 0.89; Non significant; 0; 0.00
    Deteriorated; 0.11; Least favourable; -1; -0.11
20 Conflicting issues in the situation (task complexity)
    Average complexity of the situation; 0.72; Non significant; 0; 0.00
    Conflicting, multiple tasks or Extremely low complexity of the situation (may lead to monotony); 0.28; Least favourable; -1; -0.28
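The last column of Appendix X is the product of each level's probability p and its level designator R. The sketch below recomputes that column from the (p, R) pairs of Appendix X. It rests on two assumptions: that R takes the values 1, 0, and -1 for the most favourable, non significant, and least favourable levels (only -1 is printed explicitly; 1 and 0 are inferred from the p*R entries), and that the first level of RIF 11 has R = 0. The sketch also counts the number of distinct level combinations across the 20 RIFs, which agrees with the total of the counts listed for RIF1 in Appendix XII. All names are illustrative.

```python
from math import prod

# (probability p, level designator R) per level, in Appendix X order.
# R = 1 and R = 0 are inferred from the p*R column (assumption).
RIF_LEVELS = {
    1: [(0.52, 1), (0.17, 0), (0.31, -1)],
    2: [(0.95, 1), (0.05, 0)],
    3: [(0.72, 0), (0.28, -1)],
    4: [(0.65, 1), (0.26, 0), (0.09, -1)],
    5: [(0.73, 1), (0.24, 0), (0.04, -1)],
    6: [(0.92, 0), (0.08, -1)],
    7: [(0.55, 1), (0.07, 0), (0.39, -1)],
    8: [(0.50, 0), (0.50, -1)],
    9: [(0.94, 1), (0.06, -1)],
    10: [(0.47, 1), (0.39, 0), (0.14, -1)],
    11: [(0.56, 0), (0.44, -1)],
    12: [(0.53, 1), (0.45, 0), (0.03, -1)],
    13: [(0.86, 1), (0.14, -1)],
    14: [(0.75, 1), (0.20, 0), (0.05, -1)],
    15: [(0.50, 1), (0.50, -1)],
    16: [(0.67, 1), (0.31, 0), (0.03, -1)],
    17: [(0.81, 0), (0.19, -1)],
    18: [(0.64, 1), (0.33, 0), (0.03, -1)],
    19: [(0.89, 0), (0.11, -1)],
    20: [(0.72, 0), (0.28, -1)],
}

def weighted_levels(levels):
    """Return p*R for each level, rounded to two decimals as in the table."""
    return [round(p * r, 2) for p, r in levels]

weighted = {rif: weighted_levels(levels) for rif, levels in RIF_LEVELS.items()}

# Number of distinct level combinations: product of the level counts per RIF.
n_combinations = prod(len(levels) for levels in RIF_LEVELS.values())

for rif, values in weighted.items():
    print(rif, values)
print("level combinations:", n_combinations)
```

With nine three-level and eleven two-level RIFs, the count of combinations is 3^9 x 2^11 = 40,310,784.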
Appendices
Weather
Task complexity
Duration of failure
Adequacy of organization
Adequacy of alarms/alerts onset
Adequacy of HMI and oper. support
Number of workstations affected
Comm. for recovery
Personal factors
x
x
Duration of
failure
Adequacy of
HMI and
operational
support
Ambiguity of
information in
the working
environment
Adequacy of
alarms/alerts
Adequacy of
alarms/alerts
onset
Adequacy of
organization
Traffic/traffic complexity
Training for
recovery from
ATC
equipment
failures
Previous
experience
with equip.
failures
Experience
with system
performance
(reliance)
Personal
factors
Comm. for
recovery
within a team
of controllers
Complexity of
failure type
Time course
of failure
development
Number of
workstations/
sectors
affected
Time
necessary to
recover
Existence of
recovery
procedure
DIRECT
INFLUENCE

Previous experience with equip.
failures
Appendix XI Validation of the RIFs interaction matrix
x
x
x
x
x
x
x
x
Traffic/traffic
complexity in
the moment of
failure
Airspace
characteristics
Weather
conditions
during the
recovery
process
Task
complexity
NOTE:
Please mark the interactions between each factor in the upper row and each factor
in the left column. For example, does 'Training for recovery' influence any of the
factors on the left side ('previous experience', 'experience with the system', 'personal
factors', and so on)? Please add or delete interactions as you find appropriate.
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Level | RIF1 | RIF2 | RIF3 | RIF4 | RIF5 | RIF6 | RIF7 | RIF8 | RIF9 | RIF10
0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.5 | 0 | 0 | 0 | 168 | 24 | 0 | 0 | 0 | 96 | 0
0.6 | 0 | 0 | 0 | 5964 | 2244 | 0 | 0 | 0 | 4272 | 0
0.7 | 0 | 0 | 0 | 67956 | 37908 | 0 | 0 | 0 | 58656 | 0
0.8 | 0 | 0 | 0 | 379116 | 266508 | 0 | 0 | 0 | 383184 | 0
0.9 | 2239488 | 0 | 0 | 1227984 | 1008576 | 0 | 0 | 0 | 1422000 | 0
1 | 8957952 | 13436928 | 0 | 2513604 | 2310156 | 0 | 8957952 | 0 | 3279840 | 8957952
1.1 | 2239488 | 6718464 | 0 | 1653636 | 1621692 | 0 | 4478976 | 0 | 2337228 | 4478976
1.2 | 0 | 0 | 0 | 3393708 | 3512088 | 0 | 0 | 0 | 5184840 | 0
1.3 | 0 | 0 | 0 | 2513604 | 2750052 | 0 | 0 | 0 | 4234404 | 0
1.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 2283432 | 0
1.5 | 0 | 0 | 0 | 379284 | 442464 | 0 | 0 | 0 | 786768 | 0
1.6 | 0 | 0 | 0 | 73920 | 82008 | 0 | 0 | 0 | 162216 | 0
1.7 | 0 | 0 | 0 | 73920 | 44760 | 0 | 0 | 0 | 17670 | 0
1.8 | 0 | 0 | 248832 | 379284 | 266688 | 0 | 0 | 0 | 780 | 0
1.9 | 2239488 | 0 | 3483648 | 1227984 | 1008576 | 0 | 0 | 0 | 6 | 0
2 | 8957952 | 13436928 | 8709120 | 2513604 | 2310156 | 13436928 | 8957952 | 10077696 | 0 | 8957952
2.1 | 2239488 | 6718464 | 3981312 | 1653636 | 1621692 | 6718464 | 4478976 | 6718464 | 0 | 4478976
2.2 | 0 | 0 | 3483648 | 3393708 | 3512088 | 0 | 0 | 3359232 | 0 | 0
2.3 | 0 | 0 | 248832 | 2513604 | 2750052 | 0 | 0 | 0 | 0 | 0
2.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 0 | 0
2.5 | 0 | 0 | 0 | 379284 | 442464 | 0 | 0 | 0 | 96 | 0
2.6 | 0 | 0 | 0 | 73920 | 82008 | 0 | 0 | 0 | 4272 | 0
2.7 | 0 | 0 | 0 | 73920 | 44760 | 0 | 0 | 0 | 58656 | 0
2.8 | 0 | 0 | 248832 | 379284 | 266688 | 0 | 0 | 0 | 383184 | 0
2.9 | 2239488 | 0 | 3483648 | 1227984 | 1008576 | 0 | 0 | 0 | 1422000 | 0
3 | 8957952 | 0 | 8709120 | 2513604 | 2310156 | 13436928 | 8957952 | 10077696 | 3279840 | 8957952
3.1 | 2239488 | 0 | 3981312 | 1653636 | 1621692 | 6718464 | 4478976 | 6718464 | 2337228 | 4478976
3.2 | 0 | 0 | 3483648 | 3393708 | 3512088 | 0 | 0 | 3359232 | 5184840 | 0
3.3 | 0 | 0 | 248832 | 2513604 | 2750052 | 0 | 0 | 0 | 4234404 | 0
3.4 | 0 | 0 | 0 | 1227984 | 1398444 | 0 | 0 | 0 | 2283432 | 0
3.5 | 0 | 0 | 0 | 379116 | 442440 | 0 | 0 | 0 | 786768 | 0
3.6 | 0 | 0 | 0 | 67956 | 79764 | 0 | 0 | 0 | 162216 | 0
3.7 | 0 | 0 | 0 | 5964 | 6852 | 0 | 0 | 0 | 17670 | 0
3.8 | 0 | 0 | 0 | 168 | 180 | 0 | 0 | 0 | 780 | 0
3.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Level | RIF11 | RIF12 | RIF13 | RIF14 | RIF15 | RIF16 | RIF17 | RIF18 | RIF19 | RIF20
0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.7 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.8 | 0 | 248832 | 684288 | 0 | 124416 | 0 | 0 | 0 | 0 | 0
0.9 | 0 | 2488320 | 3836160 | 746496 | 2363904 | 1492992 | 0 | 0 | 0 | 0
1 | 0 | 5474304 | 7527168 | 5971968 | 7589376 | 5225472 | 0 | 6718464 | 0 | 0
1.1 | 0 | 2488320 | 3545856 | 3732480 | 4354560 | 2985984 | 0 | 4478976 | 0 | 0
1.2 | 0 | 2488320 | 3836160 | 2985984 | 4976640 | 3359232 | 0 | 2239488 | 0 | 0
1.3 | 0 | 248832 | 684288 | 0 | 746496 | 373248 | 0 | 0 | 0 | 0
1.4 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 6
1.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 696
1.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14778
1.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131736
1.8 | 0 | 248832 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 638880
1.9 | 0 | 2488320 | 0 | 746496 | 0 | 1492992 | 1119744 | 0 | 0 | 1903896
2 | 10077696 | 5474304 | 0 | 5971968 | 0 | 5225472 | 8957952 | 6718464 | 20155392 | 3719892
2.1 | 6718464 | 2488320 | 0 | 3732480 | 0 | 2985984 | 5598720 | 4478976 | 0 | 2405976
2.2 | 3359232 | 2488320 | 0 | 2985984 | 0 | 3359232 | 4478976 | 2239488 | 0 | 4929648
2.3 | 0 | 248832 | 0 | 0 | 0 | 373248 | 0 | 0 | 0 | 3719892
2.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1903902
2.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 639576
2.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 146514
2.7 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 146514
2.8 | 0 | 248832 | 684288 | 0 | 124416 | 0 | 0 | 0 | 0 | 639576
2.9 | 0 | 2488320 | 3836160 | 746496 | 2363904 | 1492992 | 1119744 | 0 | 0 | 1903902
3 | 10077696 | 5474304 | 7527168 | 5971968 | 7589376 | 5225472 | 8957952 | 6718464 | 20155392 | 3719892
3.1 | 6718464 | 2488320 | 3545856 | 3732480 | 4354560 | 2985984 | 5598720 | 4478976 | 0 | 2405976
3.2 | 3359232 | 2488320 | 3836160 | 2985984 | 4976640 | 3359232 | 4478976 | 2239488 | 0 | 4929648
3.3 | 0 | 248832 | 684288 | 0 | 746496 | 373248 | 0 | 0 | 0 | 3719892
3.4 | 0 | 0 | 20736 | 0 | 0 | 0 | 0 | 0 | 0 | 1903896
3.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 638880
3.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131736
3.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14778
3.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 696
3.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Appendix XIII Experimental material
The experimental material consists of the documents used by the air traffic controllers
participating in the study and by the subject matter expert (SME). The documents
used by the controllers are presented in the following order:
a) The controller handbook;
b) Debriefing interview sheet; and
c) Feedback form.
The documents used by the subject matter expert are presented in the following order:
d) Subject matter expert's assessment; and
e) Best practice procedure sheet.
a) The controller handbook
The Controller Handbook
Researcher: Branka Subotic
Supervisor: Dr Washington Y. Ochieng
University: Imperial College London
Location of experiment: XXX
June 2006
SUBJECT INSTRUCTIONS
Strategic and tactical decision making in ATC
Dear Controller,
Welcome to the "Strategic and tactical decision making in ATC" research program. Because of
your extensive experience as an Air Traffic Controller, you have been asked to participate in this
study.
Our aim is to test a new approach to better understanding the decision-making process of air
traffic controllers. We will try to determine the cognitive processes that drive your
decisions/actions during the dynamic and complex control of air traffic. The knowledge gained
from this research will feed into the future design of computerised ATC tools.
We are not in a position to reveal more information on this study at this point, as it may influence
your behaviour, your actions, and the processes we wish to observe and analyse. At the end of this
study you will be more familiar with our objectives and you will be able to ask as many questions
as you find necessary. So please bear with us and help us make this study as realistic as
possible.
Your understanding and help are crucial at every step of this study!
This study is designed as an integrated part of regular emergency training in Dublin ATC Centre
with minimal impact on the controller. Therefore, please consider and treat this training
session as any other training session you have had in your professional career.
From time to time, additional information may be given to you by the training instructor or
researcher. On these occasions, please act as you would in the operational environment. Also,
when information or instructions are given to you by the researcher, please regard them as if they
came from a training instructor.
Now, we would like you to read the Consent form, which aims to inform you of what the
experiment involves and to make you fully aware of your rights while you are taking part in it. So
please proceed to the next page, read the form, and sign it if you agree with all terms and
conditions. If you have any questions, please do not hesitate to contact the researcher.
In addition, we will ask you to fill out a questionnaire and participate in a de-briefing after the
training session. The de-briefing part of this experiment is of high importance, as we will
compare the recorded data with your own experience and decision-making process. Therefore,
we would like to encourage you to give the researcher detailed input and explanation.
IMPERIAL COLLEGE LONDON
RESEARCH SUBJECT INFORMED CONSENT FORM
Prospective Research Subject: Read this consent form carefully and ask as many questions as you
like before you decide whether you want to participate in this research study. You are free to ask
questions at any time before or after your participation in this research.
The purpose of this research is to investigate the controller's decision-making process. You will
be asked to complete one emergency training session and therefore provide an air traffic control
service through one traffic scenario. The entire experiment is expected to take approximately
1.5 h to complete.
The results of this experiment are for research purposes only, and may be presented at
professional meetings or published in research literature. Your name will not be used in the
reporting of results. Only recorded data will be used; all personal information will be kept
completely confidential. A videotape of part of the experiment may be taken for purposes of
data collection only. Neither your face nor your identity will ever be associated with any reporting of
these results.
In addition, because of the confidentiality of this experiment, you will be asked not to disclose
any information about what you have experienced today to anyone (including family, colleagues,
and friends) for the next 30 days. Only in this way can we be assured that the
experiment will remain as realistic as possible. With your signature below you are accepting
these conditions. If for any reason you are unable to comply with any of the listed conditions,
please inform the researcher right away and you will be released from any other obligations.
Additionally, if you wish to withdraw from the experiment, you may do so at any time.
With Sincerest Thanks
I, ________________________________, understand that my participation in this experiment
is completely voluntary and that I may refuse to participate, or withdraw from the experiment, at
any time without penalty.
___________________________________ _________________
Participant Signature                Date
I, _______________________________, the researcher, undertake to guarantee the
confidentiality of the information you provide in this experiment. I understand that you reserve
the right to seek legal redress should any aspect of this agreement be breached.
___________________________________ _________________
Researcher Signature                 Date
Now you are ready for the training session!
~ When ready, contact the pseudo-pilot on the dedicated R/T frequency so that your training
session can be initiated ~
POST EXPERIMENT SESSION
Dear Controller,
Once again, thank you very much for your participation in this experimental trial. Now
you understand what our true objective in the experiment was and why we had to keep
it confidential. Our objective in this research project is to research controller recovery
from equipment failures in ATC. However, in order to achieve the unexpected effect of
this rare occurrence, it was necessary to mask the real objective of this research.
Our aim is therefore to determine how controllers manage equipment failures. The
complexity of this experiment gave us the opportunity to test only one equipment failure,
in spite of the large number of potential equipment failures in any ATC Centre. By
observing your reactions, recovery strategy, and attitude, we are aiming to identify
better solutions in the design of ATC tools/systems, recovery procedures, and training. Our
belief is that current, more automated ATC Centres need to create better support for their
main element: air traffic controllers.
For the above reasons, we kindly remind you that you have agreed not to disclose any
information or details from today's experiment to your colleagues, family, and
friends in the next 30 days.
Once again, we would like to highlight that without your help
and understanding this research would not be possible!
Post experiment questionnaire

If you need clarification at any point, please do not hesitate to contact the
researcher!
Current rating: ACC RDR Proc
APP RDR Proc
TWR
Age ____
Years of experience as a controller: ____
How suitable was your previous training to the situation (equipment failure) that you
have just experienced? Please answer this question taking into account the quality of
the training syllabus as well as the frequency of training. (Circle the appropriate number)
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counterproductive to the situation in question
When was your last emergency training?
1. In the last 30 days
2. In the last 6 months
3. 1 year ago
4. More than 1 year ago
Did you have training on equipment failures during that session?
Do you need better or more frequent training for unusual situations,
such as handling emergencies?
Please mark the statement that is closest to your previous experience with equipment
failures:
1. I have experienced a very similar or the same type of equipment failure in the past.
2. I have not experienced this particular type of failure, but have experienced other
types of equipment failures previously.
3. I have never experienced an equipment failure in my professional career.
Please mark the statement that is closest to your experience with the ATC system:
1. I trust ATC technology more than I trust my own judgments.
2. I trust new ATC technology but I am aware of possible failures.
3. I do not trust new ATC technology, even though it is designed to make my job
easier.
How would you rate your personal ability in today's training session? Personal ability
comprises different factors, including but not limited to: your level of fatigue, stress,
confidence, complacency, your ability to cope with an emergency situation, any family or
other social group issues, etc. Based on this explanation, rate your personal ability:
1. Suitable for the recovery process
2. Tolerable for the recovery process
3. Counterproductive for the recovery process
How would you rate your communication for recovery today:
1. Efficient
2. Tolerable
3. Inefficient
Would you say that you had enough time to recover from the effect(s) of the equipment
failure (taking into account possible development of less than adequate separation)?
1. Yes, time was adequate. Time necessary to recover was less than available
time in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of
available time in the simulation.
Is there a relevant recovery procedure for this particular failure?
If yes, in your opinion, is that procedure:
How familiar are you right now with that procedure?
1. Very familiar
2. Semi familiar
3. Not familiar at all
Would you say that HMI and operational support have been:
Would you say that:
1. The external working environment matched your internal mental model during
the recovery process
2. The external working environment mismatched your internal mental model at some
point of the recovery
How would you rate the adequacy of organisation in your ATC Centre?
1. Efficient
2. Tolerable
3. Inefficient
The quality of roles and responsibilities

The availability and adequacy of supervision
Attitude to teamwork
Support for organised exchange of past experience on eq.
failures
Personnel selection process
Shift patterns and personnel planning
Availability of team members
Availability of additional support (e.g. Assistant)
Safety culture
Communication with management and technicians (e.g.
Briefings, exchange of knowledge, bulletins)
Existence of stress management programs
How would you rate traffic complexity during the recovery process (please note: only
during the recovery process and not during the entire training session):
1. High
2. Average
3. Low
The mix of IFR/VFR

Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
How would you rate the complexity of the airspace in the used scenario? The airspace
complexity was:
1. Adequate
2. Tolerable
3. Inappropriate
The number of crossing points

Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs. unidirectional routes
Route length
Proximity of route to sector boundary
How would you rate weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
Considering the entire training session how would you rate the overall task complexity:
1. Conflicting, multiple tasks existed during this training session.

2. Average complexity of the situation.
3. Extremely low complexity of the situation.
How would you rate your recovery performance today?
1. Efficient
2. Tolerable
3. Inefficient
How different is your performance today from that on any other day?
1. Not different at all
2. Similar
3. Very different
How representative has today's performance been of your overall ability to recover
from an equipment failure in ATC?
1. Highly representative
2. Average
3. Not representative at all
How realistic was today's task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Are you completely aware of the impact/implications of a particular failure that you have
just experienced? Do you fully understand what will happen when particular equipment
fails?
Y
N
Any comment?
Would you like to see some form of Aide-Memoire (flip chart, small laminated booklet,
HMI drop down menu) available at each CWP to assist you in recognising the effects of
a particular equipment failure and steps to be taken toward its recovery?
Y
Is there any aspect of training, procedures, HMI, or teamwork that could have enhanced your
recovery performance today?
Thank you!!!!
b) Debriefing interview structure

IMPERIAL COLLEGE LONDON
DEBRIEFING INTERVIEW STRUCTURE
Questions for each subject:

Note:
The researcher should replay the video recording from the moment of failure
injection and start further discussion with the subject.
1. How did you notice/detect that there was an equipment failure? What information
triggered the detection?
2. When exactly did detection occur?
3. What could have been the worst consequence if the situation had not been detected?
4. Did you find the diagnosis phase possible/necessary?
If yes go to question 4. If no go to question 7.
5. What was your diagnosis?
6. What did you do with it (i.e. tried to confirm, or rule out alternatives)?
7. Was the recovery strategy influenced by the diagnosis?
8. How did you choose the recovery strategy to apply (i.e. based on training, own
experience, colleagues' experience, any other source of information)?
9. What could have made the situation worse?
10. Can you think of any fall-back actions which could mitigate this situation? Can
you suggest any changes to the procedures, phraseology; HMI design; fall-back
procedures that could improve the situation?
c) Feedback form
FEEDBACK FORM
Concerning the study conducted by representatives of
Imperial College London at XXX ATC Centre, 06/06/06 to 09/06/06
Dear Controller,
Now that you have participated in this study, we would like to ask for your feedback on its
importance and value. Please answer all questions as accurately as possible, since
these answers will guide us in our future endeavours. Your answers will be used only for the
assessment of the usefulness of this study.
Once again, thank you very much for participating in this study!
Please circle the appropriate answer:

Did you find participating in this study interesting?
Do you think that this experience is beneficial for your future work?
Do you feel that this experiment raised important issues?
Do you feel that this experiment helped you to identify any gaps in your:
Knowledge
Training
Skills
Awareness of effects of unusual events
Would you be willing to participate in future studies of this type?

Do you have any other comments on the experiment?
After completing, please return this feedback form to the office of XXX.
Thank you for your time! Your cooperation is highly appreciated.
Researcher
Assistant
XXX, June 2006
d) Subject matter expert's assessment
ASSESSMENT OF THE DEPENDENT VARIABLE IN THE EXPERIMENT
Our objective in this research project is to analyse the recovery from equipment failures in ATC.
Since the area of ATC is highly specialised, it was necessary to evaluate the controller's
recovery performance using expert opinion. As a Subject Matter Expert (SME) in the area of
Air Traffic Control (ATC), you are asked to help in the assessment of the subject controller's
recovery performance.
We kindly ask you not to disclose any information or details on this experiment to your
colleagues in the next 30 days, so that we can ensure that the injection of the failure is an
unexpected event for each subject-controller.
Recovery effectiveness
Based on the controller performance that you observed in this experiment (either live or on
the video recording of the experimental trial), please use your professional experience
to assess the effectiveness of the controller's recovery.
Recovery is considered successful if the system returns to the normal or intermediate (but still
stable) state. In the short term (as simulated in this experiment), the situation should be stable
and control of airspace should be considered safe, but not necessarily efficient.
Please note that the anchor points of each scale range from 'Firmly Disagree' to 'Firmly
Agree'. Place a mark in one of the five boxes along each line, as shown in the following example.
Example
In general, I am professionally more efficient in the mornings than evenings.
x
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
1. The recovery strategy implemented by this controller can be considered successful.
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
2. In this traffic scenario, it was possible to implement more than one recovery strategy.
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered 'Partly Agree' or 'Firmly Agree', your answer implies that you thought of one or
more alternative recovery strategies. Please briefly describe this/these alternative(s).
3. If you were in the place of the subject-controller, would you implement a different recovery
strategy than he did?
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered 'Partly Agree' or 'Partly Disagree', please specify your reasons for implementing
a different recovery strategy and which recovery strategy that would be. In addition, please
specify any particular/difficult issues regarding the traffic situation during the recovery process:
Evaluation of the contextual factors in the training scenario:
Please circle the corresponding answers according to your professional experience and expertise:
How would you rate the complexity of the simulated failure type?
1. Single system affected
2. Multiple systems affected
How would you rate the time course of the simulated failure development?
1. It was a sudden failure.
2. It was a latent failure.
3. It was a gradual degradation of the system.
Would you say that the controller had enough time to recover from the effect(s) of the equipment
failure?
1. Yes, time was adequate. Time necessary to recover was less than available time for
recovery in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of available time
for recovery in the simulation.
Is there a recovery procedure for this particular failure?
If yes, is that procedure:
1. Suitable to the observed situation in question
2. Tolerable to the observed situation in question
3. Counterproductive to the observed situation in question
How would you rate the duration of the simulated equipment failure?
1. Short period of time (reasonably considered as less than 15 min)
2. Moderate period of time (reasonably considered as less than 1 h)
3. Substantial period of time (reasonably considered as more than 1 h)
How would you rate traffic complexity during the recovery process (please note: only during the
recovery process and not during the entire training session).
1. High
2. Average
3. Low
The mix of IFR/VFR

Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
How would you rate the airspace complexity in the scenario used?
1. Adequate
2. Tolerable
3. Inappropriate
The number of crossing points

Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs unidirectional routes
Route length
Proximity of route to sector boundary
How would you rate weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
How realistic was today's task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Thank you!!!!
e) Best practice procedure sheet
BEST PRACTICE PROCEDURE FOR XXX SIMULATION
Detect the problem
  - Either by pilot's first contact, or
  - Visually on the radar display (uncorrelated track). In this case the first
    assumption may be transponder failure. After confirmation that the a/c
    transponder is serviceable, a further check on system performance should
    be conducted.
Identify failure type either by ATCO or by input from the coordinator
Locate traffic
  - Check identity of all tracks (referring to the eastbound overflight)
  - Identify traffic using appropriate technique
      - Bearing/range
      - Turn method
Inform all traffic on RTF of the failure and advise of possible restrictions
Maintain identification of all traffic
Ground trainer
Refuse departures permission to depart
Get all airborne traffic to land
Maintain accurate and timely strip marking throughout the process
Utilize holding patterns when necessary
After restoration has been confirmed by coordinator:
  - Re-identify all traffic
  - Confirm Mode C
  - Continue to monitor
  - Release all departures
First possible detection/action may have occurred at: ______________
First actual action occurred at: ______________
End of the recovery process (release of the departures): ______________
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
(1)
(2)
(3)
(4)
ID
RIF name
Descriptor
Probability
(p)
Internal factors
Training for
recovery from ATC
equipment failure
Previous experience
with equipment
failures
Experience with the

system performance
(reliance or trust)
Personal factors
Communication for
recovery within
team/ATC Centre
type
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
Experienced with a
particular type of
failure or
Experienced with
any other type of
ATC equipment
failure
No experience with
ATC equipment
failures
Objective attitude
toward the system
Positive experience
with the system
(excessive trust) or
Negative
experience with the
system (undertrust)
Suitable for the
recovery process
Tolerable for the
recovery process
Counter productive
for the recovery
process
Time course of
failure development
Number of
affected
Level
Designator
(R)
(8)
Probability
of overall
situation
occurring
(p*R)
0.73
0.23
Non
significant
0.03
Least
favourable
-1
-0.03
0.83
Most
favourable
0.83
0.17
Non
significant
0.93
Non
significant
0.07
Least
favourable
-1
-0.07
0.83
-1
-0.03
0.27
-1
-0.07
-1
-1
0.83
0.13
0.03
Tolerable
0.67
Inefficient
0.07
Sudden failure
(7)
Most
favourable
0.27
Persistent or latent
failure
Gradual
degradation of
system
One
workstation/one
sector or All
workstations in one
sector
(6)
0.73
Efficient
Single system
affected
Multiple systems
affected
(5)
Expected
effect of
controller
recovery
performance
0
1
Most
favourable
Non
significant
Least
favourable
Most
favourable
Non
significant
Least
favourable
Non
significant
Least
favourable
Improve
Non
significant
Least
favourable
-1
Non
significant
10
External or factors related to working conditions
11
12
13
16
Airspace related factors
17
18
19
20
Time necessary to
recover
Existence of
recovery procedure
Duration of failure
Adequacy of HMI
and operational
support
Ambiguity of
information in the
working
environment
Adequacy of
organisation
Traffic complexity
Airspace
characteristics
Weather conditions
during the recovery
process
Conflicting issues in
the situation (task
complexity)
Several
workstations/couple
of sectors or All
workstations/all
sectors
Adequate - less
than available time
Inadequate - in
excess of available
time
Suitable to the
situation in
question
Tolerable to the
situation in
question
Least
favourable
-1
-1
0.86
Most
favourable
0.86
0.14
Least
favourable
-1
-0.14
Most
favourable
Non
significant
-1
-1
Inappropriate
Moderate period of
time or Substantial
period of time
Suitable to the
situation in
question
Tolerable to the
situation in
question
Counter productive
to the situation in
question
External working
environment
matches the
mental model
External working
environment
mismatches the
mental model
Least
favourable
-1
0.5
Most
favourable
0.5
0.39
Non
significant
0.11
Least
favourable
-1
-0.11
Most
favourable
Least
favourable
-1
0.4
-1
-0.1
Efficient
0.4
Tolerable
0.5
Inefficient
0.1
Average traffic
complexity
Extremely high or
extremely low
traffic complexity
Adequate (e.g.
enroute higher
levels)
Least
favourable
Non
significant
0.35
Most
favourable
Non
significant
Least
favourable
Non
significant
0.65
Least
favourable
-1
-0.65
0.8
Most
favourable
0.8
Tolerable
0.1
Non
significant
Inappropriate (e.g.
enroute lower
levels or terminal)
0.1
Least
favourable
-1
-0.1
Improved
0.83
Deteriorated
0.17
-1
-0.17
-1
-0.7
Average complexity
of the situation
Conflicting, multiple
tasks or Extremely
low complexity of
the situation (may
lead to monotony)
0.3
0.7
Non
significant
Least
favourable
Non
significant
Least
favourable
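As a rough illustration of column (8), the value p*R is the product of a level's probability p and its designator R. The sketch below assumes the designator values that appear in the table rows (1 for most favourable, 0 for non significant, -1 for least favourable) and uses the Airspace characteristics rows (0.8, 0.1, 0.1) as input. The names `DESIGNATOR` and `weighted_probability` are hypothetical and are not taken from the thesis.

```python
# Designator R for each level, as read from the table rows (assumed mapping).
DESIGNATOR = {
    "most favourable": 1,
    "non significant": 0,
    "least favourable": -1,
}


def weighted_probability(p, level):
    """Return p*R for one RIF level (column 8 of the table)."""
    return p * DESIGNATOR[level]


# Airspace characteristics rows from the table: descriptor -> (p, level).
airspace = {
    "Adequate": (0.8, "most favourable"),
    "Tolerable": (0.1, "non significant"),
    "Inappropriate": (0.1, "least favourable"),
}

for descriptor, (p, level) in airspace.items():
    print(descriptor, weighted_probability(p, level))
```

Keeping the level-to-designator mapping in one dictionary makes it easy to check each row of the table against the same rule.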
Appendix XV Distribution of the recovery context indicator captured in the experiment
The distribution of the recovery context indicator (Ic) obtained from the experimental
results is presented in Figure 1.
[Figure 1: histogram of the recovery context indicator (Ic). Vertical axis: Frequency (ticks from 100 to 800). Horizontal axis: Ic (ticks from -0.088 to 0.112 in steps of 0.01).]
Figure 1 Distribution of the recovery context indicator in the experimental investigation (six RIFs defined through one level)
Based on the shape of the Ic distribution, the data has been fitted with two normal
distributions according to equation 1 (Figure 2). The distribution on the left accounts for
unfavourable recovery contexts whose recovery context indicator takes the average
value of -0.04 (A1=141.4, SD1=0.02). The distribution on the right accounts for
favourable recovery contexts whose recovery context indicator takes an average value
of 0.04 (A2=632.8, SD2=0.04).
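\[
f(x) = A_1 e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + A_2 e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}
     = 141.4\, e^{-\frac{(x+0.04)^2}{2 \cdot 0.02^2}} + 632.8\, e^{-\frac{(x-0.04)^2}{2 \cdot 0.04^2}} \qquad (1)
\]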
Figure 2 Fitting of the two normal distributions
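The fitted function of equation 1 can be evaluated directly from the parameters quoted above. The following is a minimal sketch, assuming that equation 1 is a sum of two amplitude-scaled (unnormalised) Gaussian terms with means -0.04 and 0.04, standard deviations 0.02 and 0.04, and amplitudes 141.4 and 632.8. The function names are hypothetical, and the code only evaluates the reported fit; it does not reproduce the fitting procedure.

```python
import math

# Fitted parameters quoted in the text (amplitude, mean, standard deviation).
A1, MU1, SD1 = 141.4, -0.04, 0.02  # unfavourable recovery contexts
A2, MU2, SD2 = 632.8, 0.04, 0.04   # favourable recovery contexts


def gaussian(x, amplitude, mean, sd):
    """Amplitude-scaled Gaussian term, A * exp(-(x - mean)^2 / (2 * sd^2))."""
    return amplitude * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))


def fitted_frequency(x):
    """Sum of the two Gaussian terms, i.e. f(x) in equation 1."""
    return gaussian(x, A1, MU1, SD1) + gaussian(x, A2, MU2, SD2)


# Evaluate the fit at the two component means and at Ic = 0.
for ic in (-0.04, 0.0, 0.04):
    print(f"Ic = {ic:+.2f}: f = {fitted_frequency(ic):.1f}")
```

At Ic = 0.04 the left-hand component contributes almost nothing, so the fitted frequency is close to A2; at Ic = -0.04 both components contribute.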
