Abstract—Situation awareness of today's automation relies so far on sensor information, databases and the information delivered by the operator using an appropriate user interface. Listening to the conversation of people has not been addressed until today, although it is an asset in many working situations of teams. This paper shows that integrating automatic speech recognition (ASR) into air traffic management applications is an upcoming technology and is ready for use now.

Apple's Siri and Google's Voice Search are based on hundreds of thousands of hours of training data. This paper presents an assistant based speech recognition system (ABSR) based on only 40 hours of training data. ABSR uses speech recognition embedded in a controller assistant system, which provides a dynamic, minimized world model to the speech recognizer. ASR and assistant system improve each other. On the one hand, the latter significantly reduces the search space of the former, resulting in low command recognition error rates. On the other hand, the assistant system benefits from ASR when the controller's mental model and the model of the system deviate from each other. In that case the controller cannot rely on the output of the system anymore, i.e. the assistant system is useless during these time intervals. By using ABSR the duration of these time intervals is reduced by a factor of two.

Keywords—Arrival Management (AMAN), Automatic Speech Recognition (ASR), Workload

I. INTRODUCTION

Conversation has been a core element of society and its further development for centuries. Hence, a significant part of human collaboration is coordinated via voice, especially when complex contexts or meta-concepts are considered. By tracing communication, new actors can get an idea of the current and planned situations and interpret the actions, so that they can easily integrate themselves into this environment. Listening actors can follow the train of thought and are able to contribute their own ideas to problem solving activities in an appropriate manner.

Nowadays, people are more and more supported by technical systems like assistant or decision support systems, which can be found in nearly every working and leisure environment. Latest applications, like those by Apple (Siri) [1], [2] and Google (Voice Search) [3], use ASR as an input interface for direct communication between human and machine to trigger a defined action, as for instance a query. At least in nearly every medium-sized car [4], speech recognition is used to give voice instructions to the corresponding assistance systems. However, all these systems still require improvements regarding their recognition rate, necessitating large future investments into ASR technology before one receives the impression of talking to a real human.

In an air traffic control (ATC) working environment, communication between the involved parties is the most important means to control the air traffic. Controlling aircraft in the vicinity of an airport is an example of such a working environment, in which two working groups communicate: pilots and controllers. All pilots in the same sector are supported by a dedicated controller (team). They use a unique frequency for communication within this sector. This enables a party line effect, i.e. all actors except today's assistant systems can create a common mental model of the current situation and of future actions.

Furthermore, the complexity of today's assistant systems steadily increases. On the one hand, computer power and system complexity continuously increase. On the other hand, automation lacks intuitive interfaces for humans, which would enable a smooth and fluent interaction. An important ability is to follow human communication in order to react and adapt to a situation and to reduce the amount of dedicated interaction between user and machine.

Today, however, communication is still split into two different worlds: one in which humans communicate via radio links, and another in which machines communicate via computer networks. These two worlds are connected by a human machine interface used by humans to inform the machines and vice versa. Intents and plans of both humans and machines are the basis for these two worlds. As controllers are responsible for air traffic control, they sometimes implement plans deviating from those of the automation. If these deviations occur in situations with high workload, the controllers do not have time to inform the assistant system about their strategies and intentions. In these cases, the automation may suggest advisories contrary to the intent of the controller, because the support system is not aware of operator-induced deviations. Even worse, the operators have additional effort to inform the support systems about their communication. This situation may persist until the assistant system realizes the deviation, e.g. through the analysis of radar data. Hence, the system requires attention from the controllers exactly when the controllers would urgently need the support of the system due to high workload.

To overcome this situation and to enable air traffic management (ATM) systems to follow the conversation between controller and pilots, automatic speech recognition is an important element of future ATM assistant systems. The ability to listen has to be implemented into the assistant system. This allows following the conversation and synchronizing the intents and strategies of the human and the machine world.

Crucial for user acceptance is the quality of ASR, especially recognition time and rate. Current studies in the automotive environment achieve word recognition rates between 78% and 87% [5]. Such rates are far too low for an application in the ATC environment, where the whole command needs to be understood. Due to the high number of utterances per hour in ATC, too many corrections would be necessary for the controller, which would strongly decrease the acceptance of such a system.

The integration of ASR into ATM systems has been attempted since (at least) the early 90s. We briefly review this prior work in section II. Section III shows how ASR has to be integrated with an assistant system so that acceptable error rates are achieved. This enables common situation awareness without lack of information on the part of the automation and without discrepancies between voice communications and data link information. In other words, the ASR system is not blindly listening; rather, it is aware of the situation and can therefore actively anticipate the spoken commands. The recognized commands are then processed and relevant information is extracted to dynamically derive new planning and advisories. This win-win situation between the ASR component and the assistance system results in more accurate and reliable assistance.

Section V describes the different AMAN support levels offered to the controller during the trials, which are described in section IV. They have been performed with a combination of DLR's arrival manager 4D-CARMA (4D Cooperative ARrival MAnager) [6], the speech recognizer from Saarland University (UdS) and approach controllers from Austrian, Czech and German ANSPs in the Air Traffic Validation Center at the DLR premises in Braunschweig. Section VI describes the first validation results. The last section describes further steps and summarizes the results.

II. BACKGROUND

Human-machine interaction systems have seen a significant improvement in their performance in the last decade, leading to more sophisticated human-machine applications. Voice-enabled systems in particular are increasingly deployed and used in many different areas. The most popular use case among these is ASR deployed on most mobile phones. In fact, ASR systems are becoming a significant component in hands-free systems and a cornerstone for tomorrow's applications, such as smart homes [1]. ASR applications can generally be classified into three different categories based on the application purpose:

1. Hands-free command & control: The purpose of these applications is to control a given system by relatively short spoken sentences. This type of application is widespread, such as in mobile phones, TV sets or car navigation systems [4].

2. Dictation software: These systems mainly target the professional market, as their adaptivity is not yet good enough for widely accepted consumer products [8].

3. Spoken dialog systems: These systems benefit from the advances of dialog management research as well as ASR. Common examples are dialog systems used for train timetable consultation [9]. Similarly, Siri [1] and Google's search by voice [3] are assistant systems that integrate question answering and spoken dialog functionalities.

The ATM world, following and deploying the advances of today's research and technology, is increasingly developing ASR-based applications to provide more sophisticated assistant systems. ASR is a potential extension of many existing systems where speech is the primary mode of communication, such as Arrival Managers (AMAN), Surface Managers (SMAN) and Departure Managers (DMAN). First commercial implementations of an AMAN have been operational at hubs (Frankfurt, Paris) since the early 90s. Today, their application is still limited to the coordination of traffic streams between different working positions (e.g. sector and approach controllers) [10].

Although ASR performance improved significantly in the last decade, it is far from being a solved problem, in particular for large vocabulary applications. Limited vocabulary (in-domain) applications, however, are being deployed more successfully. Moreover, the existence of prior information about the task at hand and the expected spoken sentences, summarily referred to as context, can significantly improve performance. Early usage of context goes back to Young et al.'s works [11], [12], where they made use of sets of contextual constraints of varying specificity to generate several grammars. These grammars are then consecutively used during recognition until a satisfactory recognition hypothesis is found. Along this idea, Fügen et al. [13] also used dialogue-based context to improve ASR quality in their dialogue system, where a Recursive Transition Network (RTN) representing the grammar is continually updated.

The first attempts to integrate ASR in ATM systems go back to Hamel et al. [14], who described the application of speech technology in ATC training simulators in the early 90s, however with limited success. Schäfer [15] used an ASR system to replace pseudo pilots in a simulation environment. He used a dynamic cognitive controller model. He, however, did not use an assistant system to dynamically generate the context so that assistant system and ASR improve each other. Dunkelberger et al. [16] described an intent monitoring system which combines ASR and reasoning techniques to increase recognition performance: in a first step, a speech recognizer analyses the speech signal and transforms it into an N-best list of hypotheses. The second step uses context information to reduce the N-best list. The approach presented in this paper, however, uses context to directly reduce the recognition search space, rather than only rejecting the resulting hypotheses. The latter approach would only reduce the error rate without increasing the recognition rate.

These early attempts opened the gate to more concrete and successful applications of ASR in ATC training simulators.
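The two ways of using context discussed above, rejecting hypotheses from an N-best list after recognition (as in the intent monitoring system of [16]) versus restricting the recognizer's search space beforehand, can be sketched as follows. This is an illustrative Python sketch; all callsigns, commands and scores are invented examples, not data or code from any of the cited systems:

```python
# Two ways of applying situational context to ASR output.
# All callsigns, commands and scores below are invented examples.

def prune_n_best(n_best, active_callsigns):
    """Post-hoc filtering: the recognizer runs unconstrained, then
    context rejects hypotheses that address aircraft not currently
    present in the sector."""
    return [(text, score) for text, score in n_best
            if text.split()[0] in active_callsigns]

def restrict_search_space(all_commands, active_callsigns):
    """Context applied before recognition: only commands for aircraft
    currently in the sector remain admissible hypotheses, so the
    recognizer never searches the rest of the space at all."""
    return [cmd for cmd in all_commands
            if cmd.split()[0] in active_callsigns]

# Invented example data: a 3-best list and the callsigns in the sector
n_best = [("DLH123 descend 8000", 0.41),
          ("BAW77 descend 8000", 0.38),
          ("DLH123 reduce 220", 0.21)]
context = {"DLH123", "AUA5PM"}

print(prune_n_best(n_best, context))   # hypotheses for absent aircraft dropped
print(restrict_search_space(
    ["DLH123 descend 8000", "BAW77 turn left 10", "AUA5PM reduce 220"],
    context))
```

Pruning can only discard wrong hypotheses, which lowers the error rate; restricting the search space also makes the remaining, correct hypotheses more likely to win, which additionally raises the recognition rate — the distinction drawn above.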
The FAA (Federal Aviation Administration) reports the successful usage of advanced training technologies in the terminal environment [17]. The Terminal Trainer prototype, developed by CAASD (Center for Advanced Aviation System Development, USA), is a complex system which includes speech synthesis, speech recognition, multimedia lessons, game-based training techniques, simulation, and interactive training tools. The German air navigation service provider DFS has used the system Voice Recognition and Response (VRR) of UFA (Burlington, MA) for controller training since August 2011 [18] to reduce the number of required pseudo pilots in its flight service academy. These systems, however, are used only for training purposes, due to the standard phraseology limitations of ASR. In a more general application of ASR to the ATM domain, Cordero et al. proposed to take advantage of the speech-to-text output of ASR systems to perform controller workload assessment, which was consequently used for automatic detection of ATC events [19].

Although data link might replace voice communication in the ATC environment, voice communication and data link with their different advantages will coexist for a long time, at least in General Aviation. Here voice communication will remain the central means of coordination in the foreseeable future. The agreements coordinated by voice automatically have to be integrated into SWIM (System Wide Information Management) based on reliable speech recognition. The same applies for different types of on-board equipment of airliners. This supports the coexistence of varying levels of automation in different tightly-coupled subsystems. Furthermore, careful transitions between different levels of automation are more easily possible.

III. ASSISTANT BASED SPEECH RECOGNITION

In a pilot study with a limited set of callsigns and commands, Shore [23] reported command error rates below 5%. He used an acoustic model derived from the Wall Street Journal recognition corpus. This was the basis of the AcListant project started in Feb. 2013 [25]. Due to the considerably larger number of callsigns and commands in this project, initial experiments using the same corpus returned command error rates greater than 70%. In the first project phase many hours of speech samples from real controller-pilot communication were recorded and annotated, i.e. every controller utterance was written down word by word. The resulting model already improved recognition rates drastically from 30% to 80%, but error rates were still above 20%. Therefore, we concentrated on providing dynamically generated situational context to improve performance.

Figure 1 describes the concept of assistant based speech recognition (ABSR). An Assistant System (in our case the Arrival Manager 4D-CARMA) analyses the current situation of the airspace and predicts possible future states.

The output of the Assistant System, the context (e.g. aircraft sequences, distance-to-go, minimal separation, aircraft state vectors), is used by the Hypotheses Generator component. The Hypotheses Generator does not know exactly which commands the controller will give in the future, but it knows which commands have a higher probability than others in the current situation.
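The idea behind such a Hypotheses Generator can be illustrated with a small sketch: from the assistant system's picture of the airspace, only physically plausible commands per aircraft are emitted as admissible recognition hypotheses. This is a simplified Python illustration with invented names and values, not the actual 4D-CARMA/AcListant interface:

```python
# Simplified illustration of a "Hypotheses Generator": the assistant
# system's context (invented example values) is turned into the set of
# commands the recognizer may consider. Not the real 4D-CARMA interface.

def generate_hypotheses(context):
    """Derive plausible controller commands from the current situation."""
    hypotheses = []
    for ac in context["aircraft"]:
        callsign = ac["callsign"]
        # Descend commands only to flight levels below the current one
        for fl in range(ac["altitude_fl"] - 10, 60, -10):
            hypotheses.append(f"{callsign} descend flight level {fl}")
        # Speed reductions only below the current speed
        for spd in range(ac["speed_kt"] - 20, 160, -20):
            hypotheses.append(f"{callsign} reduce speed {spd} knots")
    return hypotheses

context = {"aircraft": [
    {"callsign": "lufthansa one two three", "altitude_fl": 90, "speed_kt": 220},
]}

for cmd in generate_hypotheses(context)[:3]:
    print(cmd)
```

In the real system the context would additionally weight the admissible commands by how probable each one is in the current situation, rather than treating all of them equally.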
ASR systems generally use the Word Error Rate (WER) metric for evaluation. This metric is defined as the distance between the recognized word sequence and the sequence of words which were actually spoken, referred to as the gold standard (see pp. 362-364 in [20]). WER is derived from the Levenshtein distance [21]: