Immune-Inspired Adaptable Error Detection for Automated Teller Machines
Rogério de Lemos, Member, IEEE, Jon Timmis, Member, IEEE, Modupe Ayara and Simon Forrest

Abstract— This paper presents an immune-inspired adaptable developed against another. Extensive testing has been carried
error detection (AED) framework for Automated Teller Machines out with the system proposed in this paper, and has laid the
(ATMs). This framework has two levels, one is local to a single foundations for adaptable error detection in ATMs.
ATM, while the other is network-wide. The framework employs
vaccination and adaptability analogies of the immune system. For This paper details the investigations undertaken to develop
discriminating between normal and erroneous states, an immune an immune-inspired adaptable error detection (AED) tech-
inspired one-class supervised algorithm was employed, which nique for ATMs. In the context of this application, error
supports continual learning and adaptation. The effectiveness of detection entails identifying a sequence of states that precedes
the proposed approach was confirmed in terms of classification a system failure, and adaptability comprises changing a set
performance and impact on availability. The overall results were
encouraging as the downtime of ATMs can be reduced by of detectors according to the operational profile of the ATM.
anticipating the occurrence of failures before they actually occur. Underlying the immune-inspired adaptable error detection is
a framework that is based on the architecture of a network of
Index Terms— Adaptable error detection, artificial immune ATMs, which consists of individual ATMs that are networked
systems, ATMs, availability, fault tolerance. to a central management system. The network supports a two-
way communication mechanism between the central manage-
ment system and connected ATMs. Likewise, the proposed
I. INTRODUCTION framework for adaptable error detection consists of two levels
of error detection. One level of the framework is local to
AUTOMATED Teller Machines (ATMs) are embedded
systems for financial-related services. The work presented in
this paper is concerned with how to improve the availability of
a single ATM, while the other is a network-wide adaptable
error detection. The latter is for exchanging information on
these systems through adaptable error detection. Adaptability new and common error behaviors amongst individual ATMs.
is an important feature for improving the availability of these In this architecture, each ATM hosts a local AED, while the
systems because ATMs exhibit different operational profiles network-wide AED is implemented within the central man-
depending on the environment in which they work [1]. If the downtime agement system. By exploiting the communication mechanism
for these machines has to be reduced, alternative techniques between the central management system and individual ATMs,
have to be investigated different from the ones that rely on a exchange of information regarding error detectors amongst the
fixed set of error detectors that are usually identified during local AED is made possible through the network-wide AED.
design-time. The proposed error detection technique aims to The implementation undertaken in this work was limited
reduce system downtime by detecting during run-time those to the local AED. The network-wide AED is based on the
states that are precursors of system failure. This is achieved by same techniques, the only difference being that the error
employing immune inspired continuous learning for updating detectors produced at this level are general across a number
the set of error detectors in a system. The technique relies of ATMs, and that intervention from a human operator is
on the existence of sequences of states that represent the required to decide which detectors should be incorporated into
operational status of an ATM, from which the adaptable error the network of ATMs. An ATM is made up of several modules,
detection is able to identify those sequences that might contain but a single module - the cash dispenser - was employed for
fatal states. At present, ATMs do not include the notion of the implementation and validation of the local AED technique.
adaptability in their error detection systems, and this is an The basis of this technique was an artificial immune system
important feature for improving the maintainability of these originally developed for email classification [2].
machines. This paper reports on an industry-led research project, Artificial immune systems (AIS) are adaptive systems,
which was developed in close cooperation with NCR Financial inspired by theoretical immunology and observed immune
Solutions Group. The aim was to investigate how machine functions, principles and models, which are applied to prob-
learning techniques could be used in improving the quality of lem solving [3]. Adaptability in the immune system ensues
services of ATMs. This is the only work of its kind on ATMs, from features such as learning and memory that endow the
and therefore it is impossible to benchmark any final system immune system with the ability to combat a large variety of
invaders. The application of AIS to fault tolerance was initially
This research was fully funded by NCR Financial Solutions Group. motivated by Avižienis, who described the analogy between
R. de Lemos is with the University of Kent. the immune system and fault tolerance [4]. Since then, several
J. Timmis is with the University of York.
M. Ayara is with the BT Group CTO. approaches have been proposed in literature that have applied
S. Forrest is with the NCR Financial Solutions Group. AIS to problems related to both software [5] and hardware [6]

fault tolerance. section of the paper presents some concluding remarks and
Although a wide range of machine learning techniques have future work concerning the application of AIS techniques to
been applied to error detection, some of these techniques have error detection in systems that are continuously subjected to
potential drawbacks. In particular, from the perspective of change.
adaptable error detection, an extensive study was performed
comparing different techniques [1], which has highlighted II. AVAILABILITY AND MAINTAINABILITY IN ATMS
the disadvantages of some of these techniques. For example,
A. Automated Teller Machines (ATMs)
artificial neural networks are “black boxes” which means that
it is difficult to understand the hidden knowledge encoded Automated Teller Machines (ATMs) are embedded systems
therein. Expert systems are cumbersome to maintain since for financial-related services, such as, dispensing cash, bank
they involve continual updating of the knowledge base to keep account enquiries, printing of balances, and cheque deposits.
up with changes. In addition, most of the solutions are not An ATM is made up of self-contained modules that include
capable of continuous learning, which implies that they are cash dispenser for delivering money, magnetic card reader
incapable of updating their knowledge during the monitoring (MCRW) for reading debit or credit cards, keypad as input
of the target system's operations. In the context of this work, interface, display as output interface, printer for printing
two alternative techniques were investigated in more detail, receipts, and depository for depositing cheques.
and were demonstrated to be inappropriate for the purpose of The architecture of a system of ATMs is composed of
detector generation [1]. The C5.0 algorithm (a rule induction individual ATMs that are networked to a central manager.
algorithm) was shown to generate generalised detectors that The network supports a two-way communication mechanism
would significantly increase the number of false positives. In between the central manager and connected ATMs. The ar-
fact, our analysis discovered that for the data used in this study, chitecture of a system of ATMs is depicted by figure 1. Each
any detectors generated were so general that a decision was ATM (ATM 1, ATM 2, ATM 3, and ATM 4) is connected in a
made to investigate alternative approaches (see Appendix A network to a Central Manager, which is able to receive and
for brief results using C5.0 algorithm). In addition, a heuristic- send information to connected ATMs. Both the ATMs and the
based algorithm generated detectors so specialised that it was central manager mentioned in this paper are proprietary to
difficult to discern the true positives [1]. Moreover, these two NCR Financial Solutions Group, thus certain technical details
approaches do not allow for easy online adaptation, which have to be omitted. ATMs are highly available systems, hence
is a requirement for adaptable error detection for ATMs. We the need for effective error detection techniques that are able to
therefore concluded that an alternative approach was required. reduce their downtime. One of the error detection mechanisms
An appropriate approach for adaptable error detection incorporated into ATMs exploits the syntactic knowledge of
should be able to generate suitable and effective detectors a code space. This space is generated by a set of rules that
that exhibit effective and comprehensible classification per- map each state of a device into a corresponding code. As
formance, and to incorporate a continuous learning feature. a result, error detection is simply identifying the erroneous
Continuous learning is needed to incorporate new information states of each component based on the semantic description of
about errors into an AED system during system operations. relevant codes. Another error detection mechanism exploited
An artificial immune system (AIS) algorithm was found to in ATMs is the use of preemptive diagnostic checks to test the states of
possess these characteristics, and was evaluated by using the processors controlling the modules.
relevant criteria that include: (1) classification performance
of the algorithm in discriminating normal behaviours from
potential failure behaviours, and (2) the measurement of the
time interval between detection and the actual system failure.
From the outcome of the evaluation, it was demonstrated that
the proposed AED technique could detect an incipient system
failure approximately 12 hours for one data set, and 2 hours
for a second data set. Based on these results, it is concluded
that the framework, and subsequent prototype, is effective for
adaptable error detection.

[Figure 1: ATM 1, ATM 2, ATM 3 and ATM 4 connected to a Central Manager. Labels shown: Local AED, Network-wide AED, Devices, ATMLog (M-Status, M-Data), DevLog, Immunisation, Error detection, Learning, Tolerisation, Validation, Incorporation, Evaluation.]
Fig. 1. System architecture for adaptable error detection in ATMs

The rest of the paper is structured as follows. The following
section introduces the problem domain, which is the enhancement
of availability and maintainability in ATMs. Section
III reports on related work in the area of AIS applied to
fault tolerance. Section IV defines a general framework for
adaptable error detection in the context of a single ATM and a
network of ATMs, with the following section discussing how A problem associated with these error detection techniques,
artificial immune systems (AIS) can be exploited for adaptable in the context of ATMs, is that they are unable to detect
error detection. The subsequent three sections describe, respec- erroneous behaviours that have no corresponding codes. In
tively, the prototype that was implemented, the experiments a situation where complete knowledge of all the possible
performed, and an analysis of the results obtained. The final states of a system is available, this approach is an appropriate

solution. However, ATMs are complex systems derived from lymphoid organs, and lymphocytes) are distributed throughout
the integration of disparate components, therefore it is difficult the body to serve all of its organs, it has its own communica-
to identify all the possible interactions, and their failure tion links - the network of lymphatic vessels, and its elements
behaviours, that could occur between the components. Also, (organs and vessels) are themselves redundant and in some
there are different families of ATMs, which may be used cases diverse. The conclusion is that a nature-inspired model
in disparate geographical regions characterised by peculiar such as the immune system, will stimulate development of
environmental conditions. In consequence, the errors generated fault tolerant solutions that will outperform current solutions.
by individual ATMs may vary with the family or location This suggestion triggered current research efforts in the area.
thereby making it difficult to anticipate all possible errors of Properties such as diversity, redundancy, self-organisation,
an ATM. Furthermore, different maintenance procedures and anomaly detection, learning, and memory are all important
personnel might affect the operation of an ATM. The objective from a fault tolerance perspective.
is to address these issues by adaptable error detection, in order Pioneering research in the application of AIS to fault
to enable an ATM to cope with uncertainties, as well as to tolerance was presented by [5]. The authors applied AIS to
facilitate informed and quicker maintenance. tolerate software design faults. One important contribution of
the paper is the mapping of the immune system analogy to
B. Dependability fault tolerance context, based on the immune system compo-
nents. The authors further developed a model for software fault
Availability and maintainability are attributes of dependabil- tolerance based on the immune network theory. The idea is to
ity, and dependability is essentially the ability of a system generate artificial antibodies that can be used to detect errors
to deliver service that can be justifiably trusted [7]. The in software. The model is divided into two phases of learning
dependability attributes express the properties of a system, and and operational. During the learning phase, antibodies are
allow the quality of its services to be evaluated. generated randomly and evolved using a genetic algorithm.
There are several means for obtaining dependable systems The operational phase is when the antibodies generated are
(in our case, highly available and maintainable systems), applied to error detection in software.
including rigorous design, verification and validation, system Research into hardware fault tolerance can be described
evaluation and fault tolerance [7]. The scope of this work is under fault diagnosis and error detection. Work on fault diag-
fault tolerance, which allows a system to deliver its specified nosis has focused on applying immune network concepts for
service despite the presence of faults [8]. The premise is that, defining relationships between data from sensors such as [10].
systems will always contain residual faults regardless of fault The sensors can be likened to cells in an immune network such
prevention and removal mechanisms. that each sensor dynamically evaluates other connected sensors
Fault tolerance is carried out via error detection and recov- for inconsistencies based on the relationships between them.
ery. Error detection is responsible for identifying the presence This network model was applied to the automatic diagnosis
of an error in a system. Recovery transforms a system state that of faults in cement plants [11]. The results indicate that the
contains one or more errors and (possibly) faults into a state model provides accurate information about faulty sensors.
without detected errors and without faults that can be activated However, the model is limited by the need to define accurate
again [7]. Error detection is the trigger for fault tolerance, thus relationships between sensors.
the need for having an effective error detection capability, if In another work related to failure propagation, relationships
a system is expected to tolerate faults. between sensors and other system components are identified
Error detection techniques usually exploit known error to indicate the direction of failure propagation within a sys-
profiles for detecting erroneous states and behaviours. Such tem [12]. The approach was based on the stimulation and
techniques rely on monitoring of system’s behaviour with re- suppression analogies of the immune network. Based on the
spect to a given set of rules that include (1) adherence to given stimulation and suppression from connected components, each
control-flow paths, (2) execution time limits, (3) data integrity component is associated with mathematical values known as
checks, (4) comparison among redundant components and (5) failure origin ratios that indicate the possibility of failing,
algorithm-based plausibility checks of data [9]. However, these and can be used for locating the source of the failure. Some
approaches restrict the detection of errors to those that are simulation results were reported which showed that the origin
known at design-time. A limitation of such approaches is that of failures could be traced successfully based on the failure
error detection techniques are not expected to adapt to new propagation network.
patterns of behaviour that can emerge during system operation. More pertinent to this research are the investigations applying AIS to
This observation motivated the investigations into adaptable error detection, which can be found in [6], [13], [14]. By taking
error detection for enhancing availability of ATMs. ideas from [4] and [5], Bradley and Tyrrell have examined the
application of AIS to error detection in hardware [15]. The
III. RELATED WORK name immunotronics was coined for immune-based hardware
The analogy between fault tolerance and the immune system fault tolerance. They proposed a mapping from the immune
was first expressed by Avižienis [4]. In that paper, four system to hardware fault tolerance which later led to the
attributes of the immune system that support the idea are: development of models for a hardware immune system using
the immune system functions continuously and autonomously, the attributes specified by [4]. One of the approaches pro-
independent of cognition, its elements (lymph nodes, other posed was based on a lymphatic network, to be implemented

as an error detection system for an embryonic architecture precise, assume there is a family of embedded systems with
[16]. Embryonics takes its inspiration from the embryonic similar functions and behaviours whereby each system is
development of multi-cellular organisms. This concept relates characterised by its own unique features. The idea is to extract
to the generation of generic cells during cell reproduction, generic error detectors corresponding to error signatures com-
such that generic cells are able to take over the function of mon to these systems. Therefore, these generic error detectors
any other cell before differentiation occurs. The idea was then serve as the minimum set of detectors across all the systems
applied to the implementation of component-level redundancy compared to populations of detectors that are unique to indi-
in hardware for achieving fault tolerance. A further model for vidual systems. In contrast, the run-time adaptation phase con-
hardware immune fault tolerance was developed [14], [15], fers on each system a more specialised set of detectors and is
[17], which was based on negative selection algorithm [18], responsible for augmenting the detectors that are more generic
[19]. Using this model, hardware can be immunised with tol- (through the use of an evolutionary process). The specialised
erance conditions or antibodies that act as error detectors. The error detectors are generated from error sequences observed
target hardware is represented as a finite state machine (FSM) during run-time operations of the system. Furthermore, the
that defines the acceptable states and transitions between the framework divides the learning mechanisms into two levels:
states. However, the work undertaken by Bradley et al, is fun- (1) learning within a system and (2) learning amongst systems.
damentally different to the work presented in this paper. First, The two levels are represented as local AED and network-wide
Bradley's approach was concerned with the fault tolerance of AED, as illustrated in figure 2.
hardware i.e. chips that could reconfigure themselves once an
error had been detected, whereas we are concerned with the detection of
errors prior to their occurrence. In addition, Bradley's approach
was underpinned by negative selection, and once the detectors
had been placed on the chip, no immune mechanism was
employed for adaptable error detection; this was achieved via
a developmental approach.

IV. FRAMEWORK FOR ADAPTABLE ERROR DETECTION
A framework for adaptable error detection (AED) was
developed, which employs ideas from the vaccination and adaptability
analogies of the immune system. Vaccination or immunisation
is a process of priming the natural immune system
against the occurrence of a disease by introducing attenuated
antigens of the disease [20]. This process allows the immune
system to generate antibodies for the introduced antigens,
with the effect that subsequent invasions by similar antigens
induce secondary immune responses. Therefore, this process
endows the immune system with knowledge about antigens
which it had not previously encountered, and enables it to
adapt to novel antigens during the primary immune response.
This confers on the immune system the ability to detect novel
patterns and react accordingly, thereby supplementing existing
knowledge about antigens.

[Figure 2: activity diagram with two lanes, Local AED and Network-wide AED. Activities shown: Offline process, Detector generation, Immunisation, Error detection, Learning, Local tolerisation, Local validation, Incorporation, Propagation, Network tolerisation, Network validation, Local process evaluation, Network process evaluation. Flows shown: historical patterns from target systems, real-time behaviours of target system, novel error behaviours from target system, immature detector, competent detector, immature-network detector, competent-network detector.]
Fig. 2. Activity diagram of the framework for adaptable error detection
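The split this framework makes between generic detectors (distributed at design time) and specialised detectors (learned at run time) can be pictured with a small sketch. It is illustrative only and not the authors' implementation: the function names, the representation of an error signature as a tuple of M-Status codes, and the reading of "common to these systems" as "present on every system" are all assumptions made here.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: an error signature is a tuple of state codes.
Signature = Tuple[int, ...]


def generic_detectors(history: Dict[str, Set[Signature]]) -> Set[Signature]:
    """Design-time immunisation (assumed reading): keep only the error
    signatures that appear in the historical data of every system."""
    per_system = list(history.values())
    if not per_system:
        return set()
    return set.intersection(*per_system)


def specialised_detectors(generic: Set[Signature],
                          observed: Set[Signature]) -> Set[Signature]:
    """Run-time adaptation (assumed reading): signatures observed on one
    system that the generic set does not already cover."""
    return observed - generic
```

Under these assumptions the generic set is the minimum shared across the family of systems, and each system augments it locally with whatever it observes in operation.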
In the proposed framework, the immunisation metaphor
corresponds to the traditional error detection approach of
deploying a set of error detectors, which are representative of
known error signatures. However, the problem of traditional V. ARTIFICIAL IMMUNE SYSTEMS FOR ADAPTABLE
techniques is the inability to detect unexpected erroneous be- ERROR DETECTION
haviors. What is required is a system that can continually learn As already mentioned, the design of the framework for
about these unknown behaviors and adapt a set of detectors adaptable error detection was inspired by the natural immune
capable of identifying them in the future. This requirement system. In the following, the main features of artificial im-
motivated us to adopt ideas from the continuous learning mune systems are outlined, and the algorithm that essentially
nature of the immune system. implements the framework for AED is described.
The framework consists of two phases, namely design-time
immunisation and run-time adaptation that are comparable
with the immune metaphors of immunisation, and continual A. Artificial Immune Systems
learning, respectively. The design-time immunisation caters Artificial immune systems (AIS) are an example of nature-
for the distribution of generic error detectors amongst systems inspired problem solving system and can be defined as adap-
from an off-line process of detector generation. To be more tive systems, inspired by theoretical immunology and observed

immune functions, principles and models which are applied to that was capable of continual learning, was highly adaptable
problem solving [3]. and produced a model of the system that could prove useful
AIS have demonstrated significant advantages and strengths to field engineers. Through our investigations we found the
in diverse scenarios, for example: where the inputs are often artificial immune system for e-mail classification (AISEC) [2]
prone to noise and large perturbations, and the system needs to contain such properties. AISEC was developed for a two-
to recover and continue operation [21]; where there can be class problem, to discriminate between interesting and unin-
concept shifts in the input space, and the system must track teresting e-mails. However, it has properties such as continual
these [2]; where the memory structure must be stable over learning and adaptation which are required for our application
time, but able to cope with novel structures [22]; where the to ATMs. Therefore, a decision was made to investigate the
system needs to be able to generalise or adapt quickly [23]; applicability of that approach in this domain, as it has been
where, for the system to operate in real time, special hardware shown to perform as well as Bayesian systems in terms of
may be necessary to cope with the load and throughput [16]; classification, but proved itself to be efficient at adapting to
where we may require the facility of self-repair and continual changes in the data space: an important requirement in our
development to take full advantage of the immune system study.
capabilities [2], [22]. For these reasons, they are proving to
be an interesting and fruitful avenue of research for fault The AISEC algorithm exploits two sets of artificial immune
tolerance. cells that display features and behaviours of natural B-cells
In an attempt to create a common basis for AIS, a frame- and T-cells. For simplicity, all the immune cells in the AISEC
work has been proposed [3]. The basic elements of that algorithm are referred to as B-cells. One set of immune cells
framework are the following: are naive, while the other set are memory cells. The AISEC
• Application domain: This includes the application data algorithm consists of training and testing phases. From the
for which an AIS is to be developed; training phase emerges B-cells that represent uninteresting e-
• Representation: Appropriate representation needs to be mails. Each B-cell is a feature vector containing words from
developed for the components of the AIS; the subject and sender fields of the corresponding uninteresting
• Affinity measure: This defines how the components of e-mail. During the testing phase of the AISEC algorithm,
the AIS will interact. It is also a determinant of how the new e-mails are classified into interesting or uninteresting.
interactions will be evaluated; These new e-mails are regarded as antigens, and are initially
• Immune algorithms: These include processes that govern processed into the same format as B-cells. Subsequently, if the
the dynamics of the system over time. In other words, the affinity between an antigen and a B-cell exceeds a threshold,
immune algorithms define the variation in the behaviour the antigen is classified as an uninteresting e-mail. The clas-
of the AIS over time. sification of an antigen as an uninteresting e-mail requires a
The AIS framework can be thought of as a layered approach feedback from a user, termed co-stimulation, to confirm the
starting from an application domain or target function. From accuracy of the classification. If the co-stimulation ascertains
this basis, the way in which the components of the system that the antigen is an uninteresting e-mail, the corresponding
will be represented will be considered. For example, the B-cell is rewarded by being promoted to a memory B-cell
representation of network traffic may well be different from that is long-lived (assuming the B-cell was not already a
the representation of a real time embedded system. Once the memory B-cell). In addition, the B-cell responsible for the
representation has been chosen, one or more affinity measures correct classification of an e-mail undergoes clonal selection
are used to quantify the interactions of the elements of the to produce variants of itself. Alternatively, an incorrect clas-
system. There are many possible affinity measures (which are sification induces the death of the B-cell responsible for the
partially dependent upon the representation adopted), such as classification, as well as other similar B-cells. In a situation
Hamming and Euclidean distances. The final layer involves that an antigen is classified as interesting, it is just passed on
the use of algorithms, which govern the behavior (dynamics) to the user. Altogether, the continuous learning feature of the
of the system. There are several algorithms, and these can AISEC algorithm is a product of the intermittent reproduction
be based on the following immune processes: negative and of B-cells, user feedback on classification of e-mails, and cell
positive selection, clonal selection, bone marrow, and immune death.
network algorithms. Work in [24] discusses the importance of
adopting a problem-oriented approach to the development of Other techniques were investigated, such as rule induction,
AIS, rather than the more ad-hoc adoption of techniques. In and we concluded that this algorithm had the simplicity
line with this, we carefully analysed the problem domain, and and properties that are required. As part of this preliminary
at all stages, we have focused on solving the problem at hand work, we undertook considerable investigations into the use
with the most suitable solution. of negative selection [1]. However, investigations revealed
that there were significant problems with the approach, such
B. An Artificial Immune System for E-mail Classification as, detector coverage, detector generation and use of discrete
(AISEC) Algorithm data sources and the need for two classes from which to
The field of Artificial Immune Systems (AIS) [3] has had learn. These problems have subsequently also been reported
success in the development of effective classification algo- in the literature, and the reader is referred to those for further
rithms [25]–[27]. For our solution, we required a technique information [28]–[30].
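The AISEC testing phase described in this section (threshold-based classification, co-stimulation from the user, promotion to memory, cloning, and cell death) can be sketched as follows. This is a minimal sketch under stated assumptions, not the published algorithm: the affinity measure (fraction of a B-cell's words found in the antigen), the cloning rule (a variant that also carries the antigen's words) and all names are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class BCell:
    words: FrozenSet[str]   # feature vector: words from subject and sender fields
    memory: bool = False    # False = naive cell, True = long-lived memory cell


def affinity(cell: BCell, antigen: FrozenSet[str]) -> float:
    """Assumed measure: fraction of the B-cell's words present in the antigen."""
    return len(cell.words & antigen) / len(cell.words) if cell.words else 0.0


def classify(cells: List[BCell], antigen: FrozenSet[str],
             threshold: float) -> Optional[BCell]:
    """Return the best-matching B-cell if its affinity exceeds the threshold
    (antigen classified as uninteresting); otherwise None (interesting)."""
    best = max(cells, key=lambda c: affinity(c, antigen), default=None)
    if best is not None and affinity(best, antigen) > threshold:
        return best
    return None


def co_stimulate(cells: List[BCell], cell: BCell, antigen: FrozenSet[str],
                 confirmed: bool, threshold: float) -> List[BCell]:
    """Apply user feedback to the B-cell responsible for a classification."""
    if confirmed:
        cell.memory = True  # reward: promote to memory
        # Hypothetical cloning rule standing in for clonal selection.
        clone = BCell(words=frozenset(cell.words | antigen))
        return cells + [clone]
    # Incorrect classification: remove the responsible cell and similar cells.
    return [c for c in cells
            if c is not cell and affinity(c, cell.words) <= threshold]
```

In this sketch the population grows only on confirmed classifications and shrinks on rejected ones, which is the feedback loop the text credits for continual learning.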

VI. AN AIS PROTOTYPE FOR ADAPTABLE ERROR window size for generating sequences was adopted due to the
DETECTION IN ATMS absence of information on markers that tag the beginning of
In this section we outline a prototype system that has been sequences of states. Sequences of states are fatal when they are
realised as part of the research. We initially describe the terminated by fatal states (M-Status value of 10 is a fatal state).
mapping of the framework for AED into a network of ATMs, Non-fatal sequences of states are not terminated by fatal states
then discuss issues relating to the data and immune inspired (any M-Status value apart from 10). Figure 3 shows examples
techniques employed. of the fatal and non-fatal sequences of states.
[Figure 3: window size (n) = 6]
Fatal sequence: 13 11 8 35 5 4 10 (the final value, 10, is the fatal state)
Non-fatal sequence: 4 5 8 11 12 5 13 (the final value, 13, is a non-fatal state)
Fig. 3. Illustration of fatal and non-fatal sequences

The framework for AED, outlined in the previous section,
can be placed in the context of a network of ATMs, as depicted
in figure 1. In this diagram, we see ATMs labelled ATM 1, ATM
2, ATM 3, and ATM 4 connected to a Central Manager, which
is able to receive and send information to connected ATMs.
As can be observed, the framework exploits the network
infrastructure to support local and network-wide learning. A
local AED system is implemented within an ATM, while the
network-wide AED is hosted by the Central Manager that
supports the information exchange amongst local AEDs.
Altogether, the connection of each ATM to the Central Manager
enables learning amongst ATMs through the network-wide The sequences illustrated by figure 3 are generated using
AED. a fixed window size of 6. The sequence terminated by a M-
In the following we proceed to describe the prototype for Status value of 10 in figure 3 is a fatal sequence of states,
AED in terms of the basic elements of the AIS framework, as while the sequence that ends with a M-Status value of 13 is
introduced in section V-A. a non-fatal sequence of states.

A. Application Domain C. Affinity Measure


The data source provided for the implementation of the local In order for a detector to identify whether a sequence of
AED were ATM log files, which capture histories of error data is a precursor to failure, some form of affinity measure
occurrences in an ATM. Each error occurrence is associated between the two is required. Given that a sequence is being
with a time stamp indicating the time of the error event. Each used, any affinity metric should take into account the number
log file records error occurrences related to different modules of states (a history) in order to make the prediction. To
in an ATM, for example, an error in the magnetic card reader this end, the most obvious choice is to adopt some form of
(MCRW) module. Each record is made up of fields that are window on the incoming sequence which allows a sequence
descriptive of an error occurrence. to be matched against a detector. If sufficient states within the
Throughout our work, we investigated alternative ways on sequence are matched against a fatal sequence detector then a
how to use the data, ranging from time stamps, M-Status, M- classification can be made whether the sequence is a precursor
Data, and combinations of some fields. After careful inves- to a fatal state.
tigation, it was concluded that the combination of a number The affinity measure employed adapts the r-contiguous bits
of fields, e.g. M-Status and time, may be beneficial, but the matching rule for the problem, which defines the affinity be-
simplest approach was adopted. Consequently, the M-Status tween two data items, when they have a number of contiguous
field was selected for the implementation. The M-Status field bits in common. Consequently, affinity between a sequence of
takes discrete values that are error codes, describing the state run-time ATM states (antigen), and an error detector (B-cell)
of an ATM at the time an error was detected. As a result of is computed by identifying the number of contiguous states
further investigations, it was discovered that the M-Status field that are common to them. An illustration is shown in figure 4,
was sufficient for identifying possible fatal states (assumed to whereby the value of r is the minimum number of contiguous
be the value of 10) of an ATM. A fatal state is indicative of states required to define affinity.
the failure of an ATM.
Sequence of ATM states (antigen)
r=4

B. Representation 8 18 8 5 4 18

The task of the AED given the ATM log files is the a 8 18 8 5 35 35 10
priori detection of fatal states that correspond to an impending
system failure. In this regard, the patterns to be detected are Error detector (B−cell)
sequences of states preceding fatal states, i.e, fatal sequences
Fig. 4. R-contiguous bits matching rule
of states. The approach monitors the sequences of ATM states
until a fatal sequence of states is detected. Hence detectors
identify exclusively fatal sequences, and not non-fatal ones. In In our prototype, affinity is calculated from the r-contiguous
order to constrain the number of states in a sequence, a fixed states common to an antigen and a B-cell. The affinity mea-
7

PROCEDURE:
sure also takes into account the proximity of the common a. Clear the contents of naiveCells and memoryCells
contiguous states present in the B-cells in relation to the fatal b. Initialise memoryCells with B-cells from
trainingBcells and initialise life-span of each memory
state. For example, in figure 4, the B-cell has two states 35, 35 B-cell. Number of B-cells inserted into memoryCells is
between contiguous states 8 18 8 5 and fatal state of 10. The equal to the memorySeed;
c. While trainingBcells is not empty:
antigen has two states 4, 18 after the contiguous states 8 18 8
5. These states that lie in between the contiguous states and i. Initialise naiveBcell with a B-cell from
trainingBcells and initialise its life-span;
the fatal state provide another factor for the affinity. Affinity ii. If the affinity of naiveBcell with a memory
is calculated as a value between 0 and 1, and it is computed B-cell is greater than the affinityThreshold, start
evolutionary process;
using equation 1 based on the following notations: iii. Evolutionary process:
affinity - variable to store affinity between antigen
iii(a). Clone naiveBcell and store clones in
and B-cell; bcellClones. Mutate each clone in bcellClones away
r-contiguousbits - contiguous bits common to from memory B-cells;
iii(b). While bcellClones is not empty:
antigen and B-cell;
windowSize - window size for generating sequences; iii(b1). Pick a clone from bcellClones;
iii(b2). If the affinity of clone with trainingBcells
abs(x) - absolute value of x; is greater than the affinity of naiveBcell with a
antigenInterval - number of states between r- memory B-cell, then add clone to naiveCells.

contiguous bits and fatal state in antigen;


b-cellInterval - number of states between r- Fig. 5. Pseudocode for the off-line process of the AISEC algorithm
contiguous bits and fatal state in B-cell;
states (which characterizes detectors) are of fixed-length, and
the identification of fatal sequences is performed by applying
r-contiguousbits an overlapping sliding window to a stream of incoming data.
affinity = (1)
windowSize + abs(antigenInterval − b-cellInterval) Empirical studies have shown that the AISEC algorithm could
not discriminate between fatal and non-fatal sequences when
D. AISEC Algorithm for Local Adaptable Error Detection sequences are generated through an overlapping mode [1]. The
Since the AISEC algorithm cannot be applied directly to reason being that a fatal sequence and a preceding non-fatal
adaptable error detection (AED), the algorithm had to be re- sequence have last n-1 states in common, where n is the length
designed according to the new application [24]. The modified of each sequence.
AISEC algorithm for AED has two phases. The training phase, 1) Training Phase of AISEC Algorithm: During the off-line
during which error detectors are generated off-line, based on phase of the AISEC algorithm, M-Status sequences terminated
data obtained from error logs produced by an ATM. The in a fatal state are generated from ATM the log files. These
testing phase, which corresponds to the online execution of sequences will be used as a basis for generating the error
the AISEC algorithm, during which errors are identified based detectors.
on detectors previously generated. These these detectors are The pseudocode in figure 5 outlines the off-line phase of
adapted according to the learning capabilities of the AISEC the AISEC algorithm. The AISEC algorithm is trained with
algorithm. In the following sections, these two phases are B-cells (trainingBcells), which were obtained from M-Status
presented in more detail. For both phases, the sequences of sequences terminated by fatal states generated from ATM the
states (which characterizes detectors) are of fixed-length, and log files. First, the memory set (memoryCells) is initialised
the identification of fatal sequences is performed by applying with a subset of the B-cells in trainingBcells. The number
an overlapping sliding window to a stream of incoming data. of B-cells used to initialise the memory set is limited to
Empirical studies have shown that the AISEC algorithm could the memory seed (memorySeed). Then, the remaining B-cells
not discriminate between fatal and non-fatal sequences when undergo cloning and mutation to produce a diverse set of B-
sequences are generated through an overlapping mode [1]. The cells that are introduced into the naive set (naiveCells).
reason being that a fatal sequence and a preceding non-fatal The outcome of this process is a set of generic error
sequence have last n-1 states in common, where n is the length detectors that are used to immunise the local AED for the
of each sequence. classification of potential failure sequences.
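As an illustration of equation 1, the following is a minimal Python sketch of the affinity computation, not the authors' implementation. It rests on several assumptions that the text does not state: a detector is stored as the windowSize states that precede its fatal state (the terminating fatal state itself is left out), the "don't care" symbol is encoded as the string "*", the interval of a sequence is counted as the number of states after the common run, and when several common runs have the same length the first one found is used. The function names are hypothetical.

```python
from typing import Sequence, Tuple

WILDCARD = "*"  # assumed encoding of the "don't care" symbol


def states_match(a, b) -> bool:
    """Two states match if they are equal or either one is a wildcard."""
    return a == WILDCARD or b == WILDCARD or a == b


def longest_common_run(antigen: Sequence, detector: Sequence) -> Tuple[int, int, int]:
    """Return (length, end index in antigen, end index in detector) of the
    longest run of contiguous matching states; end indices are exclusive."""
    best = (0, 0, 0)
    prev = [0] * (len(detector) + 1)
    for i in range(1, len(antigen) + 1):
        curr = [0] * (len(detector) + 1)
        for j in range(1, len(detector) + 1):
            if states_match(antigen[i - 1], detector[j - 1]):
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i, j)
        prev = curr
    return best


def affinity(antigen: Sequence, detector: Sequence, window_size: int) -> float:
    """Equation 1: r-contiguous states divided by the window size plus the
    absolute difference of the two intervals that follow the common run."""
    run, antigen_end, detector_end = longest_common_run(antigen, detector)
    if run == 0:
        return 0.0
    antigen_interval = len(antigen) - antigen_end
    detector_interval = len(detector) - detector_end
    return run / (window_size + abs(antigen_interval - detector_interval))


if __name__ == "__main__":
    # Sequences of figure 4; the detector is given without its fatal state 10.
    antigen = [8, 18, 8, 5, 4, 18]
    detector = [8, 18, 8, 5, 35, 35]
    print(affinity(antigen, detector, window_size=6))
```

For the sequences of figure 4, the common run has four states and both intervals contain two states, so under these assumptions the sketch evaluates equation 1 as 4 / (6 + 0).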
2) Testing Phase of AISEC Algorithm: The online phase of the AISEC algorithm for AED caters for the error detection, learning, local tolerisation, validation and evaluation processes. It involves classifying each fixed-length sequence of states presented to the algorithm and reacting to the feedback on each classification. The classification process is outlined by the pseudocode in figure 6. An antigen or sequence of states, denoted as sequence, is compared with the naive and memory B-cells termed naiveCells and memoryCells, respectively. If sequence matches a B-cell in naiveCells or memoryCells, it is classified as fatal. Otherwise, sequence is considered to be non-fatal. The sequence matches a B-cell when their affinity exceeds the classification threshold (classificationThreshold).

    PROCEDURE:
    a. While all B-cells in naiveCells and memoryCells have not been selected:
       i. Pick a B-cell from naiveCells or memoryCells, and store in b-CellA;
       ii. If the affinity between b-CellA and sequence is greater than classificationThreshold then return sequence is fatal.
    b. Otherwise (no B-cell matched), return sequence is not fatal.

Fig. 6. Pseudocode for the classification of a sequence by the AISEC algorithm

From the outcome of the classification process, a positive, negative or nil response may be generated by the AISEC algorithm. No response is generated by the AISEC algorithm when a sequence is classified as non-fatal. The situation is different when the classification infers that a sequence is fatal, whereby the AISEC algorithm reacts positively or negatively depending on the co-stimulation. A co-stimulation that approves of a classification incites a positive reaction. The converse is the case with a rebuttal from the co-stimulation.

The pseudocode in figure 7 outlines the reaction of the AISEC algorithm when a sequence is classified as fatal. A sequence classified as fatal and confirmed to be correct by the co-stimulation leads to the evolutionary process. Every naive B-cell in naiveCells whose affinity with the classified sequence (sequence) exceeds the affinity threshold (affinityThreshold) has its life span increased by the value of the stimulation count for naive B-cells. In addition, such naive B-cells in naiveCells are cloned and mutated, and the clones are added to naiveCells. Furthermore, bCellBest is initialised with a naive B-cell from naiveCells, with the highest affinity for sequence. mCellBest is also initialised with a memory B-cell from memoryCells, having the highest affinity for sequence. If the affinity between bCellBest and sequence exceeds the affinity between mCellBest and sequence, bCellBest is promoted to become a memory B-cell. mCellBest is not removed from the memory set at this stage. Memory cells are only removed once the life span indicator has fallen below a certain threshold. The promotion entails initialising the life span of the new memory B-cell with the stimulation count for memory B-cells. In a situation where sequence is classified as fatal and the co-stimulation disproves the classification, all naive B-cells whose affinity for sequence exceeds affinityThreshold are removed from naiveCells. The same applies to the memory B-cells in memoryCells. After presenting a sequence for classification, the life span of all B-cells in naiveCells is decremented. Each presentation of a sequence of states for classification corresponds to an iteration of the algorithm.

    PROCEDURE:
    a. If co-stimulation on classification of sequence is positive:
       i. While all B-cells in naiveCells have not been selected:
          i(a). Initialise b-CellA with a B-cell from naiveCells;
          i(b). If the affinity between b-CellA and sequence is greater than affinityThreshold, then increase life-span of b-CellA;
       ii. Initialise bCellBest with B-cell in naiveCells having highest affinity for sequence;
       iii. Clone and mutate bCellBest using sequence as reference B-cell, and add mutated clones to naiveCells;
       iv. Re-initialise bCellBest with B-cell in naiveCells having highest affinity for sequence;
       v. Initialise mCellBest with B-cell in memoryCells having highest affinity for sequence;
       vi. If affinity between bCellBest and sequence is greater than affinity between mCellBest and sequence, then:
          vi(a). Remove bCellBest from naiveCells;
          vi(b). Initialise life-span of bCellBest to the initial value of life-span for memory B-cells;
          vi(c). Promote bCellBest by adding to memoryCells;
          vi(d). Decrement the life-span of all memory B-cells in memoryCells, whose affinity for bCellBest is greater than affinityThreshold;
    b. If co-stimulation on classification of sequence is negative:
       i. Remove all B-cells in naiveCells and memoryCells, whose affinity for sequence is greater than affinityThreshold;
    c. Decrease the life-span of all B-cells in naiveCells.

Fig. 7. Pseudocode for the online phase of the AISEC algorithm

The AISEC algorithm for AED employs generalisation and specialisation of B-cells as its mutation mechanism. Generalisation substitutes a valid state in a B-cell with a don't care (*), while specialisation substitutes a state with another valid state in the gene library. The gene library constrains the algorithm to mutating with only valid states. In addition to this, new B-cells can be introduced into the detector set through the incorporation of undetected fatal sequences. Unlike cloning and mutation, which are characterised by a guided random process, the incorporation of fatal sequences allows for learning about specific failure sequences, so that if they occur again, they will be detected.

VII. EXPERIMENTAL SETUP

For evaluating the classification performance of the AISEC algorithm two data sets were used. Data set ATM-data-set-A was derived from the concatenation of pre-processed ATM log files generated by different ATMs located in a common geographical area. The same applies to the data in ATM-data-set-B, which were obtained from a geographical area different from that of ATM-data-set-A. Even though data from a single ATM should have been used for this purpose, the insufficiency of data required the concatenation of data from different ATMs. It was assumed that ATMs located in the same geographical area provided a common data set for experimentation. An initial investigation confirmed that this was not the case for ATMs located in different geographical areas.

Prior to the detector generation process, the ATM log files are pre-processed. Data sets ATM-data-set-A and ATM-data-set-B were divided into three separate parts. 2/3 of each data set was used for training, while the remaining 1/3 of each data set was divided into halves for validation and testing. The objective in partitioning the data into training, validation and testing is to obtain an accurate and unbiased measure of the classification performance from experiments [31]. Records in each partition of the data sets are shown in table I.
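The partitioning rule described above (two thirds for training, the remaining third halved for validation and testing) can be sketched as follows. This is an illustrative Python sketch: the rounding rule and the function name partition_sizes are assumptions, chosen because they are consistent with the record counts reported in table I (3059, 764 and 765 for a total of 4588 records, and 6004, 1501 and 1501 for a total of 9006 records); the text does not say how fractional counts were handled.

```python
from typing import Tuple


def partition_sizes(n_records: int) -> Tuple[int, int, int]:
    """Split a record count into (training, validation, testing) sizes.

    Assumed rule: training takes 2/3 of the records, rounded to the nearest
    integer; the remainder is halved, with any odd record going to testing.
    """
    training = round(2 * n_records / 3)
    remainder = n_records - training
    validation = remainder // 2
    testing = remainder - validation
    return training, validation, testing


if __name__ == "__main__":
    # Totals obtained by summing the rows of table I.
    print(partition_sizes(4588))  # ATM-data-set-A
    print(partition_sizes(9006))  # ATM-data-set-B
```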
TABLE I
DATA SETS IN THE TRAINING, VALIDATION AND TESTING PARTITIONS OF ATM DATA FOR EVALUATING THE CLASSIFICATION PERFORMANCE OF THE LOCAL AED.

    Data set          Training   Validation   Testing
    ATM-data-set-A    3059       764          765
    ATM-data-set-B    6004       1501         1501

TABLE II
DATA SETS IN THE TRAINING AND TESTING PARTITIONS OF ATM DATA FOR EVALUATING THE MEAN DETECTION TIME INTERVAL OF THE LOCAL AED.

    Main data sets    Sub data sets                 Records
    ATM-data-set-A    ATM-data-set-A’ (Training)    4205
                      ATM-data-set-A” (Testing)     383
    ATM-data-set-B    ATM-data-set-B’ (Training)    5921
                      ATM-data-set-B” (Testing)     3085

For evaluating the mean detection time interval of the AISEC, the algorithm was trained with data sets ATM-data-set-A’ and ATM-data-set-B’. The detectors generated from data set ATM-data-set-A’ were tested using data set ATM-data-set-A”. Data set ATM-data-set-A” corresponds to a single log file in ATM-data-set-A, while the remaining concatenated log files in ATM-data-set-A make up ATM-data-set-A’. The same applies to data sets ATM-data-set-B’ and ATM-data-set-B”, which are subsets of ATM-data-set-B. It was assumed that the training data from ATM-data-set-A’ were generic to log files in the relevant geographical region, such that the detectors generated were adequate for detecting errors in an ATM, i.e., ATM-data-set-A”. The same assumption holds for the experiments with ATM-data-set-B’ and ATM-data-set-B”. Table II presents the number of records in the training data sets ATM-data-set-A’ and ATM-data-set-B’, as well as the testing data sets ATM-data-set-A” and ATM-data-set-B”.

All experiments were repeated for 30 independent runs, and the average taken. AISEC has a number of parameters; for more detail on these see [2]. We undertook an extensive analysis of the parameter space, and the parameters used in our experiments were determined empirically, and are detailed in the table captions. For detailed information please refer to [1].

VIII. RESULTS AND ANALYSIS

Experiments were carried out using four variants of the AISEC algorithm to understand the role of the off-line and online processes, as well as the different evolutionary mechanisms. The variants of the local AED include:

• Static AED: This is the AISEC algorithm, without off-line evolutionary process during training of naive B-cells, without online feedback on classification, and without online evolutionary process;
• Static AED with evolution: This is the AISEC algorithm, which includes the off-line evolution of naive B-cells, but without online feedback on classification, and without online evolutionary process;
• Online AED with evolution: This is the full AISEC algorithm, which includes online feedback on classification, online evolutionary process through cloning and mutation, and off-line evolutionary process during training of naive B-cells;
• Online AED with incorporation of fatal sequences: This is the full AISEC algorithm but instead of the evolutionary process by cloning and mutation, new B-cells are recruited into the naive pool by incorporating undetected fatal sequences.

A. Classification Performance

The results from the experiments using ATM-data-set-A are shown in table III, and ATM-data-set-B in table IV. Column (a) represents results from executing the static AED, column (b) shows results from static AED with evolution, column (c) is for online AED with evolution, and column (d) provides the outcomes from online AED with incorporation of fatal sequences. The standard deviations are shown in brackets.

As can be seen from table III, in terms of classification accuracy (i.e., how well it predicts a fatal sequence) the local AED is consistently high. It also shows that the static AISEC generated a higher classification accuracy and true positive rate than the static AISEC with evolutionary process, based on the Z-statistic at the 0.05 significance level [Z = 5.06, which lies outside Z0.05 = ±1.96]. A likely reason for this outcome is that the single attempt at exploiting the evolutionary mechanism during the training of the static AISEC with evolutionary process did not generate B-cells that were useful for detecting fatal sequences of states. What is very encouraging is the low rate of false positives (i.e. how many times the AED system said there was a potential failure, when there was not). This figure should be as low as possible, as a high false positive rate is the same as a high false alarm rate. However, what should be noted is that there is very little difference between the classification performance of all four variants. This is attributed to the relatively small amount of data that was available. Data employed in these experiments come from real ATMs in operation, and retrieval of this data (at present) is difficult. This restriction has not given the immune mechanism within the AED with evolution sufficient experience and time to improve.

From the second data set ATM-data-set-B, the results in table IV indicate that the classification accuracy, true positive and false positive rates of the static AISEC are identical to those of the static AISEC with evolutionary process. The reason for this is that B-cells generated from the training data were similar since the data set ATM-data-set-B is highly repetitive. In contrast with table III, in table IV the static and online AISEC variants display higher false positive rates. This may have originated from the use of the repetitive data set ATM-data-set-B, such that the fatal and non-fatal sequences of states that were generated displayed some similarities. The lower false positive rates generated by the online AISEC variants in table IV, when compared with the static AISEC variants, are a result of the continuous learning that purged false positive B-cells.
TABLE III
COMPARISON OF CLASSIFICATION PERFORMANCES OF THE VARIANTS OF THE AISEC ALGORITHM USING TRAINING AND TESTING DATA FROM ATM-data-set-A. PARAMETERS INCLUDE: WINDOW SIZE = 6, CLASSIFICATION THRESHOLD = 0.98, AFFINITY THRESHOLD = 0.95, MEMORY SEED = 65, CLONE CONSTANT = 7, STIMULATION COUNT (NAIVE) = 25, STIMULATION COUNT (MEMORY) = 15, TRAIN DATA = 68 DETECTORS, TEST DATA = 38 (24 FATAL SEQUENCES AND 14 NON-FATAL SEQUENCES).

    Columns: (a) Static AISEC; (b) Static AISEC, evolutionary process; (c) Online AISEC, evolutionary process; (d) Online AISEC, incorporation of fatal sequences.

                               (a)               (b)               (c)               (d)
    Classification accuracy    94.74% (0.00)     92.46% (2.47)     92.46% (2.83)     92.98% (1.74)
    True positive              91.67% (0.00)     88.06% (3.90)     88.06% (4.48)     88.89% (2.75)
    True negative              100.00% (0.00)    100.00% (0.00)    100.00% (0.00)    100.00% (0.00)
    False positive             0.00% (0.00)      0.00% (0.00)      0.00% (0.00)      0.00% (0.00)
    False negative             8.33% (0.00)      11.94% (3.90)     11.94% (4.48)     11.11% (2.75)
    Naive detectors            3.00 (0.00)       0.80 (0.71)       47.00 (4.68)      2.27 (0.45)
    Memory detectors           65.00 (0.00)      65.00 (0.00)      65.00 (0.00)      65.40 (0.62)

TABLE IV
COMPARISON OF CLASSIFICATION PERFORMANCES OF THE VARIANTS OF THE AISEC ALGORITHM USING TRAINING AND TESTING DATA FROM ATM-data-set-B. PARAMETERS INCLUDE: WINDOW SIZE = 14, CLASSIFICATION THRESHOLD = 0.98, AFFINITY THRESHOLD = 0.95, MEMORY SEED = 50, CLONE CONSTANT = 7, STIMULATION COUNT (NAIVE) = 25, STIMULATION COUNT (MEMORY) = 15, TRAIN DATA = 55 DETECTORS, TEST DATA = 89 (17 FATAL SEQUENCES AND 72 NON-FATAL SEQUENCES).

    Columns: (a) Static AISEC; (b) Static AISEC, evolutionary process; (c) Online AISEC, evolutionary process; (d) Online AISEC, incorporation of fatal sequences.

                               (a)               (b)               (c)               (d)
    Classification accuracy    86.52% (0.00)     86.52% (0.00)     89.89% (0.00)     89.85% (0.21)
    True positive              52.94% (0.00)     52.94% (0.00)     52.94% (0.00)     52.75% (1.07)
    True negative              94.44% (0.00)     94.44% (0.00)     98.61% (0.00)     98.61% (0.00)
    False positive             5.56% (0.00)      5.56% (0.00)      1.39% (0.00)      1.39% (0.00)
    False negative             47.06% (0.00)     47.06% (0.00)     47.06% (0.00)     47.25% (1.07)
    Naive detectors            5.00 (0.00)       1.53 (0.82)       17.53 (11.25)     9.37 (0.85)
    Memory detectors           50.00 (0.00)      50.00 (0.00)      46.30 (0.60)      46.10 (0.31)
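The percentages in tables III and IV follow from confusion counts over the fatal and non-fatal test sequences listed in the captions. The Python sketch below shows the usual definitions of these rates. The individual counts used in the example (22 of 24 fatal sequences detected for ATM-data-set-A; 9 of 17 fatal sequences detected and 68 of 72 non-fatal sequences rejected for ATM-data-set-B) are not reported in the paper; they are inferred here because they reproduce the static AISEC column (a), and the function name classification_rates is hypothetical.

```python
from typing import Dict


def classification_rates(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return accuracy and per-class rates as percentages, rounded to 2 places.

    tp/fn count fatal sequences detected/missed;
    tn/fp count non-fatal sequences rejected/flagged.
    """
    fatal = tp + fn
    non_fatal = tn + fp
    return {
        "accuracy": round(100 * (tp + tn) / (fatal + non_fatal), 2),
        "true_positive": round(100 * tp / fatal, 2),
        "true_negative": round(100 * tn / non_fatal, 2),
        "false_positive": round(100 * fp / non_fatal, 2),
        "false_negative": round(100 * fn / fatal, 2),
    }


if __name__ == "__main__":
    # Inferred counts (assumption): 24 fatal and 14 non-fatal test sequences.
    print(classification_rates(tp=22, fn=2, tn=14, fp=0))
    # Inferred counts (assumption): 17 fatal and 72 non-fatal test sequences.
    print(classification_rates(tp=9, fn=8, tn=68, fp=4))
```

Because the static variant has no random component at test time (its standard deviations are all 0.00), whole-number counts like these are plausible for column (a); the averaged columns (b) to (d) generally do not correspond to a single set of integer counts.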

Figures 8 and 9, from the data sets ATM-data-set-A and ATM-data-set-B respectively, show the changes to the classification accuracy of the online AISEC variants. The experiments relating to the online AISEC with evolutionary process are labelled as ‘evolutionary process’ in the graphs. The label ‘incorporation of fatal sequences’ in the graphs relates to the results from the online AISEC with incorporation of fatal sequences. Figure 8 shows that during the online execution, the presentation of at least five testing sequences of states leads to almost 100.00% classification accuracy of the AISEC algorithm. Figure 9 displays 100.00% classification accuracy for at least five testing sequences of states presented to the AISEC algorithm. Subsequently, misclassifications of fatal and non-fatal sequences of states decrease the classification accuracy. However, in the long term, the classification accuracy becomes stable as depicted by figure 9, indicating that the AISEC might have reached an equilibrium. Furthermore, figures 8 and 9 show that the classification accuracies of the online AISEC with evolutionary process and the online AISEC with incorporation of fatal sequences are overlapping, which is more pronounced in figure 9. This confirms that the outcomes of the online AISEC variants are comparable as observed in the tabular results.

Although previous experiments have demonstrated that the AISEC algorithm applied to AED augments error detectors by the continuous learning feature, they do not confirm whether the classification performance of the online AISEC algorithm improves with the recruitment of new error detectors. Therefore, experiments were performed to determine whether the online AISEC with evolutionary process and the online AISEC with incorporation of fatal sequences will improve with the learning of new error detectors. For these experiments, the testing data was passed twice through the AISEC algorithm to demonstrate whether the continuous learning of new B-cells will improve the classification performance of the algorithm. The experiments have shown that through the learning of effective B-cells, the classification accuracy rises progressively. After a period of reinforcing effective B-cells and removing ineffective ones, the classification accuracy stabilises to indicate that the AISEC algorithm may have potentially reached an equilibrium.

B. Mean Detection Time Interval

The other criterion for evaluating the AISEC algorithm is the impact of the local AED on the availability of ATMs. The availability of ATMs could not be estimated from the usual equation due to the lack of information regarding the time an ATM resumes its operations after a fatal state (or a failure). Instead, the mean detection time interval was computed to infer how the adaptable error detectors would affect the availability of ATMs.
[Figure 8: plot of classification accuracy (%) against number of sequences, with one curve for ‘Evolutionary process’ and one for ‘Incorporation of fatal sequences’.]

Fig. 8. Changes to classification accuracy of the AISEC algorithm. Data set applied is ATM-data-set-A. Parameters include: window size = 6, classification threshold = 0.98, affinity threshold = 0.95, memory seed = 65, clone constant = 7, stimulation count (naive) = 25, stimulation count (memory) = 15, train data = 68 detectors, test data = 38 (24 fatal sequences and 14 non-fatal sequences).

[Figure 9: plot of classification accuracy (%) against number of sequences, with one curve for ‘Evolutionary process’ and one for ‘Incorporation of fatal sequences’.]

Fig. 9. Changes to classification accuracy of the AISEC algorithm. Data set applied is ATM-data-set-B. Parameters include: window size = 14, classification threshold = 0.98, affinity threshold = 0.95, memory seed = 50, clone constant = 7, stimulation count (naive) = 25, stimulation count (memory) = 15, train data = 55 detectors, test data = 89 (17 fatal sequences and 72 non-fatal sequences).

The detection time interval is the difference between the time a fatal sequence of states is detected, and the time the consequent fatal state occurs. By averaging the differences over the frequency of true positive detections, the mean value is calculated with equation 2. The mean detection time interval is calculated from the difference between the time a system is expected to end at a fatal state (t_f) and the time of correct error detection (t_correct), which is summed over the total number of times an error is correctly detected. The summed value is then divided by the frequency of correct error detection (N_correct).

$$\text{Mean detection time interval} = \frac{\sum (t_f - t_{correct})}{N_{correct}} \qquad (2)$$

This mean value was computed from the time stamps associated with each state, which were extracted from the ATM data used for the experiments. It is assumed that the mean detection time interval provides early detection of imminent failures to trigger the actions for avoiding system downtime, which increases the mean time to failure (MTTF) of a system. In the event that the failure event could not be prevented, the early detection reduces the mean time to repair (MTTR) a system.

For the experiments, the AISEC algorithm was trained using ATM-data-set-A’ and tested with ATM-data-set-A”, for assessing the local AED in an ATM located in a geographical region (data in ATM-data-set-A” are specific to an ATM). The training data, i.e. data set ATM-data-set-A’, was constructed from ATMs (apart from the testing ATM ATM-data-set-A”) co-located in a similar geographical region. The same applies to ATM-data-set-B’ and ATM-data-set-B” (see [1] for more details on these data sets). The results of the experiments are shown in tables V and VI, which should be read using the format days:hours:minutes:seconds, and the standard deviations are shown in brackets.

From table V, it is observed that the AISEC algorithm detected approximately 87.00% of the fatal sequences of states in an ATM, at an average of 12 hours before the occurrences of the failures.

Also in table VI, the AISEC algorithm detected approximately 90.00% of the fatal sequences in an ATM, at an average of 2 hours prior to the failures.

These results in tables V and VI support the inference that the immune-inspired adaptable error detection, which has been evaluated in this work, is capable of detecting fatal sequences of states ahead of the occurrences of the actual fatal state. The earliness of the detection is determined by the window size, the classification threshold and the time lag between the indicators of a fatal state and the occurrence of the fatal state. For example in table V, the mean detection time interval of approximately 12 hours mirrors the frequency of failures in the corresponding ATM. The same holds for the results in table VI. This implies that the mean detection time interval of an AED in an ATM that fails less frequently is likely to be longer than that of another that fails more often.

By demonstrating that the immune-inspired AED provides early warning on future failures, and given the supposition that the time interval prior to a failure is adequate for initiating timely actions to avoid or reduce system downtime, this work has shown that an implemented AED can lead to improved availability of ATMs.

C. Discussion of Results

From the experiments performed, it has been concluded that the AISEC algorithm detects fatal sequences of states prior to the occurrences of failures, since the experiments were performed using adequate training data and appropriate parameters. The results have shown that during the off-line training phase, the classification performance of the static AISEC algorithm with the evolutionary process was not better than the static AISEC algorithm without the evolutionary process. For the online
TABLE V
MEAN DETECTION TIME INTERVAL OF THE AISEC ALGORITHM VARIANTS, WHEN TRAINED WITH DATA SET ATM-data-set-A’ AND TESTED WITH DATA SET ATM-data-set-A”. PARAMETERS INCLUDE: WINDOW SIZE = 6, CLASSIFICATION THRESHOLD = 0.98, AFFINITY THRESHOLD = 0.95, MEMORY SEED = 65, CLONE CONSTANT = 7, STIMULATION COUNT (NAIVE) = 25, STIMULATION COUNT (MEMORY) = 15, TRAIN DATA = 99 DETECTORS, TEST DATA = 15 (11 FATAL SEQUENCES AND 4 NON-FATAL SEQUENCES).

    Columns: (a) Static AISEC; (b) Static AISEC, evolutionary process; (c) Online AISEC, evolutionary process; (d) Online AISEC, incorporation of fatal sequences.

                                    (a)                    (b)                     (c)                      (d)
    Classification accuracy         93.33% (0.00)          83.11% (6.72)           85.78% (6.00)            86.67% (5.25)
    Mean detection time interval    0:12:1:30 (0:0:0:0)    0:11:47:47 (0:5:35:7)   0:11:21:22 (0:5:20:16)   0:12:31:10 (0:3:36:37)

TABLE VI
MEAN DETECTION TIME INTERVAL OF THE AISEC ALGORITHM VARIANTS, WHEN TRAINED WITH DATA SET ATM-data-set-B’ AND TESTED WITH DATA SET ATM-data-set-B”. PARAMETERS INCLUDE: WINDOW SIZE = 14, CLASSIFICATION THRESHOLD = 0.98, AFFINITY THRESHOLD = 0.95, MEMORY SEED = 50, CLONE CONSTANT = 7, STIMULATION COUNT (NAIVE) = 25, STIMULATION COUNT (MEMORY) = 15, TRAIN DATA = 53 DETECTORS, TEST DATA = 190 (28 FATAL SEQUENCES AND 162 NON-FATAL SEQUENCES).

| | (a) Static AISEC | (b) Static AISEC, evolutionary process | (c) Online AISEC, evolutionary process | (d) Online AISEC, incorporation of fatal sequences |
| --- | --- | --- | --- | --- |
| Classification accuracy | 90.53% (0.00) | 90.53% (0.28) | 89.93% (0.23) | 91.53% (0.16) |
| Mean detection time interval | 0:2:18:14 (0:0:0:0) | 0:2:18:33 (0:0:7:12) | 0:1:3:30 (0:0:9:35) | 0:2:25:41 (0:0:6:16) |
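The calculation in equation 2 can be illustrated with a minimal Python sketch. The function names, the example timestamps, and the reading of the table values as days:hours:minutes:seconds are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from datetime import datetime, timedelta


def mean_detection_time_interval(events):
    """Equation 2: sum of (t_f - t_correct) divided by N_correct.

    `events` is a list of (t_correct, t_f) pairs, one per correctly
    detected error (hypothetical input format).
    """
    if not events:
        raise ValueError("at least one correct detection is required")
    total = sum((t_f - t_correct for t_correct, t_f in events), timedelta())
    return total / len(events)


def format_dhms(delta):
    """Render a timedelta as days:hours:minutes:seconds (assumed layout)."""
    seconds = int(delta.total_seconds())
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}:{hours}:{minutes}:{secs}"


# Made-up example: two correct detections, 12 h and 12 h 3 min ahead of failure.
example_events = [
    (datetime(2005, 1, 1, 0, 0, 0), datetime(2005, 1, 1, 12, 0, 0)),
    (datetime(2005, 1, 2, 0, 0, 0), datetime(2005, 1, 2, 12, 3, 0)),
]
mean_interval = mean_detection_time_interval(example_events)
print(format_dhms(mean_interval))
```

Only true positive detections enter the sum, so false alarms and missed failures do not affect the mean.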

testing phase of the AISEC algorithm, it was confirmed that the incorporation of novel fatal sequences of states improved the classification performance of the AISEC algorithm, in contrast to the exploratory evolutionary process. This led to the conclusion that the continuous learning used by the AED is associated with an improvement in its detection capability. In this case, the continuous learning of new errors is through the incorporation of novel fatal sequences of states, but it was inferred that the online evolutionary process might need a longer time to bring about a significant improvement in the classification performance of the AISEC algorithm. It was also observed that the AISEC algorithm generated minimal false alarm rates. In addition, the continuous learning feature contributed to the reduction in the false positive rate by enabling ineffective B-cells to be purged from the naive and memory sets.

Furthermore, the measure of the time interval between the detection of impending fatal states and their occurrences provides the basis for stating that AED is useful for enhancing the availability of ATMs, by circumventing the fatal occurrences or quickening their repair. Therefore, an early detection of errors may reduce the mean time to repair (MTTR) of the target system or increase the mean time to failure (MTTF).

The results of the experiments also show that there is a trade-off between the population of training data and the classification threshold parameter. With more training data, a high classification threshold may be used to achieve high true positive and low false positive rates. However, a small number of training data constrains the classification threshold to be lowered, in order to successfully recognise fatal sequences of states. The outcome is an increased false positive rate of the AISEC algorithm. Another point worthy of note is that the early detection of imminent failures by an AED is influenced by the frequency of failures in an ATM.

Another issue that was observed from the experiments was that online AISEC with incorporation of fatal sequences generated better classification performance than the online AISEC with evolutionary process. This is attributed to the fact that the incorporation of novel B-cells, representative of novel fatal sequences of states, is more likely to produce effective B-cells. On the other hand, the random changes through cloning and mutation did not generate effective B-cells, and might require a longer time to become effective at producing useful B-cells. This leads to the question of whether an alternative way of mutating the B-cells could be more effective. A suggestion is to select a subsequence within a sequence of states for mutation, which is then mutated with a valid subsequence from the gene library [32].

IX. CONCLUSIONS

We have proposed a framework for improving the availability of ATMs in which a key component is a local adaptable error detection (AED) that was implemented by adapting the artificial immune system for email classification (AISEC) algorithm. The effectiveness of the local AED was established using data that correspond to the error incidences in the cash-handler module of an ATM. Results from the empirical studies have confirmed the efficacy of the local AED at forecasting system failures. A summary of the findings is presented below:

• Detection of failure occurrences: The classification performance was derived from the classification accuracy of the AISEC algorithm (employed in the local AED), which is simply the number of failure occurrences detected out of the total number of failure occurrences reported in the system. From the outcome, classification accuracies of approximately 90% were recorded for one of the data sets;

• Enhancement of availability: The local AED was assessed with regard to its ability to detect potential failures in the system before their occurrences. Through the early detection of failures, it is assumed that necessary repair actions could be undertaken to prevent system downtime. In other words, the mean time to failure of the system can be increased, and mean time to repair can be reduced. Based on this criterion, the time intervals between the detection and occurrences of failures were monitored. From the estimated mean of these time intervals, it was demonstrated that the local AED detected failure occurrences on average 12 hours in advance for one data set, and 2 hours in advance for a second data set. These mean time intervals are absolute values, but their significance for carrying out repairs to circumvent failures is dependent on a domain expert’s opinion.

One of the limitations of the proposed approach is that it does not cater for rare events that may be associated with elusive faults, some of which can be very difficult to reproduce. The reason for this is that the algorithm only stores as error detectors those recurrent system states that are precursors of system failure. If it were otherwise, there would be an explosion of error detectors and an undesirable increase of false positives that might hinder the overall performance of the local AED. An alternative approach would be to have a separate system that monitors and identifies potential rare events. The incorporation of the corresponding error detectors into the main pool of detectors would be performed by maintenance personnel.

As future work, the aim is to implement network-wide adaptable error detection, as initially foreseen in the framework. For that it is necessary to specify the protocol for exchanging information between the local AEDs and the network-wide AED, and determine the minimum number of local AEDs that must propagate a novel error detector to the network-wide AED, before the detector is considered generic to the connected ATMs. Another area of research is to investigate the possibility of generating variable-length error detectors. Error detectors in the implemented framework were represented as fixed length sequences of states that were terminated by fatal states. The motivation for constraining the sequences to fixed lengths was a lack of domain knowledge on the markers that tag the beginning of sequences. In a situation where the markers are known, it may be necessary to investigate the feasibility of exploiting variable-length sequences, error detectors and window sizes. Above all, we would like to continue to explore other immune-based ideas. The framework for AED was inspired by the immunisation and adaptability concepts taken from the immune system, thus other immune-inspired ideas may be explored for error detection, such as the danger theory [33] and the self-assertion view of the immune system [34].

REFERENCES

[1] M. Ayara, “An immune-inspired solution for adaptable error detection in embedded systems,” Ph.D. dissertation, Computing Laboratory, University of Kent, UK, September 2005.
[2] A. Secker, A. Freitas, and J. Timmis, “AISEC: An artificial immune system for e-mail classification,” in Proceedings of the Congress on Evolutionary Computation, R. Sarker, R. Reynolds, H. Abbass, T. Kay-Chen, R. McKay, D. Essam, and T. Gedeon, Eds. Canberra, Australia: IEEE, December 2003, pp. 131–139.
[3] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach. Springer-Verlag, 2002.
[4] A. Avižienis, “Towards systematic design of fault-tolerant systems,” IEEE Computer, vol. 30, no. 4, pp. 51–58, April 1997.
[5] S. Xanthakis, S. Karapoulios, R. Pajot, and A. Rozz, “Immune system and fault tolerant computing,” Lecture Notes in Computer Science, vol. 1063, pp. 181–197, 1996.
[6] D. W. Bradley and A. M. Tyrell, “A hardware immune system for benchmark state machine error detection,” in Proceedings of Congress on Evolutionary Computation. Part of the World Congress on Computational Intelligence, Honolulu, HI, USA, 2002, pp. 813–818.
[7] A. Avižienis, J. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, January 2004.
[8] P. Jalote, Fault Tolerance in Distributed Systems. Upper Saddle River, NJ, USA: Prentice-Hall, Inc, 1998.
[9] C. Scherrer and A. Steininger, “Dealing with dormant faults in an embedded fault-tolerant computer systems,” IEEE Transactions on Reliability, vol. 52, no. 4, pp. 512–522, December 2003.
[10] Y. Ishida, “Active diagnosis by self-organization: An approach by the immune network metaphor,” in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’97), Nagoya, Japan, 1997, pp. 1084–1089.
[11] F. Mizessyn and Y. Ishida, “Immune networks for cement plants,” in Proceedings of International Symposium on Autonomous Decentralised Systems, 1993, pp. 282–288.
[12] A. Ishiguro, W. Yuji, and Y. Uchikawa, “Fault diagnosis of plant systems using immune networks,” in Proceedings of the 1994 IEEE International Conference on Multisensor Fusion and Integration For Intelligent Systems (MFI’94), Las Vegas, NV, USA, October 2-5 1994, pp. 34–42.
[13] D. W. Bradley and A. M. Tyrell, “Immunotronics: Hardware fault tolerance inspired by the immune system,” in Proceedings of Third International Conference on Evolvable Systems (ICES 2000), vol. 1801, April 2000, pp. 11–20.
[14] D. W. Bradley and A. M. Tyrell, “The architecture for the hardware immune system,” in Proceedings of The Third NASA-DoD workshop on Evolvable Hardware, D. Keymeulen, A. Stoica, J. Lohn, and R. S. Zebulum, Eds. Long Beach, CA, USA: IEEE Computer Society, July 2001, pp. 193–200.
[15] D. W. Bradley and A. Tyrell, “Hardware fault tolerance: An immunological solution.” IEEE, 2000, pp. 107–112.
[16] D. Bradley, C. Ortega, and A. Tyrell, “Embryonics + immunotronics: A bio-inspired approach to fault tolerance,” in Proceedings of 2nd NASA/DoD Workshop on Evolvable Hardware, J. Lohn et al., Eds. IEEE Computer Society, July 2000, pp. 215–233.
[17] D. W. Bradley and A. M. Tyrell, “Immunotronics - novel finite-state-machine architectures with built-in self-test using self-nonself differentiation,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 227–233, June 2002.
[18] P. D’haeseleer, “An immunological approach to change detection: Theoretical results,” The University of New Mexico, Albuquerque, NM, USA, Tech. Rep., 1996.
[19] P. D’haeseleer, S. Forrest, and P. Helman, “An immunological approach to change detection: Algorithms, analysis and implications,” in Proceedings of IEEE Symposium on Research in Security and Privacy, Oakland, CA, USA, May 1996.
[20] B. Hérve, “A brief history of the prevention of infectious diseases by immunisations,” Comparative Immunology, Microbiology, and Infectious Diseases, vol. 26, no. 5, pp. 293–308, 2003.
[21] J. Kim and P. Bentley, “Towards an artificial immune system for network intrusion detection: An investigation of dynamic clonal selection,” in Proceedings of The Congress on Evolutionary Computing (CEC-2002), Honolulu, HI, USA, May 2002, pp. 1015–1020.
[22] M. Neal, “Meta-stable memory in an artificial immune network,” in Proceedings of the 2nd International Conference on Artificial Immune Systems (ICARIS), J. Timmis, P. J. Bentley, and E. Hart, Eds. Edinburgh, UK: Springer, September 2003, pp. 168–180.
[23] J. Kelsey and J. Timmis, “Immune inspired somatic contiguous hypermutation for function optimisation,” in Proceedings of the Genetic and Evolutionary Computation (GECCO 2003), E. Cantú-Paz et al., Eds. Chicago, IL, USA: Springer-Verlag, July 12-16 2003, pp. 207–218.

[24] A. A. Freitas and J. Timmis, “Revisiting the foundations of artificial immune systems: A problem-oriented perspective,” in Proceedings of the 2nd International Conference on Artificial Immune Systems (ICARIS), J. Timmis, P. J. Bentley, and E. Hart, Eds. Edinburgh, UK: Springer, September 2003, pp. 229–241.
[25] D. Goodman, L. Boggess, and A. Watkins, “Artificial immune system classification of multiple-class problems,” in Proc. of Intelligent Engineering Systems. ASME, 2002, pp. 179–184.
[26] J. Kim and P. Bentley, “Immune memory in the dynamic clonal selection algorithm,” in Proceedings of 1st International Conference on Artificial Immune Systems (ICARIS), J. Timmis and P. J. Bentley, Eds. University of Kent at Canterbury Printing Unit, September 2002, pp. 59–67.
[27] A. Watkins, J. Timmis, and L. Boggess, “Artificial Immune Recognition System (AIRS): An Immune Inspired Supervised Machine Learning Algorithm,” Genetic Programming and Evolvable Machines, vol. 5, no. 3, pp. 291–318, September 2004. [Online]. Available: http://www.cs.kent.ac.uk/pubs/2004/1634
[28] M. Ayara, J. Timmis, R. de Lemos, L. N. de Castro, and R. Duncan, “How to generate detectors,” in Proceedings of 1st International Conference on Artificial Immune Systems (ICARIS), J. Timmis and P. J. Bentley, Eds. Canterbury, UK: University of Kent at Canterbury Printing Unit, September 2002, pp. 89–98.
[29] T. Stibor, K. Bayarou, and C. Eckert, “An investigation of R-chunk detector generation on higher alphabets,” in LNCS 3102, 2004, pp. 26–30.
[30] J. Timmis, R. de Lemos, M. Ayara, and R. Duncan, “Towards immune inspired fault tolerance in embedded systems,” in Proceedings of 9th International Conference on Neural Information Processing (ICONIP), Singapore, November 2002, pp. 1459–1463.
[31] S. M. Weiss and K. A. Casimir, Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural nets, Machine learning, and Expert systems. Morgan Kaufmann Publishers, Inc., 1991.
[32] V. Cutello, G. Nicosia, and M. Pavone, “A hybrid immune algorithm with information gain for the graph coloring problem,” in Proceedings of the Genetic and Evolutionary Computation (GECCO 2003), E. Cantú-Paz et al., Eds. Chicago, IL, USA: Springer-Verlag, July 12-16 2003, pp. 171–182.
[33] P. Matzinger, “The danger model: A renewed sense of self,” Science, vol. 296, pp. 301–305, April 2002.
[34] H. Bersini, “Self-assertion versus self-recognition: A tribute to Francisco Varela,” in Proceedings of 1st International Conference on Artificial Immune Systems (ICARIS), J. Timmis and P. J. Bentley, Eds. Canterbury, UK: University of Kent at Canterbury Printing Unit, September 2002, pp. 107–112.
[35] J. R. Quinlan, “Learning decision tree classifiers,” ACM Computing Surveys, vol. 28, no. 1, pp. 71–72, March 1996.

APPENDIX

This appendix presents the experimental results and analysis for the C5.0 algorithm [35]; for further information see [1].

The C5.0 rule induction algorithm was tested to evaluate its appropriateness for the detector generation process. Part of the evaluation of the C5.0 algorithm was to inspect the type of detectors generated by the algorithm. It was desired that the C5.0 algorithm generate specialised detectors. Specialised detectors can only match a specific sequence of states, while generalised detectors can match multiple sequences of states. This is because generalised detectors contain don’t cares (*), which represent placeholders that can be substituted by any other state of an ATM. Generalised detectors have limited use during maintenance, since the don’t cares do not indicate the specific states that can be traced to the consequent fatal state.

Using a window size of 6, the generalised detectors generated from the C5.0 algorithm are shown in table VII. This outcome motivated the application of longer window sizes for generating specialised detectors. In order to test this, the C5.0 algorithm was trained with sequences of states that were generated using longer window sizes, which were varied from 11 to 20. Table VIII shows the results from applying longer window sizes. Column (b) in table VIII lists the common detectors that were generated across multiple window sizes; the actual length can be derived by padding a detector with leading don’t cares (*). The detectors in column (c) of table VIII are specific to a window size.

TABLE VII
ERROR DETECTORS GENERATED FROM THE C5.0 ALGORITHM

| Error detectors |
| --- |
| * * * * * 12 10 |
| * * * * * 13 10 |
| * * * * * 5 10 |
| * * * * * 35 10 |
| * * * * * 18 10 |

TABLE VIII
ERROR DETECTORS GENERATED FROM THE C5.0 ALGORITHM WITH VARYING WINDOW SIZES. EXPERIMENTS ARE CONDUCTED OVER 10 TRIALS.

| Window size (a) | Common detectors (b) | Other detectors (c) |
| --- | --- | --- |
| 11 | 35 10 | * * * * * * 8 * 18 * 18 10 |
| | 12 10 | * * * * * * 12 * * * * 10 |
| | 13 10 | * * * * * * 5 * * * * 10 |
| | 18 10 | * * * * * * 18 * * * 18 10 |
| | 5 10 | * 18 * * * * * * * * 18 10 |
| | 8 35 10 | * 35 * * * * * * * * 18 10 |
| | 35 35 10 | * * * 18 * * * * * * * 10 |
| | 35 * 10 | * 8 * * 18 * * * * * 18 10 |
| | 18 * * * * * 10 | * * * 8 4 * * * * * * 10 |
| | 18 * * * * 10 | * * * 34 * * * * * * 18 10 |
| | 18 * * * * * * * * * 10 | * * * 35 * * * * * * 18 10 |
| | | * * * * * * * 8 * * 18 10 |
| 12 | | * * * * * * * * * 13 * 13 10 |
| | | * 8 18 * * * * * * * * * 10 |
| | | * * * 8 * * * * * * * 18 10 |
| | | * * * 12 * * * * * * * * 10 |
| | | * * * * * * * * * 18 * 18 10 |
| 13 | | * * * * * * 18 * * * * * 18 10 |
| | | * * * * * * 8 * * * * * 18 10 |
| | | * * * * * * 35 * * * * * 8 10 |
| 14 | | * * * * * 8 * * * * * * * 18 10 |
| | | * * * * * 12 * * * * * * * * 10 |
| | | * * * * * * * * * * * * 18 18 10 |
| | | * * * * * * * * * * * * 0 18 10 |
| | | * * * * * * 18 * * * * * * * 10 |
| | | * * * * * * 8 * * * * * * 18 10 |
| | | * * * * * * 5 * * * * * * * 10 |
| 15 | | |
| 16 | | * * * * * * * * * 18 * * * * * * 10 |
| 17 | | |
| 18 | | |
| 19 | | |
| 20 | | |

It can be observed from table VIII that the detectors generated have a maximum of two contiguous states before the fatal state. The implication is that the C5.0 algorithm generates generalised error detectors irrespective of the window size. Therefore, the C5.0 algorithm was not considered appropriate for the off-line detector generation process.

The outcomes of the experiments described above show that the C5.0 algorithm generates generalised detectors, regardless of the applied window size. This motivated the further work described in the paper.
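
The distinction between specialised and generalised detectors drawn in this appendix can be illustrated with a short Python sketch. It assumes that a detector is a list of state tokens ending in the fatal state, that `*` is the don’t care symbol, and that "contiguous states before the fatal state" means the specified states immediately preceding the final token. All function names and the example window are hypothetical and are not taken from the authors' implementation.

```python
WILDCARD = "*"  # the don't care symbol used in tables VII and VIII


def matches(detector, window):
    """A detector matches when every non-wildcard position equals the window state."""
    if len(detector) != len(window):
        return False
    return all(d == WILDCARD or d == s for d, s in zip(detector, window))


def pad(detector, length):
    """Pad a short (common) detector with leading don't cares to a given length."""
    if len(detector) > length:
        raise ValueError("detector is longer than the requested length")
    return [WILDCARD] * (length - len(detector)) + list(detector)


def is_specialised(detector):
    """Specialised detectors contain no don't cares (assumed reading)."""
    return WILDCARD not in detector


def contiguous_states_before_fatal(detector):
    """Count specified states immediately preceding the final (fatal) state."""
    count = 0
    for state in reversed(detector[:-1]):
        if state == WILDCARD:
            break
        count += 1
    return count


# A window-size-6 detector: six states followed by the fatal state 10.
detector = "* * * * * 12 10".split()
observed = ["3", "7", "7", "1", "4", "12", "10"]  # made-up sequence of states
print(matches(detector, observed), is_specialised(detector))
print(contiguous_states_before_fatal("* * * * * * * * * * * * 18 18 10".split()))
```

Under these assumptions, a detector whose count is small relative to the window size leaves most of the preceding states unconstrained, which is why it gives little guidance during maintenance.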
