(3)
H.S. Lopes / Applied Soft Computing xxx (2005) xxxxxx 5
where
\[
\begin{cases}
c_1: \; x_k \ge 0 \wedge x_{k+1} < 0 \\
c_2: \; x_k \le 0 \wedge x_{k+1} > 0 \\
c_3: \; |x_k - x_{k+1}| \ge \text{threshold}
\end{cases}
\tag{4}
\]
The number of slope sign changes is counted by
feature SSC, defined by Eq. (5). SSC, like the previous
ZC, roughly accounts for the frequency content of the
signal. This feature is incremented when the current
data point is either a positive peak (condition c_4) or a
negative peak (condition c_5), and the change is not due
to noise (conditions c_3, previously defined, and c_6).
\[
SSC =
\begin{cases}
SSC + 1, & \text{if } (c_4 \vee c_5) \wedge (c_3 \vee c_6) \\
SSC, & \text{otherwise}
\end{cases}
\tag{5}
\]
where
\[
\begin{cases}
c_4: \; x_k > x_{k-1} \wedge x_k > x_{k+1} \\
c_5: \; x_k < x_{k-1} \wedge x_k < x_{k+1} \\
c_6: \; |x_k - x_{k-1}| \ge \text{threshold}
\end{cases}
\tag{6}
\]
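The counting rule of Eqs. (5) and (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example frame and the threshold value are hypothetical, and only interior samples are tested because each condition needs both neighbours of the current point.

```python
def slope_sign_changes(frame, threshold):
    """Count slope sign changes (SSC) in one frame, in the spirit of Eqs. (5)-(6)."""
    ssc = 0
    # Assumption: only interior samples are tested, since both neighbours are needed.
    for k in range(1, len(frame) - 1):
        prev, cur, nxt = frame[k - 1], frame[k], frame[k + 1]
        c4 = cur > prev and cur > nxt       # positive peak
        c5 = cur < prev and cur < nxt       # negative peak
        c3 = abs(cur - nxt) >= threshold    # step to the next sample is large enough
        c6 = abs(cur - prev) >= threshold   # step from the previous sample is large enough
        if (c4 or c5) and (c3 or c6):
            ssc += 1
    return ssc


# Hypothetical frame: the small ripple around 0.6 is rejected as noise.
example_frame = [0.0, 2.0, 0.5, 0.6, 0.5, -1.5, 0.0]
print(slope_sign_changes(example_frame, threshold=1.0))  # prints 3
```

Lowering the threshold makes the feature more sensitive: with a threshold of 0.05 the ripple at 0.6 is also counted as a peak.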
Eq. (7) defines the waveform length WL, which gives
an indication of the complexity of the waveform
within the window:
\[
WL = \sum_{k=1}^{N} |x_{k+1} - x_k| \tag{7}
\]
The number of up slopes (US) and down slopes (DS),
defined by Eqs. (8) and (9), respectively, are features
that account for the number of samples lying in
ascending and descending segments of the waveform.
\[
US =
\begin{cases}
US + 1, & \text{if } x_k - x_{k-1} > 0 \\
US, & \text{otherwise}
\end{cases}
\tag{8}
\]
\[
DS =
\begin{cases}
DS + 1, & \text{if } x_k - x_{k-1} < 0 \\
DS, & \text{otherwise}
\end{cases}
\tag{9}
\]
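As a rough illustration of Eqs. (7) to (9), the three features can be accumulated in a single pass over consecutive sample pairs. The function name and example frame are hypothetical, and the sketch assumes that the sum in Eq. (7) runs over the pairs available inside the frame (N - 1 differences for N samples).

```python
def waveform_features(frame):
    """Return (WL, US, DS) for one frame, following the form of Eqs. (7)-(9)."""
    wl = 0.0  # waveform length: accumulated absolute difference
    us = 0    # samples on ascending segments
    ds = 0    # samples on descending segments
    # Assumption: only pairs of consecutive samples inside the frame are used.
    for prev, cur in zip(frame, frame[1:]):
        step = cur - prev
        wl += abs(step)
        if step > 0:
            us += 1
        elif step < 0:
            ds += 1
    return wl, us, ds


# Hypothetical frame with two rises, one flat step and two falls.
print(waveform_features([0.0, 1.0, 3.0, 2.0, 2.0, -1.0]))  # prints (7.0, 2, 2)
```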
A given channel of the EEG under analysis is divided
into time windows of fixed size (frames). The features
described by Eqs. (1)–(9) are then computed from the
sampled data in each frame, forming a seven-dimensional
vector {MAV, AV, ZC, SSC, WL, US, DS} that
is subsequently used by the GP algorithm.
4.3. The GP system
4.3.1. Function and terminal sets
For both case studies, the terminal set consists of
the features computed for every frame, and an
ephemeral random constant (R) created at the
beginning of the run and defined in the interval
[−1.0, 1.0]. Therefore, the terminal set is defined by:
T = {MAV, AV, ZC, SSC, WL, US, DS, R}.
The function set is the same as that proposed by Koza [2]
for many symbolic regression problems, and includes
the basic arithmetic operators and some mathematical
functions, i.e., F = {+, −, *, %, sin, cos, exp, rlog}.
Symbols % and rlog stand, respectively, for
protected division and protected logarithm. It is not
possible to use the ordinary division and logarithm
because the function set has to obey the closure
property of GP, which states that every possible value
generated by any combination of the terminal and
function sets has to be accepted as an operand of every
function. Therefore, given two real numbers a, b ∈ ℝ,
by default, a%b = 0 if b = 0, and a%b = a/b otherwise.
In the same way, rlog(a) = 0 if a = 0, and
rlog(a) = log(|a|) otherwise.
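The two protected operators are simple to express in code. The following Python sketch mirrors the definitions given above; the function names pdiv and rlog are illustrative choices, and the natural logarithm is assumed.

```python
import math


def pdiv(a, b):
    """Protected division (%): returns 0 when the divisor is 0."""
    return 0.0 if b == 0 else a / b


def rlog(a):
    """Protected logarithm: returns 0 for a = 0, otherwise log(|a|)."""
    # Assumption: the natural logarithm is intended.
    return 0.0 if a == 0 else math.log(abs(a))


print(pdiv(6.0, 3.0), pdiv(6.0, 0.0))  # prints 2.0 0.0
print(rlog(0.0), rlog(-1.0))           # prints 0.0 0.0
```

With these definitions every function returns a real number for any real operand, which is what the closure property requires.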
4.3.2. Fitness function
The fitness function evaluates the quality of an
individual regarding its ability to correctly classify a
set of instances of the problem (fitness cases). The
fitness function is normalized in such a way that it
assumes values in the range [0,1], where the higher the
value, the better the individual as a possible solution to
the problem. In Eq. (10), fitness_cases is the
number of instances in the training set (80 for SASWC
and 96 for SOSW) and hits is the number of correct
classifications (including positive and negative cases)
achieved by the current individual.
\[
\text{fitness} = \frac{1}{1 + \text{fitness\_cases} - \text{hits}} \tag{10}
\]
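A short numerical illustration of Eq. (10) follows. The function name is an illustrative choice; the value of 80 fitness cases is the SASWC training set size quoted above.

```python
def fitness(hits, fitness_cases):
    """Normalized fitness of Eq. (10): 1 / (1 + fitness_cases - hits)."""
    return 1.0 / (1.0 + fitness_cases - hits)


print(fitness(80, 80))  # all 80 cases correct, prints 1.0
print(fitness(79, 80))  # a single error already halves the value, prints 0.5
```

The example shows how strongly this normalization rewards the last few hits: the value reaches 1 only for a perfect classifier and drops quickly as errors accumulate.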
The frames of both training and testing sets were
previously labelled by the EEG expert with −1 for
negative cases (without the pattern) and +1 for positive
cases (with the pattern). The result of applying
an individual (a mathematical expression) to a given
set of data is a numerical value in the range [−∞,
+∞]. Therefore, a wrapper function [2] was used
to reinterpret the output value of the individual
through a mapping: all values less than or equal to −1
mean that the frame was classified as not having the
pattern (negative class), whereas all values greater
than or equal to +1 mean that the frame was classified
as having the pattern (positive class). Any output in the
gap ]−1, 1[ is interpreted as undefined. Dividing the
feature space into three different regions is useful for
separating classes so as to reduce the possibility of
overlapping results during the evolutionary process.
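The three-region mapping performed by the wrapper can be sketched as follows. The function names, the use of 0 to encode the undefined region, and the choice to count an undefined output as a miss are assumptions made only for this illustration.

```python
def wrap(output):
    """Map a raw output to -1 (no pattern), +1 (pattern) or 0 (undefined)."""
    if output <= -1.0:
        return -1
    if output >= 1.0:
        return +1
    return 0  # assumption: 0 encodes the undefined gap ]-1, 1[


def count_hits(outputs, labels):
    """Count frames whose wrapped output equals the expert label (-1 or +1)."""
    # Assumption: an undefined output never matches a label, so it is a miss.
    return sum(1 for out, label in zip(outputs, labels) if wrap(out) == label)


# Hypothetical raw outputs of an individual and the corresponding expert labels.
print(count_hits([-2.0, 0.5, 1.7, 3.0], [-1, +1, +1, -1]))  # prints 2
```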
4.3.3. GP running parameters
In all the experiments reported in this paper the
initial random population of individuals was generated
by Koza's ramped half-and-half method [2], which
creates an equal number of trees for tree-depth values
varying from 2 to the maximum depth 6. During the
run, the maximum depth of a tree was set to 17,
whereas the maximum number of nodes was not
limited. The selection method for an individual, as well
as for its mate, was the usual fitness-proportionate
selection (also known as roulette wheel). Experiments with other
types of selection methods, like tournament selection
and ranking, did not show any significant improvement
in the performance of the GP. As usual, probabilities
for the reproduction and crossover operators were set
to 10% and 90%, respectively. The GP was run for 50
complete generations (not including the initial
population), always using 500 individuals in the
population. There were two stop criteria: when the
maximum number of generations was reached or
when an individual with the maximum number of hits
was found (fitness = 1, see Eq. (10)). The best
individual ever found in the run was designated as
the result. The GP system was written in the standard
C programming language and run on a Sun Sparcstation-20
workstation. It took around 50 s to train
the SASWC classifier, and around 300 s for the SOSW
classifier.
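For readers unfamiliar with fitness-proportionate (roulette wheel) selection, a minimal Python sketch is given below. It is not the C code used in this work; the function name and the example fitness values are hypothetical, and the fitness values are assumed to be positive, as guaranteed by Eq. (10).

```python
import random


def roulette_select(fitnesses, rng=random):
    """Return the index of one individual, chosen with probability
    proportional to its fitness (all values assumed non-negative)."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)  # position of the ball on the wheel
    running = 0.0
    for index, value in enumerate(fitnesses):
        running += value
        if pick < running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off


# Hypothetical population of three individuals.
rng = random.Random(42)
parent = roulette_select([0.1, 0.3, 0.6], rng)
mate = roulette_select([0.1, 0.3, 0.6], rng)
print(parent, mate)
```

Over many draws the third individual in this example is selected about six times as often as the first, which is the selection pressure the method relies on.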
5. Results
5.1. Supervised training
After running the GP algorithm, mathematical
expressions (classifiers) were found for both case
studies. The best S-expressions found by GP are shown
in Table 1. For the first case study, the corresponding
expression has 36 nodes (19 operators, 13 terminals
and 4 numerical constants) and only features WL, SSC
and ZC were found to be relevant. The corresponding
mathematical expression, after algebraic simplification,
is shown in Eq. (11). For the second case study,
the S-expression found has 100 nodes (58 operators,
41 terminals and one numerical constant) and all
features were used. The corresponding simplified
mathematical expression is given in Eq. (12). It can be
seen that little can be done to simplify this expression.
In Eqs. (11) and (12), recall that both division and
logarithm are protected functions, as mentioned
before, and note that the names LEN, SC, UP, DOWN
and AVG correspond to features WL, SSC, US, DS and
AV, respectively. Also, the output class should be interpreted
according to Section 4.3.2.
\[
\text{class} = ZC \cdot \cos(c) + a
\]
where
\[
\begin{cases}
a = 0.08966 \cdot LEN - SC \\
b = 0.08966 \cdot LEN^2 \cdot a - SC \\
c = b \cdot LEN \cdot \log(0.9394 \cdot SC) - SC
\end{cases}
\tag{11}
\]
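\[
\text{class} = LEN + a - b \cdot (c + DOWN)
\]
where
\[
\begin{cases}
a = \dfrac{2 \cdot ZC}{LEN + 1 - d} \\
b = \log(UP \cdot SC) \\
c = \cos(\cos(\cos(MAV - \cos(SC)))) + \log(SC \cdot \log(MAV)) \\
d = \log(UP \cdot ZC) \cdot (\log(f) + DOWN) \\
f = SC \cdot \left( \cos(g) + \dfrac{\log(LEN)}{ZC} \right) \\
g = LEN - \log(2 \cdot SC \cdot ZC) + \dfrac{b - \cos(h)}{LEN} \\
h = \dfrac{MAV}{AVG}
\end{cases}
\tag{12}
\]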
5.2. Testing
The generalization capability of the evolved
classifiers was further tested on unseen instances
using the testing sets (80 for SASWC and 96 for
SOSW). When classifying an instance, depending on
the class predicted by the classifier and on the true
class of the instance, four types of results can be
observed for the prediction, as follows:
true positive (tp): the classifier predicts that the
pattern is present in the frame, and it really is;
false positive (fp): the classifier predicts that the
pattern is present in the frame, but it is not;
true negative (tn): the classifier predicts that the
pattern is not present in the frame, and indeed it is
not there;
false negative (fn): the classifier predicts that the
pattern is not present in the frame, but in fact it is.
The accuracy of a classifier can be described more
precisely using two indicators frequently used in data
classification [29,17,21], namely, sensitivity (Se) and
specificity (Sp). These indicators take into account not
only the number of correct classifications, but also the
relationship between positive and negative classes.
Sensitivity is defined as Se = tp/(tp + fn) and specificity
as Sp = tn/(tn + fp). Sensitivity measures the
fraction of positive instances (frames with the
epileptic pattern) that are correctly classified.
Specificity measures the fraction of negative instances
(frames without the pattern) that are correctly
classified as background activity.
Results of the application of the evolved classifiers
to the testing sets of the two case studies are presented
in Table 2 in the form of a confusion matrix, where
class (+/−) and real (+/−) represent, respectively, the
outcome of the classifier and the real class. Briefly,
Se = 1.00 and Sp = 0.93 for SASWC patterns, and
Se = 0.94 and Sp = 0.89 for SOSW patterns.
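The two indicators can be recomputed directly from the confusion matrices of Table 2, as in the following Python sketch. The function and variable names are illustrative; the counts are those reported in Table 2.

```python
def sensitivity(tp, fn):
    """Se = tp / (tp + fn): fraction of positive frames correctly detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Sp = tn / (tn + fp): fraction of negative frames correctly rejected."""
    return tn / (tn + fp)


# Counts taken from Table 2 (testing sets).
table2 = {
    "SASWC": {"tp": 40, "fp": 3, "fn": 0, "tn": 37},
    "SOSW": {"tp": 45, "fp": 5, "fn": 3, "tn": 43},
}

for name, m in table2.items():
    se = sensitivity(m["tp"], m["fn"])
    sp = specificity(m["tn"], m["fp"])
    print(f"{name}: Se = {se:.4f}, Sp = {sp:.4f}")
```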
6. Discussion and conclusions
The number of elements (functions and terminals)
of the resulting classifiers for both case
studies gives an estimate of the difficulty of the pattern
recognition task. For the second case study (SOSW),
the classifier found is much more complex than the
first. This is a consequence of the underlying
difficulty of unequivocally identifying SOSW patterns
(when compared with SASWC patterns). This is
consistent with the fact that SOSW patterns are
often misinterpreted by EEG experts as artifacts and
vice versa [13].
Comparing the obtained S-expressions with the
corresponding simplified mathematical expressions,
a proliferation of useless code can be seen. This
characteristic of most GP implementations reveals
the bloat effect, whose consequences are still
controversial. Although a strategy for automatically
simplifying S-expressions during the search would help
to find shorter solutions, it is not possible to
ensure that they would be better. Nevertheless,
using a parsimony term [17] in the fitness function
could be beneficial for obtaining more compact
solutions without losing significant accuracy. In
most GP applications, as also confirmed in this
work, the balance between simplicity and accuracy
of evolved programs is a very difficult issue [30].
This seems to be an open question for further
investigation.
In an attempt to interpret the results obtained by the GP
algorithm for the first case (SASWC), it should be
noted that the features selected suggest the
presence of high-frequency components (fast activity).
This finding is in accordance with Gotman and Gloor
[10], who described a heuristic algorithm that uses
similar features for the detection of epileptic spikes.
Regarding results for the second case study (SOSW),
the complexity of the expression precludes any interpretation.
Table 2
Results for the testing sets: SASWC and SOSW

                    Real (+)   Real (−)
SASWC   Class (+)      40          3
        Class (−)       0         37
SOSW    Class (+)      45          5
        Class (−)       3         43
Table 1
Best S-expressions found for the two case studies
SASWC (+(*ZC(cos((*(*LEN((*(LEN(*(*LEN 0.08966) ((*LEN 0.08966)SC)))SC))(rlog(*(cos 0.08966) SC)))SC))
(rlog(*(cos((*LEN 0.08966)SC))
SOSW ((+LEN(%(+ZC ZC)((+LEN(%(*(cos(cos(*((+LEN(+UP ZC))UP)(+LEN(MAV 0.64722)))))SC)(LEN LEN)))
(*rlog(*UP ZC))(+(+(*(LEN LEN) (sin(*UP SC)))(rlog(*SC+(cos((+LEN(%((rlog(*UP SC))
(cos(%MAV AVG)))LEN)) (rlog(*SC(+ZC ZC)))))(%(rlog LEN)ZC))))) DOWN)))))(*(rlog(*UP SC))
(+(+(cos(cos(cos(MAV(SC(SC(cosSC)))))))(rlog(*SC (rlog MAV))))DOWN)))
Despite the complexity of the results, the
accuracy achieved demonstrates the excellent
performance of the classifiers for both case studies.
This is true not only for recognizing ill-defined
patterns, but especially for not mistaking background
activity for epileptic patterns. Recall that frames
were visually selected from the raw signal by an
expert, based on his own experience. Nevertheless,
in the context of data analysis, both training and
testing sets were relatively small, if one takes into
account the biological variability of the EEG.
Therefore, applying the evolved classifiers to
other patients may not give the same accuracy. In
the EEG pattern recognition literature the difficulty
in generalizing methods and results is commonplace.
Consequently, the proposed methodology
is not intended to be a gold standard for
detecting epileptic patterns in the EEG. Also, it
was not the objective to propose a method that competes
with other computational approaches.
The main objective of the paper was achieved: to
demonstrate how GP (in conjunction with pattern
recognition principles) can be successfully used for
complex pattern recognition in a noisy environment.
For both case studies, the computation of the
classifiers is reasonably fast, but the features take
much more time to be calculated. Nevertheless, using
an up-to-date desktop computer, the system can be fast
enough to process a multi-channel EEG in real time.
Therefore, we believe that the overall methodology
can be very promising, particularly for the on-line
analysis of long-term EEG monitoring.
The methodology described in this work, comprising
feature extraction and evolutionary model construction,
is simple to implement and general enough to
be applied to the recognition of other patterns in
time-varying signals such as geophysical, radar, speech,
meteorological and others, opening a wide field for
further research.
Further work will comprise automatic feature
generation by the GP system, as well as the use of a
parsimony term in the fitness function in order to
reduce the complexity of the classifier. Also, a
more powerful evolutionary technique, like
gene-expression programming [31], could be worth
considering. In the near future, this
methodology shall be extended to other pattern
recognition tasks.
Acknowledgments
The author would like to thank all the people who
anonymously contributed to the availability of the
EEG databases used in this work. This work was
partially supported by a research grant from the
Brazilian National Research Council, CNPQ
(Process No. 305720/04-0).
References
[1] http://www.who.org/.
[2] J.R. Koza, Genetic Programming: on the Programming of
Computers by Means of Natural Selection, MIT Press,
Cambridge, 1992.
[3] W.R.S. Webber, R.P. Lesser, R.T. Richardson, K. Wilson, An
approach to seizure detection using an artificial neural network
(ANN), Electroencephalogr. Clin. Neurophysiol. 98 (1996)
250–272.
[4] C. Kurth, B.J. Steinhoff, Automated seizure detection in
continuous EEG recordings by a Kohonen feature map,
Epilepsia 38 (Suppl. 3) (1997) 154.
[5] T. Kalayci, O
. O