
EURASIP Journal on Applied Signal Processing

Trends in Brain Computer Interfaces


Guest Editors: Jean-Marc Vesin and Touradj Ebrahimi
Copyright 2005 Hindawi Publishing Corporation. All rights reserved.
This is a special issue published in volume 2005 of EURASIP Journal on Applied Signal Processing. All articles are open access
articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly cited.
Editor-in-Chief
Marc Moonen, Belgium
Senior Advisory Editor
K. J. Ray Liu, College Park, USA
Associate Editors
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Jacob Benesty, Canada
Kostas Berberidis, Greece
Helmut Bölcskei, Switzerland
Joe Chen, USA
Chong-Yung Chi, Taiwan
Satya Dharanipragada, USA
Petar M. Djurić, USA
Jean-Luc Dugelay, France
Frank Ehlers, Germany
Moncef Gabbouj, Finland
Sharon Gannot, Israel
Fulvio Gini, Italy
A. Gorokhov, The Netherlands
Peter Handel, Sweden
Ulrich Heute, Germany
John Homer, Australia
Arden Huang, USA
Jiri Jan, Czech Republic
Søren Holdt Jensen, Denmark
Mark Kahrs, USA
Thomas Kaiser, Germany
Moon Gi Kang, Korea
Aggelos Katsaggelos, USA
Walter Kellermann, Germany
Lisimachos P. Kondi, USA
Alex Kot, Singapore
C.-C. Jay Kuo, USA
Geert Leus, The Netherlands
Bernard C. Levy, USA
Mark Liao, Taiwan
Yuan-Pei Lin, Taiwan
Shoji Makino, Japan
Stephen Marshall, UK
C. Mecklenbräuker, Austria
Gloria Menegaz, Italy
Bernie Mulgrew, UK
King N. Ngan, Hong Kong
Douglas O'Shaughnessy, Canada
Antonio Ortega, USA
Montse Pardas, Spain
Wilfried Philips, Belgium
Vincent Poor, USA
Phillip Regalia, France
Markus Rupp, Austria
Hideaki Sakai, Japan
Bill Sandham, UK
Dirk Slock, France
Piet Sommen, The Netherlands
Dimitrios Tzovaras, Greece
Hugo Van hamme, Belgium
Jacques Verly, Belgium
Xiaodong Wang, USA
Douglas Williams, USA
Roger Woods, UK
Jar-Ferr Yang, Taiwan
Contents
Editorial, Jean-Marc Vesin and Touradj Ebrahimi
Volume 2005 (2005), Issue 19, Pages 3087-3088
Clustering of Dependent Components: A New Paradigm for fMRI Signal Detection, Anke Meyer-Bäse,
Monica K. Hurdal, Oliver Lange, and Helge Ritter
Volume 2005 (2005), Issue 19, Pages 3089-3102
Robust EEG Channel Selection across Subjects for Brain-Computer Interfaces, Michael Schröder,
Thomas Navin Lal, Thilo Hinterberger, Martin Bogdan, N. Jeremy Hill, Niels Birbaumer,
Wolfgang Rosenstiel, and Bernhard Schölkopf
Volume 2005 (2005), Issue 19, Pages 3103-3112
Determining Patterns in Neural Activity for Reaching Movements Using Nonnegative Matrix
Factorization, Sung-Phil Kim, Yadunandana N. Rao, Deniz Erdogmus, Justin C. Sanchez,
Miguel A. L. Nicolelis, and Jose C. Principe
Volume 2005 (2005), Issue 19, Pages 3113-3121
Finding Significant Correlates of Conscious Activity in Rhythmic EEG, Piotr J. Durka
Volume 2005 (2005), Issue 19, Pages 3122-3127
Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface,
David A. Peterson, James N. Knight, Michael J. Kirby, Charles W. Anderson, and Michael H. Thaut
Volume 2005 (2005), Issue 19, Pages 3128-3140
A Time-Frequency Approach to Feature Extraction for a Brain-Computer Interface with a
Comparative Analysis of Performance Measures, Damien Coyle, Girijesh Prasad, and T. M. McGinnity
Volume 2005 (2005), Issue 19, Pages 3141-3151
EEG-Based Asynchronous BCI Controls Functional Electrical Stimulation in a Tetraplegic Patient,
Gert Pfurtscheller, Gernot R. Müller-Putz, Jörg Pfurtscheller, and Rüdiger Rupp
Volume 2005 (2005), Issue 19, Pages 3152-3155
Steady-State VEP-Based Brain-Computer Interface Control in an Immersive 3D Gaming
Environment, E. C. Lalor, S. P. Kelly, C. Finucane, R. Burke, R. Smith, R. B. Reilly, and G. McDarby
Volume 2005 (2005), Issue 19, Pages 3156-3164
Estimating Driving Performance Based on EEG Spectrum Analysis, Chin-Teng Lin, Ruei-Cheng Wu,
Tzyy-Ping Jung, Sheng-Fu Liang, and Teng-Yi Huang
Volume 2005 (2005), Issue 19, Pages 3165-3174
EURASIP Journal on Applied Signal Processing 2005:19, 3087–3088
© 2005 Hindawi Publishing Corporation
Editorial
Jean-Marc Vesin
Signal Processing Institute, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland
Email: jean-marc.vesin@epfl.ch
Touradj Ebrahimi
Signal Processing Institute, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland
Email: touradj.ebrahimi@epfl.ch
Brain-computer interfaces (BCI), an emerging domain in the field of man-machine interaction, have attracted increasing attention in the last few years. Among the reasons for such an interest, one may cite the expansion of the neurosciences, the development of powerful information processing and machine learning techniques, as well as the mere fascination of controlling the physical world with human thoughts.

BCI pose significant challenges, at both the biomedical and the data processing levels. Brain processes are not fully understood yet. Also, the information on the dynamics of these processes, up to now gathered mainly with electroencephalographic (EEG) or functional magnetic resonance imaging (fMRI) systems, is incomplete and, more often than not, noisy. It is therefore important for BCI applications to determine how the maximum amount of information can be physically extracted, and to design efficient tools both to process the data and to classify the results.
This special issue presents nine papers exhibiting a rather balanced state of research and development in BCI. Three papers deal with information extraction, three with signal processing aspects, and three present applications. Moreover, while most current efforts concentrate on continuous EEG-based techniques, fMRI, implanted microwire electrode, and evoked potential-based techniques are also presented.

In the first batch of three papers on information extraction, A. Meyer-Bäse et al. study independent component analysis (ICA) and unsupervised clustering techniques and combine them to produce task-related activation maps for fMRI datasets. M. Schröder et al. explore the problem of EEG channel selection for BCI tasks, and S.-P. Kim et al. propose a nonnegative matrix factorization to identify local spatiotemporal patterns of neural activity in microwire electrode signals from monkey motor cortical regions.

The second batch of papers, devoted to signal processing aspects of EEG signals, brings new insights to this field by making use of advanced signal processing techniques and by evaluating their performance. The paper by P. J. Durka presents a methodology for the time-frequency analysis of event-related changes in EEG signals. D. A. Peterson et al. investigate the potential of blind source separation (BSS) and support vector machine (SVM)-based classification to discriminate two cognitive tasks. Finally, D. Coyle et al. deal with the extraction of time-frequency features to discriminate two imagined movements.
The last batch concentrates on three exciting applications of BCI. The paper by G. Pfurtscheller et al. describes a BCI approach for the control of a grasping device using functional electrical stimulation by a tetraplegic patient. E. Lalor et al. present a BCI-based 3D video game using steady-state visual evoked potentials, and C.-T. Lin et al. propose an EEG-based car-driver drowsiness estimation device.

We would like to thank the authors of this special issue for their valuable submissions and the reviewers for their high-quality evaluations. We hope the contributions made here will serve to further encourage and stimulate progress in this new and exciting field. Last but not least, we would like to thank the editorial team of EURASIP JASP for their continuous support and patience.
Jean-Marc Vesin
Touradj Ebrahimi
Jean-Marc Vesin graduated from the École Nationale Supérieure d'Ingénieurs Électriciens de Grenoble (ENSIEG, Grenoble, France) in 1980. He received his M.S. degree from Laval University, Québec, Canada, in 1984, where he spent four years on research projects. After two years in industry, he joined the Swiss Federal Institute of Technology, Lausanne, Switzerland, where he obtained his Ph.D. degree in 1992. He is now a Senior Researcher in the Signal Processing Institute of EPFL.
His research work is focused on the analysis of biomedical signals
and the computer modeling of biological systems, with an emphasis
on cardiovascular and neuronal applications. He is the author of
more than 150 journal and conference papers.
Touradj Ebrahimi is currently a Professor at EPFL, involved in research and teaching of multimedia signal processing. He has been the recipient of various distinctions such as the IEEE and Swiss National ASE Award, the SNF-PROFILE grant for advanced researchers, three ISO certificates for key contributions to MPEG-4 and JPEG 2000, and the Best Paper Award of the IEEE Transactions on Consumer Electronics. His research interests include still, moving, and 3D image processing and coding, visual information security (rights protection, watermarking, authentication, data integrity, steganography), new media, and human-computer interfaces (smart vision, brain-computer interface). He is the author or the coauthor of more than 150 research publications, and holds 10 patents.
EURASIP Journal on Applied Signal Processing 2005:19, 3089–3102
© 2005 Hindawi Publishing Corporation
Clustering of Dependent Components:
A New Paradigm for fMRI Signal Detection
Anke Meyer-Bäse
Department of Electrical and Computer Engineering, Florida State University, Tallahassee, FL 32310-6046, USA
Email: amb@eng.fsu.edu
Monica K. Hurdal
Department of Mathematics, Florida State University, Tallahassee, FL 32306-4510, USA
Email: mhurdal@math.fsu.edu
Oliver Lange
Department of Electrical and Computer Engineering, Florida State University, Tallahassee, FL 32310-6046, USA
Email: oliver@lange.org
Helge Ritter
Neuroinformatics Group, Faculty of Technology, University of Bielefeld, 33501 Bielefeld, Germany
Email: helge@techfak.uni-bielefeld.de
Received 1 February 2004
Exploratory data-driven methods such as unsupervised clustering and independent component analysis (ICA) are considered to be hypothesis-generating procedures and are complementary to the hypothesis-led statistical inferential methods in functional magnetic resonance imaging (fMRI). Recently, a new paradigm in ICA emerged, that of finding clusters of dependent components. This intriguing idea found its implementation in two new ICA algorithms: tree-dependent and topographic ICA. For fMRI, this represents the unifying paradigm of combining two powerful exploratory data analysis methods, ICA and unsupervised clustering techniques. For the fMRI data, a comparative quantitative evaluation between the two methods, tree-dependent and topographic ICA, was performed. The comparative results were evaluated by (1) task-related activation maps, (2) associated time courses, and (3) an ROC study. The most important findings in this paper are that (1) both tree-dependent and topographic ICA are able to identify signal components with high correlation to the fMRI stimulus, and that (2) topographic ICA outperforms all other ICA methods, including tree-dependent ICA, for 8 and 9 ICs. However, for 16 ICs, topographic ICA is outperformed by tree-dependent ICA (KGV) using the kernel generalized variance as an approximation of the mutual information. The applicability of the new algorithm is demonstrated on experimental data.
Keywords and phrases: dependent component analysis, topographic ICA, tree-dependent ICA, fMRI.
1. INTRODUCTION
Functional magnetic resonance imaging with high temporal and spatial resolution represents a powerful technique for visualizing rapid and fine activation patterns of the human brain [1, 2, 3, 4, 5]. As is known from both theoretical estimations and experimental results [4, 6, 7], an activated signal variation appears very low on a clinical scanner. This motivates the application of analysis methods to determine the response waveforms and associated activated regions. Generally, these techniques can be divided into two groups: model-based techniques require prior knowledge about activation patterns, whereas model-free techniques do not. However, model-based analysis methods impose some limitations on data analysis under complicated experimental conditions. Therefore, analysis methods that do not rely on any assumed model of functional response are considered more powerful and relevant. We distinguish two groups of model-free methods: transformation-based and clustering-based methods. The first kind, principal component analysis (PCA) [8, 9] or independent component analysis (ICA) [10, 11, 12, 13], transforms original data into a high-dimensional vector space to separate functional response and various noise sources from each other. The second kind, fuzzy clustering analysis [14, 15, 16, 17] or self-organizing maps [17, 18, 19],
attempts to classify time signals of the brain into several patterns according to temporal similarity among these signals.

Among the data-driven techniques, ICA has been shown to provide a powerful method for the exploratory analysis of fMRI data [11, 13]. ICA is an information-theoretic approach which enables the recovery of underlying signals, or independent components (ICs), from linear data mixtures. Therefore, it is an excellent method for the spatial localization and temporal characterization of sources of BOLD activation. ICA can be applied to fMRI both temporally and spatially. Spatial ICA has dominated so far in fMRI applications because the spatial dimension is much larger than the temporal dimension in fMRI. However, recent literature results have suggested that temporal and spatial ICA yield similar results for experiments where two predictable task-related components are present.
A new methodology has attracted a lot of attention in the ICA community during the last two years: the idea of finding clusters of independent components. Two leading papers implemented this new paradigm in a striking way. Clusters are defined as connected components of a graphical model (a lattice in [20] and a tree structure in [21]). Both models attempt a decomposition of the source variables such that they are dependent within a cluster and independent between clusters. This idea emerged from multidimensional ICA, where the sources are not assumed to be all mutually independent [22]. Instead, it is assumed that they can be grouped in n-tuples, such that within these tuples they are dependent on each other, but are independent outside.

The two paradigms differ in terms of topology and the knowledge of the number and sizes of components.

In [20], the components are arranged on a two-dimensional grid or lattice, as is typical in topographic models. The goal is to define a statistical model where the topographic proximity reflects the statistical dependencies between components. The components (simple cells) are placed on the grid such that any two cells that are close to each other model dependent components, whereas cells that are far from each other model independent components. The measure of dependency is based on the correlation of energies. Energy in this context means the squaring operation. Nonlinear correlations are of importance since they cannot be easily set to zero by standard whitening procedures. Translated to our model, this means that energies are strongly positively correlated for neighboring components. The topology of the model is fixed. This model also requires that the number and sizes of the components be fixed in advance. Learning is based on the maximization of the likelihood.

A totally different concept is employed in [21]. Here, the topology of the dependency structure is not fixed in advance. However, it is assumed that it has the structure of a tree. The goal of the learning is to identify a minimal spanning tree connecting the given sources in such a manner that no other tree expresses the dependency structure of the given distribution better. It is interesting to point out that in traditional ICA the graphical model has no edges, meaning that the random variables are mutually independent.

We have seen that both clustering methods and ICA techniques have their particular strengths in fMRI signal detection. Therefore, it is natural to look for a unifying technique that combines those two processing mechanisms and applies this combination to fMRI. The topographic and the tree-dependent ICA, as previously described, have the computational advantages associated with both techniques.
2. EXPLORATORY DATA ANALYSIS METHODS

Functional organization of the brain is based on two complementary principles, localization and connectionism. Localization means that each visual function is performed mainly by a small set of cortical neurons. Connectionism, on the other hand, expresses that the brain regions involved in a certain visual cortical function are widely distributed, and thus the brain activity necessary to perform a given task may be the functional integration of activity in distinct brain systems. It is important to stress that in neurobiology the term connectionism is used in a different sense than in neural network terminology.

The following sections are dedicated to presenting the algorithms and evaluating the discriminatory power of the two main groups of exploratory data analysis methods.
2.1. The basic ICA algorithms

According to the principle of functional organization of the brain, it was suggested for the first time in [11] that the multifocal brain areas activated by the performance of a visual task should be unrelated to the brain areas whose signals are affected by artifacts of physiological nature, head movements, or scanner noise related to fMRI experiments. Every single above-mentioned signal can be described by one or more spatially independent components, each associated with a single time course of a voxel and a component map. It is assumed that the component maps, each described by a spatial distribution of fixed values, represent overlapping, multifocal brain areas of statistically dependent fMRI signals. This aspect is visualized in Figure 1. In addition, it is considered that the distributions of the component maps are spatially independent and, in this sense, uniquely specified.
Mathematically, this means that if $p_k(C_k)$ specifies the probability distribution of the voxel values $C_k$ in the $k$th component map, then the joint probability distribution of all $n$ components yields

$$p(C_1, \ldots, C_n) = \prod_{k=1}^{n} p_k(C_k), \quad (1)$$

where each of the component maps $C_k$ is a vector $(C_{ki},\ i = 1, 2, \ldots, M)$, with $M$ giving the number of voxels. Independency is a stronger condition than uncorrelatedness. It was
Figure 1: Visualization of ICA applied to fMRI data. (a) Scheme of fMRI data decomposed into independent components (each with a time course and a component map), and (b) fMRI data as a mixture of independent components ($X = AS$), where the mixing matrix $A$ specifies the relative contribution of each component at each time point [11].
shown in [11] that these maps are independent if the active voxels in the maps are sparse and mostly nonoverlapping. Additionally, it is assumed that the observed fMRI signals are the superposition of the individual component processes at each voxel. Based on these assumptions, ICA can be applied to fMRI time series to spatially localize and temporally characterize the sources of BOLD activation.

Different methods for performing ICA decompositions have been proposed which employ different objective functions together with different criteria for optimizing these functions, and it is assumed that they can produce different results.
2.2. Models of spatial ICA in fMRI

In the following, we will assume that $X$ is a $T \times M$ matrix of observed voxel time courses (the fMRI signal data matrix), $C$ is the $N \times M$ random matrix of component map values, and $A$ is a $T \times N$ mixing matrix containing in its columns the associated time courses of the $N$ components. Furthermore, $T$ corresponds to the number of scans, and $M$ is the number of voxels included in the analysis.

The spatial ICA (sICA) problem is given by the following linear combination model for the data:

$$X = AC, \quad (2)$$

where no assumptions are made about the mixing matrix $A$, and the rows $C_i$ of $C$ are assumed to be mutually statistically independent. Then the ICA decomposition of $X$ can be defined as an invertible transformation:

$$C = WX, \quad (3)$$

where $W$ is an unmixing matrix providing a linear decomposition of the data, and $A$ is the pseudoinverse of $W$.

The employed ICA algorithms are TDSEP, JADE, the FastICA approach based on the minimization of mutual information but using negentropy as a measure of non-Gaussianity [23], and topographic ICA, which combines topographic mapping with ICA [20].
2.3. Tree-dependent component analysis model

The paradigm of TCA is derived from the theory of tree-structured graphical models. In [24] a strategy was shown to optimally approximate an $n$-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of the first-order tree dependence. A tree is an undirected graph with at most a single edge between two nodes. This tree concept can be easily interpreted with respect to ICA. A graph with no edges means that the random variables are mutually independent, and this pertains to ICA. On the other hand, if no assumptions are made about independence, then the corresponding family of probability distributions represents the set of all distributions.
A probability distribution can be approximated in several ways. Here, we look into approximations based on a product of $n - 1$ second-order component distributions. In [24] a strategy for the best approximation of an $n$th-order distribution by a product of $n - 1$ second-order component distributions was developed:

$$P_a(x) = \prod_{i=1}^{n} P\big(x_{m_i} \mid x_{m_{j(i)}}\big), \quad 0 \leq j(i) < i, \quad (4)$$

where $P(x)$ is a joint probability distribution of $n$ discrete variables with $x = (x_1, \ldots, x_n)$ being a vector, $(m_1, \ldots, m_n)$ is an unknown permutation of the integers $1, 2, \ldots, n$, and $P(x_i \mid x_0)$ is by definition equal to $P(x_i)$. The probability distribution introduced above is named a probability distribution of first-order tree dependence.
To determine the goodness of an approximation, it is necessary to define a closeness measure as

$$I\big(P, P_a\big) = \sum_{x} P(x) \log \frac{P(x)}{P_a(x)}, \quad (5)$$

where $P(x)$ and $P_a(x)$ are two probability distributions of the $n$ random variables $x$. The quantity $I(P, P_a)$ has the property $I(P, P_a) \geq 0$.

Translated to random variables, the above definition is named mutual information and is always nonnegative:

$$I\big(x_i, x_j\big) = \sum_{x_i, x_j} P\big(x_i, x_j\big) \log \frac{P\big(x_i, x_j\big)}{P\big(x_i\big) P\big(x_j\big)}. \quad (6)$$
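For discrete variables, the pairwise mutual information (6) can be computed directly from the joint probability table. The following sketch is illustrative; the table values are made up.

```python
import numpy as np

def mutual_information(pxy):
    """I(x_i, x_j) from a joint probability table pxy, as in eq. (6)."""
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x_i
    py = pxy.sum(axis=0, keepdims=True)      # marginal of x_j
    nz = pxy > 0                             # treat 0 * log 0 as 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

p_ind = np.outer([0.5, 0.5], [0.25, 0.75])   # independent pair: I = 0
p_dep = np.array([[0.5, 0.0],                # x_j = x_i: I = H(x_i) = log 2
                  [0.0, 0.5]])
```

Here `mutual_information(p_ind)` is 0, and `mutual_information(p_dep)` equals log 2, the entropy of a fair bit, illustrating the nonnegativity stated after (5).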
In the following, we will state the solution to the approximation of the probability distribution. We are searching for a distribution of tree dependence $P_{\tau}(x_1, \ldots, x_n)$ such that $I(P, P_{\tau}) \leq I(P, P_t)$ for all $t \in T_n$, where $T_n$ represents the set of all possible first-order dependence trees. Thus, the solution is defined as the optimal first-order dependence tree.

In the parlance of graph theory, every branch of the dependence tree is assigned a branch weight $I(x_i, x_{j(i)})$. Thus, given a dependence tree $t$, the sum of all branch weights becomes a useful quantity.

In [24] it was shown that a maximum-weight dependence tree is a dependence tree $t$ such that, for all $t'$ in $T_n$,

$$\sum_{i=1}^{n} I\big(x_i, x_{j(i)}\big) \geq \sum_{i=1}^{n} I\big(x_i, x_{j'(i)}\big). \quad (7)$$

In other words, a probability distribution of tree dependence $P_t(x)$ is an optimum approximation to $P(x)$ if and only if its dependence tree $t$ has maximum weight; minimizing the closeness measure $I(P, P_t)$ is equivalent to maximizing the total branch weight.
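The maximum-weight dependence tree in (7) can be found greedily: Kruskal's algorithm applied to the pairwise mutual-information weights yields the optimal spanning tree, which is the construction of [24]. A minimal sketch with a hypothetical, hand-picked MI matrix (not estimated from any real data):

```python
import itertools
import numpy as np

def max_weight_tree(mi):
    """Kruskal's algorithm for the maximum-weight spanning tree over MI weights."""
    n = mi.shape[0]
    edges = sorted(((mi[i, j], i, j)
                    for i, j in itertools.combinations(range(n), 2)),
                   reverse=True)                # heaviest edges first
    parent = list(range(n))                     # union-find forest
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                            # keep edge only if it joins two trees
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical pairwise MI estimates (symmetric, zero diagonal)
mi = np.array([[0.0, 0.9, 0.2],
               [0.9, 0.0, 0.7],
               [0.2, 0.7, 0.0]])
tree = max_weight_tree(mi)   # edges (0, 1) and (1, 2): the two heaviest
```

The greedy choice is optimal here because spanning trees form a matroid, so picking the heaviest admissible edge at each step maximizes the total branch weight in (7).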
The idea of approximating discrete probability distributions with dependence trees, described before and adapted from [24], can be easily translated to ICA [21].

In classic ICA, we want to minimize the mutual information of the estimated components $s = Wx$. Thus, the result derived in [24] can be easily extended and becomes tree-dependent ICA.

The objective function for TCA is given by $J(x, W, t)$ and includes the demixing matrix $W$. Thus, the mutual information for TCA becomes

$$J(x, W, t) = I^{t}(s) = I\big(s_1, \ldots, s_m\big) - \sum_{(u,v) \in t} I\big(s_u, s_v\big), \quad (8)$$

where $s$ factorizes in a tree $t$.
In TCA as in ICA, the density $p(x)$ is not known, and the estimation criteria have to be substituted by empirical contrast functions. As described in [21], we will employ three types of contrast functions: (i) approximation of the entropies appearing in (8) via kernel density estimation (KDE), (ii) approximation of the mutual information based on the kernel generalized variance (KGV), and (iii) approximation based on cumulants using Gram-Charlier expansions (CUM).
2.4. Topographical independent component analysis

Topographic independent component analysis [20] represents a unifying model which combines topographic mapping with ICA.

Achieved by a slight modification of the ICA model, it can at the same time be used to define a topographic order between the components, and thus has the usual computational advantages associated with topographic maps.

The paradigm of topographic ICA has its roots in [25], where a combination of invariant feature subspaces [26] and independent subspaces [22] is proposed. In the following, we will describe these two parts, which substantially reflect the concept of topographic ICA [27].
2.4.1. Invariant feature subspaces
The principle of invariant feature subspaces was developed
by Kohonen [26] with the intention of representing features
with some invariances. This principle states that an invariant
feature is given by a linear subspace in a feature space. The
value of the invariant feature is given by the squared norm of
the projection of the given data point on that subspace.
A feature subspace can be described by a set of orthogonal basis vectors $w_j$, $j = 1, \ldots, n$, where $n$ is the dimension of the subspace. Then the value $G(x)$ of the feature $G$ with the input vector $x$ is given by

$$G(x) = \sum_{j=1}^{n} \big\langle w_j, x \big\rangle^2. \quad (9)$$

In other words, this describes the distance between the input vector $x$ and a general linear combination of the basis vectors $w_j$ of the feature subspace [26].
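A minimal numeric illustration of (9): with an orthonormal basis, G(x) is the squared norm of the projection of x onto the subspace, and is therefore unchanged by any movement of x within the subspace. The basis and test vectors below are arbitrary choices for illustration.

```python
import numpy as np

def subspace_feature(ws, x):
    """Invariant feature value of eq. (9): squared norm of the projection
    of x onto the subspace spanned by the orthonormal basis vectors ws."""
    return float(sum(np.dot(w, x) ** 2 for w in ws))

# Subspace spanned by the first two coordinate axes of R^3
ws = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]

x1 = np.array([3.0, 4.0, 2.0])   # in-subspace part (3, 4): squared norm 25
x2 = np.array([5.0, 0.0, 2.0])   # rotated within the subspace: still 25
```

`subspace_feature(ws, x1)` and `subspace_feature(ws, x2)` both return 25.0 even though the inputs differ, which is exactly the invariance Kohonen's feature-subspace principle provides.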
2.4.2. Independent subspaces
Traditional ICA works under the assumption that the observed signals $x_i(t)$ $(i = 1, \ldots, n)$ are generated by a linear weighting of a set of $n$ statistically independent random sources $s_j(t)$ with time-independent coefficients $a_{ij}$. In matrix form, this can be expressed as

$$x(t) = A s(t), \quad (10)$$

where $x(t) = [x_1(t), \ldots, x_n(t)]^T$, $s(t) = [s_1(t), \ldots, s_n(t)]^T$, and $A = [a_{ij}]$.
In multidimensional ICA [22], the sources $s_i$ are not assumed to be all mutually independent. Instead, it is assumed that they can be grouped in $n$-tuples, such that within these tuples they are dependent on each other, but are independent outside. This newly introduced assumption was observed in several image processing applications. Each $n$-tuple of sources $s_i$ corresponds to $n$ basis vectors given by the rows
of matrix $A$. A subspace spanned by a set of $n$ such basis vectors is defined as an independent subspace. In [22] two simplifying assumptions are made: (1) although the $s_i$ are not at all independent, they are chosen to be uncorrelated and of unit variance, and (2) the data are preprocessed by whitening (sphering) them. This means the $w_j$ are orthonormal.

Let $J$ be the number of independent feature subspaces and $S_j$, $j = 1, \ldots, J$, the set of indices that belong to the subspace of index $j$. Assume that we have $T$ given observations $x(t)$, $t = 1, \ldots, T$. Then the likelihood $L$ of the data based on the model is given by

$$L\big(w_i,\ i = 1, \ldots, n\big) = \prod_{t=1}^{T} \Big[ |\det W| \prod_{j=1}^{J} p_j\big( \langle w_i, x(t) \rangle,\ i \in S_j \big) \Big] \quad (11)$$

with $p_j(\cdot)$ being the probability density inside the $j$th $n$-tuple of $s_i$. The expression $|\det W|$ is due to the linear transformation of the pdf. As always with ICA, $p_j(\cdot)$ need not be known in advance.
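The prewhitening assumed by this model (which makes the $w_j$ orthonormal and, later, $\log|\det W|$ vanish) can be sketched via an eigendecomposition of the sample covariance. The data below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated synthetic data: scaled independent rows, then a random rotation
X = rng.normal(size=(5, 1000)) * np.array([[3.0], [1.0], [0.5], [2.0], [1.5]])
X = np.linalg.qr(rng.normal(size=(5, 5)))[0] @ X

# Whitening (sphering): rotate to decorrelate, then rescale to unit variance
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])   # eigendecomposition of covariance
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc            # whitened data: Z Z^T / n = I
```

After this transform the sample covariance of `Z` is the identity, so any further unmixing matrix estimated on `Z` can be constrained to be orthogonal.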
2.4.3. Fusion of invariant feature and independent subspaces

In [25] it is shown that a fusion between the concepts of invariant and independent subspaces can be achieved by considering probability distributions for the $n$-tuples of $s_i$ that are spherically symmetric, that is, depending only on the norm. In other words, the pdf $p_j(\cdot)$ has to be expressed as a function of the sum of the squares of the $s_i$, $i \in S_j$, only. Additionally, it is assumed that the pdfs are equal for all subspaces.

The log likelihood of this new data model is given by

$$\log L\big(w_i,\ i = 1, \ldots, n\big) = \sum_{t=1}^{T} \sum_{j=1}^{J} \log p\Big( \sum_{i \in S_j} \big\langle w_i, x(t) \big\rangle^2 \Big) + T \log |\det W|. \quad (12)$$
Here $p\big( \sum_{i \in S_j} s_i^2 \big) = p_j\big(s_i,\ i \in S_j\big)$ gives the pdf inside the $j$th $n$-tuple of $s_i$. Based on the prewhitening, we have $\log |\det W| = 0$.
For computational simplification, set

$$G\Big( \sum_{i \in S_j} s_i^2 \Big) = \log p\Big( \sum_{i \in S_j} \big\langle w_i, x(t) \big\rangle^2 \Big). \quad (13)$$

Since it is known that the projection of visual data on any subspace has a super-Gaussian distribution, the pdf has to be chosen to be sparse. Thus, we will choose $G(u) = -\alpha \sqrt{u} + \beta$, yielding a multidimensional version of an exponential distribution. $\alpha$ and $\beta$ are constants that enforce that $s_i$ is of unit variance.
Figure 2: Topographic ICA model [20]. The variance-generating variables $u_i$ are randomly generated and mixed linearly within their topographic neighborhoods. This forms the input to the nonlinearity $\phi$, thus giving the local variances $\sigma_i$. Components $s_i$ are generated with variances $\sigma_i$. The observed variables $x_i$ are obtained as in standard ICA from the linear mixture of the components $s_i$.
2.4.4. The topographic ICA architecture

Based on the concepts introduced in the preceding subsections, this section describes topographic ICA.

To introduce a topographic representation in the ICA model, it is necessary to relax the assumption of independence among neighboring components $s_i$. This makes it necessary to adopt an idea from self-organized neural networks, that of a lattice. It was shown in [20] that a representation which models the topographic correlation of energies is an adequate approach for introducing dependencies between neighboring components.

In other words, the variances corresponding to neighboring components are positively correlated, while the other variances are, in a broad sense, independent. The architecture of this new approach is shown in Figure 2.
This idea leads to the following representation of the
source signals:
s
i
=
i
z
i
, (14)
where z
i
is a random variable having the same distribution as
s
i
, and the variance
i
is xed to unity.
The variance σ_i is further modeled by a nonlinearity:

    σ_i = φ( Σ_{k=1}^{n} h(i, k) u_k ),    (15)

where the u_k are the higher-order independent components used
to generate the variances, while φ describes some nonlinearity.
The neighborhood function h(i, k) can either be a two-dimensional
grid or have a ring-like structure. Further, the u_i and
z_i are all mutually independent.
The learning rule is based on the maximization of the
likelihood. First, it is assumed that the data are preprocessed
by whitening and that the estimates of the components are
uncorrelated. The log likelihood is given by

    log L(w_i, i = 1, ..., n) = Σ_{t=1}^{T} Σ_{j=1}^{n} G( Σ_{i=1}^{n} h(i, j) (w_i^T x(t))² ) + T log |det W|.    (16)
3094 EURASIP Journal on Applied Signal Processing
[Figure 3: three bar plots (a)-(c) of ROC area (y-axis from 0.7 to 1) for TopoICA, FastICA, JADE, TDSEP, PCA, TCA (CUM), TCA (KGV), and TCA (KDE); the KDE variant is omitted in panel (c).]
Figure 3: Results of the comparison between tree-dependent ICA, topographic ICA, JADE, FastICA, TDSEP, and PCA on fMRI data. Spatial
accuracy of ICA maps is assessed by ROC analysis using a correlation map with a chosen threshold of 0.4. The number of chosen independent
components (ICs) for all techniques is N = 8 in (a), N = 9 in (b), and N = 16 in (c).
The update rule for the weight vector w_i is derived from
a gradient algorithm based on the log likelihood, assuming
log |det W| = 0:

    Δw_i ∝ E{ x (w_i^T x) r_i },    (17)

where

    r_i = Σ_{k=1}^{n} h(i, k) g( Σ_{j=1}^{n} h(k, j) (w_j^T x)² ).    (18)

The function g is the derivative of G(u) = -α√u + β. After
every iteration, the vectors w_i in (17) are normalized to
unit variance and orthogonalized. This equation represents a
modulated learning rule, where the learning term is modulated
by the term r_i.
The classic ICA results from the topographic ICA by setting
h(i, j) = δ_ij.
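One iteration of this modulated learning rule can be sketched as follows. The symmetric orthonormalization via SVD stands in for the normalization/orthogonalization step, and g(u) = -1/(2√(u+ε)) is used as the derivative of -√u; both are illustrative choices, not necessarily the exact procedure of the paper.

```python
import numpy as np

def g(u, eps=1e-6):
    # derivative of G(u) = -sqrt(u); a small eps keeps it finite at u = 0
    return -0.5 / np.sqrt(u + eps)

def topo_ica_step(W, X, H, lr=0.1):
    """One gradient step of eqs. (17)-(18).
    X: whitened data (n x T), W: rows are the weight vectors w_i,
    H: neighborhood matrix with entries h(i, k)."""
    Y = W @ X                                   # y_i(t) = w_i^T x(t)
    R = H @ g(H @ Y**2)                         # modulation terms r_i(t), eq. (18)
    W = W + lr * (Y * R) @ X.T / X.shape[1]     # Delta w_i ~ E{ x (w_i^T x) r_i }
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt                               # renormalize and orthogonalize the w_i

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 500))               # stand-in for whitened data
H = np.eye(4)                                   # h(i, j) = delta_ij recovers classic ICA
W0 = np.linalg.qr(rng.standard_normal((4, 4)))[0]
W = topo_ica_step(W0, X, H)
```

With `H = np.eye(n)` the neighborhood collapses to δ_ij and the step reduces to an ordinary ICA gradient update, mirroring the remark above.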
3. RESULTS AND DISCUSSION
fMRI data were recorded from six subjects (3 female, 3
male, age 20-37) performing a visual task. In five subjects,
five slices with 100 images (TR/TE = 3000/60 msec) were
acquired with five periods of rest and five photic stimulation
periods. Stimulation and rest periods comprised 10 repetitions
each, that is, 30 seconds. Resolution was 3 × 3 × 4 mm. The
slices were oriented parallel to the calcarine fissure. Photic
stimulation was performed using an 8 Hz alternating checkerboard
stimulus with a central fixation point; during the control periods,
only a dark background with a central fixation point was shown [17].
The first scans were discarded for remaining saturation effects.
Motion artifacts were compensated by automatic image alignment
(AIR, [28]).
The clustering results were evaluated by (1) task-related
activation maps, (2) associated time courses, and (3) ROC
curves.
3.1. Estimation of the ICA model
To decide to what extent spatial ICA of fMRI time series depends
on the employed algorithm, we first have to look at the
optimal number of principal components selected by PCA
and used in the ICA decomposition. ICA is a generalization
of PCA: if no ICA is performed, the number of independent
components equals zero, and no PCA decomposition is
performed either.
In the following we give the parameter settings. For
PCA, no parameters had to be set. For FastICA we chose
Clustering of Dependent fMRI Components 3095
[Figure 4: sixteen cluster assignment maps, panels (a)-(p).]
Figure 4: Cluster assignment maps for cluster analysis based on the tree-dependent ICA (CUM) of a visual stimulation fMRI experiment
obtained for 16 ICs.
(1) ε = 10^-6, (2) 10^5 as the maximal number of iterations,
and (3) the nonlinearity g(u) = tanh(u). Last, for topographic
ICA we set the following: (1) the stop criterion is fulfilled
if the synaptic weight difference between two consecutive
iterations is less than 10^-5 times the number of ICs, (2) the
function g(u) = u, and (3) 10^4 as the maximal number of
iterations.
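The topographic ICA stop criterion described above can be written directly as a small helper; using the Frobenius norm as the measure of the weight difference is an assumption of this sketch.

```python
import numpy as np

def topo_ica_converged(w_new, w_old, n_ics, eps=1e-5):
    """True once the synaptic-weight change between two consecutive
    iterations falls below eps times the number of ICs."""
    return np.linalg.norm(w_new - w_old) < eps * n_ics
```

A training loop would call this after each update and stop as soon as it returns True (or the iteration cap is reached).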
It is important to find a fixed number of ICs that can
theoretically predict new observations under the same conditions,
assuming the basic ICA model actually holds. To do so, we
compared the six proposed algorithms for 8, 9, and 16 components
in terms of ROC analysis using a correlation map
with a chosen threshold of 0.4. The obtained results are plotted
in Figure 3. It can be seen that topographic ICA outperforms
all other ICA methods for 8 and 9 ICs. However, for
16 ICs topographic ICA is outperformed by tree-dependent
ICA (KGV), which uses the kernel generalized variance as an
approximation of the mutual information.
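The ROC figure of merit used here can be sketched as follows: voxels whose correlation with the stimulus reference exceeds the 0.4 threshold define the "active" set, and an IC spatial map is then scored by the area under the ROC curve, computed below via the rank-sum (Mann-Whitney) statistic. Tie handling is omitted for brevity, and the exact scoring of the paper may differ.

```python
import numpy as np

def roc_area(ic_map, corr_map, thresh=0.4):
    """AUC of an IC spatial map, scored against the voxels whose correlation
    with the stimulus reference function exceeds `thresh`."""
    truth = (corr_map > thresh).ravel()
    scores = np.abs(ic_map).ravel()
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = truth.sum(), (~truth).sum()
    # Mann-Whitney U statistic normalized to [0, 1]
    return (ranks[truth].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A map whose large-magnitude voxels coincide exactly with the thresholded correlation map scores 1.0; chance-level maps score about 0.5, matching the 0.7-1 range shown on the plot axes.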
The clustering results for the two methods, the tree-dependent
(CUM and KGV) and topographic ICA, are shown
[Figure 5: sixteen codebook-vector plots (a)-(p) with correlation coefficients cc = 0.08, 0.05, 0.19, 0.02, 0.08, 0.03, 0.04, 0.05, 0.21, 0.07, 0.05, 0.20, 0.23, 0.00, 0.09, and 0.92, respectively.]
Figure 5: Associated codebook vectors for the tree-dependent ICA (CUM) as shown in Figure 4. Assignment of the codebook vectors
corresponds to the order of the assignment maps shown in Figure 4.
in Figures 4-9. Figures 4, 6, and 8 illustrate the so-called assignment
maps, where all the pixels belonging to a specific
cluster are highlighted. The assignment between a pixel and
a specific cluster is given by the minimum distance between
the pixel and an IC from the established codebook. On the
other hand, each IC shown in Figures 5, 7, and 9 can be
viewed as the cluster-specific weighted average of all pixel
time courses.
We can immediately see a topographical representation
in Figure 9 by looking at the last row (ICs 15 and 16): the
two time courses s_i with the highest absolute-valued correlation
are grouped together. Thus, the advantage of the
tree-dependent ICA (KGV) becomes immediately evident: it
groups together signals according to their dependence content.
This effect can be observed neither with topographic
nor with tree-dependent ICA (CUM).
3.2. Characterization of task-related effects
For all subjects and runs, unique task-related activation
maps and associated time courses were obtained by the
[Figure 6: sixteen cluster assignment maps, panels (a)-(p).]
Figure 6: Cluster assignment maps for cluster analysis based on the topographic ICA of a visual stimulation fMRI experiment obtained for
16 ICs.
tree-dependent and topographic ICA techniques. The
correlation of the component time course most closely
associated with the visual task for these two techniques is
shown in Table 1 for IC = 8, 9, and 16. This time course can
serve as an estimate of the stimulus reference function used
in the fMRI experiment, as identified by the specific dependent
component technique. From Table 1, we see for the tree-dependent
ICA (CUM) a continuous increase of the correlation
coefficient, while for the topographic ICA this correlation
coefficient decreases for IC = 16 and for tree-dependent
ICA (KGV) it decreases even for IC = 9.
3.3. Exploratory analysis of ancillary findings
From Figures 4-9, we can also obtain some insight into the
type of artifactual components. For the cluster assignment
maps, cluster 12 in Figure 4 and cluster 16 in Figure 6 may
be assigned to a coactivation of the frontal eye fields induced
by stimulus onset. No such findings can be reported from
Figure 8. In Figure 4 there may be some type of physiological
relatedness between cluster 12 on one hand, and cluster 16,
which shows a high correlation with the stimulus function, on the
other hand. The same is valid for cluster 16 and
cluster 8 in Figure 6. Interestingly, Figure 8 determines two
[Figure 7: sixteen codebook-vector plots (a)-(p) with correlation coefficients cc = 0.02, 0.10, 0.07, 0.02, 0.05, 0.06, 0.09, 0.086, 0.14, 0.05, 0.12, 0.12, 0.11, 0.26, 0.01, and 0.18, respectively.]
Figure 7: Associated codebook vectors for the topographic ICA as shown in Figure 6. Assignment of the codebook vectors corresponds to
the order of the assignment maps shown in Figure 6.
ICs showing a high correlation with the stimulus function.
However, this connection is not revealed by the feature space
metric and thus is not supported by clustering approaches
based on this metric.
An additional benefit of unsupervised clustering techniques
is the ability to identify data highly indicative
of artifacts, for example, ventricular pulsation or through-plane
motion. Cluster 6 in Figure 4 and cluster 3 in Figure 6,
for example, show the region of the inner ventricles. It is
important to mention that these effects could not have been
detected by model-based approaches.
4. CONCLUSION
In the present paper, we have experimentally compared four
standard ICA algorithms already adopted in the fMRI literature
with two new algorithms, the tree-dependent and topographic
ICA. The goal of the paper was to determine the
robustness and reliability of extracting task-related activation
maps and time courses from fMRI data sets. The success of
ICA methods is based on the condition that the spatial distribution
of brain areas activated by task performance must
be spatially independent of the distributions of areas affected
[Figure 8: sixteen cluster assignment maps, panels (a)-(p).]
Figure 8: Cluster assignment maps for cluster analysis based on the tree-dependent ICA (KGV) of a visual stimulation fMRI experiment
obtained for 16 ICs.
by artifacts. The obtained results proved to reveal the structure
of the data set extremely well.
It can be seen that topographic ICA outperforms all
other ICA methods for 8 and 9 ICs. However, for 16 ICs
topographic ICA is outperformed by tree-dependent ICA
(KGV), which uses the kernel generalized variance as an
approximation of the mutual information. All dependent component
techniques can be employed to identify interesting ancillary
findings that cannot be detected by model-based approaches.
The applicability of the new algorithms is demonstrated
on experimental data. We conjecture that the method
can serve as a multipurpose exploratory data analysis strategy
for image time-series analysis and provide good visualization
for many fields ranging from biomedical basic research
to clinical assessment of patient data. In particular, beyond
the application to fMRI data analysis discussed in this paper,
the method exhibits a specific potential to serve in applications
referring to dynamic contrast-enhanced perfusion
MRI for the diagnosis of cerebrovascular disease or magnetic
resonance mammography for the analysis of suspicious
lesions in patients with breast cancer. In addition,
it could yield a visualization of large trees using a hyperbolic
space by employing a hyperbolic self-organized map
[29].
[Figure 9: sixteen codebook-vector plots (a)-(p) with correlation coefficients cc = 0.16, 0.08, 0.31, 0.11, 0.00, 0.02, 0.00, 0.04, 0.22, 0.15, 0.01, 0.04, 0.24, 0.19, 0.82, and 0.66, respectively.]
Figure 9: Associated codebook vectors for the tree-dependent ICA (KGV) as shown in Figure 8. Assignment of the codebook vectors
corresponds to the order of the assignment maps shown in Figure 8.
Table 1: Comparison of the correlations of the component time course most closely associated with the visual task for tree-dependent (tree
ICA) and topographic ICA (topo ICA) for IC = 8, 9, and 16.

No. of ICs | Tree ICA (KDE) | Tree ICA (KGV) | Tree ICA (CUM) | Topo ICA
IC = 8     | 0.78           | 0.74           | 0.78           | 0.85
IC = 9     | 0.79           | 0.66           | 0.91           | 0.87
IC = 16    | n/a            | 0.82           | 0.92           | 0.86
ACKNOWLEDGMENTS
The authors would like to thank Dr. Dorothee Auer from the
Max Planck Institute of Psychiatry in Munich, Germany, for
providing the fMRI data. We are grateful for the financial
support of the Humboldt Foundation.
REFERENCES
[1] P. A. Bandettini, E. C. Wong, R. S. Hinks, R. S. Tikofsky, and J. S. Hyde, "Time course EPI of human brain function during task activation," Magnetic Resonance in Medicine, vol. 25, no. 2, pp. 390-397, 1992.
[2] J. Frahm, K. D. Merboldt, and W. Hänicke, "Functional MRI of human brain activation at high spatial resolution," Magnetic Resonance in Medicine, vol. 29, no. 1, pp. 139-144, 1993.
[3] K. Kwong, "Functional magnetic-resonance-imaging with echo-planar imaging," Magnetic Resonance Quarterly, vol. 11, no. 1, pp. 1-20, 1995.
[4] K. Kwong, J. Belliveau, D. Chesler, et al., "Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation," Proceedings of the National Academy of Sciences, vol. 89, no. 12, pp. 5675-5679, 1992.
[5] S. Ogawa, D. Tank, R. Menon, et al., "Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging," Proceedings of the National Academy of Sciences, vol. 89, no. 13, pp. 5951-5955, 1992.
[6] J. Boxerman, P. A. Bandettini, K. Kwong, et al., "The intravascular contribution to fMRI signal change: Monte Carlo modeling and diffusion-weighted studies in vivo," Magnetic Resonance in Medicine, vol. 34, no. 1, pp. 4-10, 1995.
[7] S. Ogawa, T. Lee, and B. Barrere, "The sensitivity of magnetic resonance image signals of a rat brain to changes in the cerebral venous blood oxygenation activation," Magnetic Resonance in Medicine, vol. 29, no. 2, pp. 205-210, 1993.
[8] J. J. Sychra, P. A. Bandettini, N. Bhattacharya, and Q. Lin, "Synthetic images by subspace transforms I. Principal components images and related filters," Medical Physics, vol. 21, no. 2, pp. 193-201, 1994.
[9] W. Backfrieder, R. Baumgartner, M. Šámal, E. Moser, and H. Bergmann, "Quantification of intensity variations in functional MR images using rotated principal components," Physics in Medicine and Biology, vol. 41, no. 8, pp. 1425-1438, 1996.
[10] M. J. McKeown, T.-P. Jung, S. Makeig, et al., "Spatially independent activity patterns in functional MRI data during the Stroop color-naming task," Proceedings of the National Academy of Sciences, vol. 95, no. 3, pp. 803-810, 1998.
[11] M. J. McKeown, S. Makeig, G. G. Brown, et al., "Analysis of fMRI data by blind separation into independent spatial components," Human Brain Mapping, vol. 6, no. 3, pp. 160-188, 1998.
[12] F. Esposito, E. Formisano, E. Seifritz, et al., "Spatial independent component analysis of functional MRI time-series: to what extent do results depend on the algorithm used?" Human Brain Mapping, vol. 16, no. 3, pp. 146-157, 2002.
[13] K. Arfanakis, D. Cordes, V. M. Haughton, C. H. Moritz, M. A. Quigley, and M. E. Meyerand, "Combining independent component analysis and correlation analysis to probe interregional connectivity in fMRI task activation datasets," Magnetic Resonance Imaging, vol. 18, no. 8, pp. 921-930, 2000.
[14] G. Scarth, M. McIntyre, B. Wowk, and R. Somorjai, "Detection of novelty in functional images using fuzzy clustering," in Proc. 3rd Scientific Meeting of the International Society for Magnetic Resonance in Medicine, vol. 95, pp. 238-242, Nice, France, August 1995.
[15] K.-H. Chuang, M.-J. Chiu, C.-C. Lin, and J.-H. Chen, "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy C-means," IEEE Trans. Med. Imag., vol. 18, no. 12, pp. 1117-1128, 1999.
[16] R. Baumgartner, L. Ryner, W. Richter, R. Summers, M. Jarmasz, and R. Somorjai, "Comparison of two exploratory data analysis methods for fMRI: fuzzy clustering vs. principal component analysis," Magnetic Resonance Imaging, vol. 18, no. 1, pp. 89-94, 2000.
[17] A. Wismüller, O. Lange, D. R. Dersch, et al., "Cluster analysis of biomedical image time-series," International Journal of Computer Vision, vol. 46, no. 2, pp. 103-128, 2002.
[18] H. Fischer and J. Hennig, "Clustering of functional MR data," in Proc. 4th Annual Meeting of the International Society for Magnetic Resonance in Medicine (ISMRM '96), pp. 1179-1183, New York, NY, USA, April 1996.
[19] S. C. Ngan and X. Hu, "Analysis of functional magnetic resonance imaging data using self-organizing mapping with spatial connectivity," Magnetic Resonance in Medicine, vol. 41, no. 5, pp. 939-946, 1999.
[20] A. Hyvärinen, P. Hoyer, and M. Inki, "Topographic independent component analysis," Neural Computation, vol. 13, no. 7, pp. 1527-1558, 2001.
[21] F. R. Bach and M. I. Jordan, "Beyond independent components: trees and clusters," Journal of Machine Learning Research, vol. 4, pp. 1205-1233, December 2003.
[22] J.-F. Cardoso, "Multidimensional independent component analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), vol. 4, pp. 1941-1944, Seattle, Wash, USA, May 1998.
[23] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411-430, 2000.
[24] C. K. Chow and C. N. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Trans. Inform. Theory, vol. 14, no. 3, pp. 462-467, 1968.
[25] A. Hyvärinen and P. Hoyer, "Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces," Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.
[26] T. Kohonen, "Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map," Biological Cybernetics, vol. 75, no. 4, pp. 281-291, 1996.
[27] A. Meyer-Bäse, Pattern Recognition for Medical Imaging, Academic Press, Boston, Mass, USA, 2003.
[28] R. P. Woods, S. R. Cherry, and J. C. Mazziotta, "Rapid automated algorithm for aligning and reslicing PET images," Journal of Computer Assisted Tomography, vol. 16, no. 4, pp. 620-633, 1992.
[29] H. Ritter, "Self-organizing maps in non-Euclidean spaces," in Kohonen Maps, pp. 97-108, Springer, Berlin, Germany, 1999.
Anke Meyer-Bäse is with the Department
of Electrical and Computer Engineering at
the Florida State University. Her research
areas include theory and application of
neural networks, medical image processing,
pattern recognition, and parallel processing.
She was awarded the Lise Meitner
Prize in 1997. She has published over 100 papers
in several areas including intelligent
systems, medical image processing, speech
recognition, and neural networks. She is the author of the book
Pattern Recognition for Medical Imaging, which appeared with
Elsevier/Academic Press in 2003.
Monica K. Hurdal is an Assistant Professor
of Biomedical Mathematics at Florida
State University in Tallahassee, Florida. She
was awarded her Ph.D. degree in 1999
from Queensland University of Technology,
Australia, in applied mathematics. Subsequently,
Dr. Hurdal was a Postdoctoral Research
Associate for two years at Florida
State University (FSU) in mathematics and
also computer science, working on conformal
flat mapping of the human brain. She continued her research
at Johns Hopkins University in the Center for Imaging Science as
a Research Scientist, followed by her current position in 2001 in
Biomedical Mathematics at FSU. Her research interests include applying
topology, geometry, and conformal methods to the analysis
and modeling of neuroscientific data from the human brain. She is
investigating topology issues associated with constructing cortical
surfaces from MRI data, computing conformal maps of the brain,
and applying topological and conformal invariants to characterize
disease in MRI studies.
Oliver Lange studied information technologies
engineering at the TU in Munich.
After finishing his diploma in 1999, he was
a Ph.D. student in biomedical engineering
at the Institute of Clinical Radiology at the
University of Munich. When he finished his
Ph.D. in 2003, Oliver Lange was a Consultant
for the Department of Engineering at
Florida State University. Since July 2004, he
has been working as a Research Engineer in
the field of biomedical signal processing.
Helge Ritter studied physics and mathematics
at the Universities of Bayreuth, Heidelberg,
and Munich. After a Ph.D. degree
in physics at the Technical University of
Munich in 1988, he visited the Laboratory
of Computer Science at Helsinki University
of Technology and the Beckman Institute
for Advanced Science and Technology
at the University of Illinois at Urbana-Champaign.
Since 1990 he has been the
Head of the Neuroinformatics Group at the Faculty of Technology,
Bielefeld University. His main interests are principles of neural
computation and their application to building intelligent systems. In
1999, Helge Ritter was awarded the SEL Alcatel Research Prize and
in 2001 the Leibniz Prize of the German Research Foundation (DFG).
EURASIP Journal on Applied Signal Processing 2005:19, 3103-3112
© 2005 Hindawi Publishing Corporation

Robust EEG Channel Selection across Subjects
for Brain-Computer Interfaces

Michael Schröder,¹ Thomas Navin Lal,² Thilo Hinterberger,³ Martin Bogdan,¹ N. Jeremy Hill,² Niels Birbaumer,³ Wolfgang Rosenstiel,¹ and Bernhard Schölkopf²

¹ Department of Computer Engineering, Eberhard-Karls University Tübingen, Sand 13, 72076 Tübingen, Germany
Emails: schroedm@informatik.uni-tuebingen.de, bogdan@informatik.uni-tuebingen.de, rosenstiel@informatik.uni-tuebingen.de
² Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany
Emails: navin@tuebingen.mpg.de, jez@tuebingen.mpg.de, bs@tuebingen.mpg.de
³ Institute of Medical Psychology and Behavioral Neurobiology, Eberhard-Karls University Tübingen, Gartenstrasse 29, 72074 Tübingen, Germany
Emails: thilo.hinterberger@uni-tuebingen.de, niels.birbaumer@uni-tuebingen.de
Received 11 February 2004; Revised 22 September 2004
Most EEG-based brain-computer interface (BCI) paradigms come along with specific electrode positions; for example, for a visual-based
BCI, electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where
task-relevant activity can be measured from the scalp. For individual subjects, Lal et al. in 2004 showed that recording positions can
be found without the use of prior knowledge about the paradigm used. However, it remains unclear to what extent their method
of recursive channel elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group
of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not
quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with
prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were
consistently selected, whereas task-irrelevant channels were reliably disregarded.
Keywords and phrases: brain-computer interface, channel selection, feature selection, recursive channel elimination, support
vector machine, electroencephalography.
1. INTRODUCTION
Brain-computer interface (BCI) systems are designed to distinguish
two or more mental states during the performance
of mental tasks (e.g., motor imagery tasks). Many BCI systems
for humans try to classify those states on the basis of
electroencephalographic (EEG) signals using machine learning
algorithms.
The input for classification methods is a set of training
examples. In the case of BCI, one example might consist of
EEG data (possibly containing several channels) of one trial
and a label marking the class of the trial. Classification methods
pursue the objective of finding structure in the data and as a
result provide a mapping from EEG data to mental states.
For some tasks the relevant EEG recording positions that
lead to good classification results are known, especially when
the tasks involve motor imagery (e.g., the imagination of
limb movements) or the overall activity of large parts of the
cortex (so-called slow cortical potentials, SCP) that occurs
during intentions or states of preparation and relaxation.
For the development of new paradigms the neural correlates
might not be known in detail, and finding optimal
recording positions for the use in BCIs is challenging. Such
new paradigms can become necessary in cases when motor
cortex areas show lesions, for the increase of the information
rate of BCI systems, or for robust multiclass
BCIs.
Algorithms for channel selection (CS) can identify suitable
recording sites for individual subjects even in the absence
of prior knowledge about the mental task. In this case
it is possible to reduce the number of EEG electrodes necessary
for the classification of brain signals without losing substantial
classification performance.
In addition, the CS results¹ can help to understand which
part of the brain generates the class-relevant activity and even

¹ If an ordered list of channels is given by the CS algorithm that represents
the importance of each channel for classification, this result is also called a
ranking.
[Figure 1: two line plots labeled A-E; x-axis: best n remaining channels (0-40); y-axis: test error (0.1-0.5); the right plot shows curves for "Average RFE" and "Average motor 17".]
Figure 1: Test error of the channel selection method RCE for five subjects (A to E) on 39 EEG channels. The left graph shows the development
of the test error against the best n remaining channels determined by RCE. For some subjects, the test error can be decreased by selecting
fewer than 39 channels. The right graph shows the test error of RCE averaged over the five subjects. On average, good performance can be
obtained with fewer than 10 channels. The average test error for a set of 17 EEG channels over or close to the motor cortex is added as a
baseline for comparison.
simplifies the detection of artifact channels.² In [2], different
channel selection algorithms have been compared for a motor
imagery task. Figure 1 shows an example of the change
in classification error that is observed when applying the winning
method, recursive channel elimination (RCE), to the data of
five individuals.
If data from several subjects are available, the questions
arise whether a set of channels selected for one subject is
useful also for other subjects and whether generalized conclusions
can be drawn about channels relevant for the classification
of a certain mental task across subjects.
The paper is organized as follows. Section 2 contains the
experimental setup, a description of the mental task, and the
basic data preprocessing. In Section 3 the channel selection
method and the classification algorithm are described. Results
of cross-subject channel selection compared to average
individual channel selection are given in Section 4 while the
final section concludes.
2. DATA ACQUISITION
2.1. Experimental setup and mental task
We recorded EEG signals from eight untrained right-handed
male subjects using 39 silver chloride electrodes
(see Figure 2). The reference electrodes were positioned at
TP9 and TP10. The two electrodes Fp2 and 1 cm lateral of the
right eye (EOG) were used to record possible EOG artifacts
and eye blinks, while two frontotemporal and two occipital
electrodes were positioned to detect possible muscle activity
during the experiment. Before sampling the data at 256 Hz,
an analog bandpass filter with cutoff frequencies of 0.1 Hz and
40 Hz was applied.

² Some subjects unintentionally use muscle activity that influences the
recorded signals when trained in a BCI system, especially if feedback is provided.
The subjects were seated in an armchair at 1 m distance
in front of a computer screen. Following the experimental
setup of [3], the subjects were asked to imagine left versus
right hand movements during each trial. With every subject,
we recorded 400 trials during one single session. The total
length of each trial was 9 seconds. Additional intertrial intervals
for relaxation varied randomly between 2 and 4 seconds.
No outlier detection was performed and no trials were removed
during the data processing at any stage.
Each trial started with a blank screen. A small fixation
cross was displayed in the center of the screen from second
2 to 9. A cue in the form of a small arrow pointing to the
right or left side was visible for half a second starting with
second 3. In order to avoid event-related signals in later processing
stages, only data from seconds 4 to 9 of each trial were
considered for further analysis. Feedback was not provided
at any time.
2.2. Preanalysis
As Pfurtscheller and da Silva have reported [4], movement-related
desynchronization of the μ-rhythm (8-12 Hz) is not
equally strong in subjects and might even fail for some subjects
due to various reasons (e.g., because of too short intertrial
intervals that prevent a proper resynchronization).
Therefore we performed a preanalysis in order to identify
and exclude subjects that did not show significant μ-activity
at all.
For seven of the eight subjects, the μ-band was only
slightly different from the 8-12 Hz usually given in the EEG
literature. Only one subject showed scarcely any activity in
this frequency range but instead a recognizable movement-related
desynchronization in the 16-20 Hz band.
Restricted to only the 17 EEG channels that were located
over or close to the motor cortex, we calculated the maximum
energy of the μ-band using the Welch method [5] for
each subject. This feature extraction resulted in one parameter
per trial and channel and explicitly incorporated prior
knowledge about the task.
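This Welch-based band-power feature can be sketched with `scipy.signal.welch`; the segment length (`nperseg = fs`, i.e., 1 second) is an illustrative choice, not a parameter reported in the paper.

```python
import numpy as np
from scipy.signal import welch

def mu_band_feature(trial, fs=256, band=(8.0, 12.0)):
    """Maximum power of the mu-band per channel, estimated with the Welch
    method. trial: array of shape (n_channels, n_samples).
    Returns one parameter per channel, as described in the text."""
    f, pxx = welch(trial, fs=fs, nperseg=fs)        # PSD per channel
    in_band = (f >= band[0]) & (f <= band[1])
    return pxx[:, in_band].max(axis=1)
```

For the one atypical subject mentioned above, the same helper could simply be called with `band=(16.0, 20.0)`.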
The eight datasets consisting of the Welch features were
classified with linear SVMs (see below) including individual
model selection for each subject. Generalization errors
were estimated by 10-fold cross-validation. For three subjects
the preanalysis showed very poor error rates close to chance
level, and their datasets were excluded from further analysis.
2.3. Data preprocessing
For the remaining five subjects, the 5 s windows recorded
from each trial resulted in a time series of 1280 sample points
per channel. We fitted an autoregressive (AR) model of order
3 to the time series³ of all 39 channels using forward-backward
linear prediction [6]. The three resulting AR coefficients
per channel and trial formed the new representation
of the data.
The extraction of the features did not explicitly incorporate
prior knowledge, although autoregressive models have
successfully been used for motor-related tasks (e.g., [3]).
However, they are not directly linked to the μ-rhythm.
Before datasets from several subjects were combined
for cross-subject channel selection, an additional centering
and linear scaling of the data was performed. This was done
individually for each subject and trial in order to maintain
the proportion of corresponding AR coefficients in a trial.
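The AR(3) feature extraction can be sketched as follows. The paper uses forward-backward linear prediction, whereas this stand-in fits the forward prediction by ordinary least squares, so it is a simplified illustration rather than the exact procedure.

```python
import numpy as np

def ar_coefficients(x, order=3):
    """Least-squares fit of an AR(order) model to one channel's time series:
    x[t] ~ a1*x[t-1] + ... + ap*x[t-p]. Returns the p coefficients,
    which serve as the per-channel features."""
    p = order
    N = len(x)
    cols = [x[p - k - 1 : N - k - 1] for k in range(p)]   # x[t-1], ..., x[t-p]
    a, *_ = np.linalg.lstsq(np.column_stack(cols), x[p:], rcond=None)
    return a
```

Applying this to each of the 39 channels of a trial yields the 3 x 39 = 117-dimensional representation used below.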
2.4. Notation
Let n denote the number of training vectors (trials) of the
datasets (n = 400 for each of the five datasets) and let
d denote the data dimension (d = 3 · 39 = 117 for all
five datasets). The training data for a classifier is denoted as
X = (x^(1), ..., x^(n)) ∈ R^{n×d} with labels Y = (y_1, ..., y_n) ∈
{-1, 1}^n. For the task used in this paper, y = -1 denotes
imagined left hand movement and y = 1 denotes imagined
right hand movement.
³ For comparison reasons this choice of the model order is the same as
in [2]. For this work, different model orders had been compared in the following
way. For a given order we fitted an AR model to each EEG sequence.
After proper model selection, a support vector machine with 10-fold cross-validation
(CV) was trained on the AR coefficients. Model order 3 resulted
in the best mean CV error.
[Figure 2: scalp map of the electrode montage.]
Figure 2: The positions of the 39 EEG electrodes used for data acquisition
are marked by black circles. The two referencing electrodes are
marked by dotted circles. Eight electrodes over or close to the motor
cortex are shown in bold circles (positions C1, C2, C3, C4, FC3,
FC4, CP3, and CP4).
right hand movement. The terms dimension and feature are
used synonymously.
3. CHANNEL SELECTION AND CLASSIFICATION METHODS
Channel selection algorithms, as well as feature selection algorithms, can be characterized as either filter or wrapper methods [7]. They select or omit dimensions of the data that correspond to one EEG channel depending on a performance measure.
The problem of how to rate the relevance of a channel when nonlinear interactions between channels are present is not trivial, especially since the overall accuracy might not be monotonic in the number of features used. Some methods try to overcome this problem by optimizing the selection for feature subsets of fixed sizes (plus-l take-away-r search) or by implementing floating strategies (e.g., floating forward search) [7]. Only a few algorithms, such as genetic algorithms, can choose subgroups of arbitrary size during the selection process. They have successfully been used for the selection of spatial features [8] in BCI applications but are computationally demanding.
For the application of EEG channel selection, it is necessary to treat certain groups of features homogeneously: numerical values belonging to one and the same EEG channel have to be dealt with in a congeneric way so that a spatial interpretation of the solution becomes possible.
In [2] three state-of-the-art algorithms were compared for the problem of channel selection in BCI. As the method of recursive channel elimination (RCE), which is closely related to support vector machines (SVMs), performed best among the compared methods, we will use RCE for the cross-subject channel selection experiments described in this paper.
3106 EURASIP Journal on Applied Signal Processing
Figure 3: Linear SVM. For nonseparable datasets, slack variables ξ_i are introduced. The bold points on the dashed lines are called support vectors (SVs). The solution for the hyperplane H can be written in terms of the SVs. For more detail see Section 3.1.
3.1. Support vector machines
The support vector machine is a relatively new classification technique developed by Vapnik [9] which has been shown to perform strongly in a number of real-world problems, including BCI [10]. The central idea is to separate data X ⊂ R^d from two classes by finding a weight vector w ∈ R^d and an offset b ∈ R of a hyperplane

    H : R^d → {−1, 1},  x ↦ sign(w · x + b),    (1)

with the largest possible margin,⁴ which apart from being an intuitive idea has been shown to provide theoretical guarantees in terms of generalization ability [9]. One variant of the algorithm consists of solving the following optimization problem:
    min_{w ∈ R^d}  ‖w‖²/2 + C ∑_{i=1}^n ξ_i²
    s.t.  y_i (w · x^(i) + b) ≥ 1 − ξ_i  (i = 1, ..., n).    (2)
The parameters ξ_i are called slack variables and ensure that the problem has a solution in case the data are not linearly separable⁵ (see Figure 3). The margin is defined as γ(X, Y, C) = 1/‖w‖₂. In practice one has to trade off between a low training error, for example, ∑ξ_i², and a large margin γ. This trade-off is controlled by the regularization parameter C. Finding a good value for C is part of the model selection procedure. If no prior knowledge is available, C has to be estimated from the training data, for example, by using cross-validation. The value 2/C is also referred to as the ridge. For a detailed discussion please refer to [11].
⁴If X is linearly separable, the margin of a hyperplane is proportional to the distance of the hyperplane to the closest point x ∈ X.
⁵If the data are linearly separable, the slack variables can improve the generalization ability of the solutions.
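A minimal linear-SVM sketch of the above, using scikit-learn as a stand-in: note that scikit-learn's SVC penalizes ∑ξ_i rather than the squared slacks ∑ξ_i² of problem (2), and the toy two-class data and the C grid are illustrative assumptions, not the paper's AR features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Toy stand-in for the 117-dimensional AR features: two Gaussian classes.
X = np.vstack([rng.standard_normal((200, 117)) + 0.3,
               rng.standard_normal((200, 117)) - 0.3])
y = np.array([1] * 200 + [-1] * 200)

# Model selection for C via 10-fold cross-validation, as described above.
best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    acc = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=10).mean()
    if acc > best_acc:
        best_C, best_acc = C, acc

svm = SVC(kernel="linear", C=best_C).fit(X, y)
w = svm.coef_.ravel()             # weight vector of the hyperplane H in (1)
margin = 1.0 / np.linalg.norm(w)  # gamma = 1 / ||w||_2
```

The recovered weight vector w is exactly what the channel scoring of Section 3.2 operates on.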
3.2. Recursive channel elimination
This channel selection method is derived from the recursive feature elimination method proposed by Guyon et al. [12]. It is based on the concept of margin maximization. The importance of a channel is determined by the influence it has on the margin of a trained SVM. Let W be the inverse of the margin:

    W(X, Y, C) := 1/γ(X, Y, C) = ‖w‖₂.    (3)

Let X_{−j} be the data with feature j removed and Y_{−j} the corresponding labels. In the original version, one SVM is trained during each iteration and the features j̃ which minimize |W(X, Y, C) − W(X_{−j}, Y_{−j}, C)| are removed (typically, i.e., one feature only); this is equivalent to removing the dimensions j̃ that correspond to the smallest |w_j|. For channel selection this method was adapted in the following way.
Let F_k ⊂ {1, ..., d} denote the features from channel k. For each channel k we define the score s_k := (1/|F_k|) ∑_{l ∈ F_k} |w_l|. At each iteration we remove the channels with the lowest score. If no prior knowledge is available, the parameter C has to be estimated from the training data.
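The channel-wise variant of RCE can be sketched directly from the score s_k above. The function name and the `channel_features` mapping (channel index → list of feature columns F_k) are illustrative assumptions; one channel is removed per iteration here for simplicity.

```python
import numpy as np
from sklearn.svm import SVC

def recursive_channel_elimination(X, y, channel_features, C=1.0):
    """Rank channels by recursive channel elimination.
    channel_features maps channel index -> list of column indices F_k.
    Returns channels ordered best-first."""
    remaining = dict(channel_features)
    eliminated = []
    while len(remaining) > 1:
        cols = np.concatenate([f for f in remaining.values()])
        col_pos = {c: i for i, c in enumerate(cols)}
        # Retrain the linear SVM on the surviving channels only.
        w = SVC(kernel="linear", C=C).fit(X[:, cols], y).coef_.ravel()
        # Score s_k = (1/|F_k|) * sum_{l in F_k} |w_l|.
        scores = {k: np.mean([abs(w[col_pos[c]]) for c in f])
                  for k, f in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]
    eliminated.extend(remaining)   # last surviving (most important) channel
    return eliminated[::-1]        # best channel first
```

On toy data where only one channel carries class information, that channel should survive until the end and thus be ranked first.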
3.3. Generalization error estimation
For model selection purposes we estimated the generalization error of classifiers via 10-fold cross-validation.
If the generalization error of a channel selection method had to be estimated, a somewhat more elaborate procedure was used. An illustration of this procedure is given in Figure 4.
The whole dataset is split up into 10 folds (F1 to F10) as for usual cross-validation. In each fold F, the channel selection (CS in Figure 4) is performed based on the training set of F only, leading to a specific ranking of the 39 EEG channels. For each fold F, 39 classifiers C^h_F, h = 1, ..., 39, are trained as follows: C^h_F is trained on the h best⁶ channels of the training set of F and tested on the corresponding channels of the test set of F. For each fold, this results in 39 test errors (E^1_F to E^39_F).
During the last step, the corresponding test errors are averaged over all folds. This leads to an estimate of the generalization error for every number of selected channels.
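The nested procedure of Section 3.3 can be sketched as below. This is a hedged illustration: `cv_error_curve` and its `rank_channels` argument (any ranking function returning channels best-first, e.g., an RCE implementation) are assumed names, and the classifier is a plain linear SVM.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def cv_error_curve(X, y, channel_features, rank_channels, n_folds=10, C=1.0):
    """For every number h of selected channels, estimate the generalization
    error of 'rank channels on the training set, classify with the h best'.
    rank_channels(X, y) must return channels ordered best-first."""
    n_channels = len(channel_features)
    errors = np.zeros((n_folds, n_channels))
    folds = KFold(n_folds, shuffle=True, random_state=0).split(X)
    for i, (tr, te) in enumerate(folds):
        ranking = rank_channels(X[tr], y[tr])  # selection on training set only
        for h in range(1, n_channels + 1):
            cols = np.concatenate([channel_features[k] for k in ranking[:h]])
            clf = SVC(kernel="linear", C=C).fit(X[tr][:, cols], y[tr])
            errors[i, h - 1] = np.mean(clf.predict(X[te][:, cols]) != y[te])
    return errors.mean(axis=0)  # E^h averaged over the folds
```

The key point, as in Figure 4, is that the ranking is recomputed inside each fold, so no test data leaks into the channel selection.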
4. EXPERIMENTS AND RESULTS
The successful transfer of EEG channel rankings from one subject to another can be difficult for several reasons.
(i) The head shapes might vary between subjects. This limits the comparability of electrode positions and channel selection outcomes.
(ii) Subjects might use different mental representations for a task, even if they are instructed carefully.
⁶In this context, "best" means according to the ranking calculated for that fold.
Robust Channel Selection across Subjects 3107
[Figure 4 diagram: for each of the 10 folds F1 to F10, channel selection (CS) is performed on the training set, classifiers C^1_F to C^39_F are trained on the h best channels (for h = 1, ..., 39) and evaluated on the test set, yielding test errors E^1_F to E^39_F, which are then averaged over the 10 folds.]
Figure 4: Illustration of the procedure for channel selection and error estimation using cross-validation.
(iii) Cortex areas important for the mental task are probably organized slightly differently between subjects. This limits the comparability of localized activity patterns.
Luckily, motor imagery tasks involve a comparably large part of the cortex. As a result, small dislocations of EEG electrodes (e.g., around the typical motor positions C3 and C4, see Section 2) usually do not lead to a profound error increase for the classification of brain activity.
Nevertheless, it is very important to investigate the reliability of cross-subject channel selection: on the one hand, even a slightly increased classification error leads to a large drop in the information rate of a BCI system [13]; on the other hand, mental tasks that do not show the advantages of motor imagery will increasingly be focused on by BCI research in order to expand existing systems to multiclass BCIs or to increase the information rate of patients whose motor areas are not intact.
The following subsections show results for the recursive channel elimination method on cross-subject data. In Section 4.1, RCE is applied to the combined data of all five subjects. Results are compared with the individual channel rankings obtained from the five subjects. In Section 4.2, the transfer of rankings is investigated: RCE calculates rankings on data combined from 4 subjects before these rankings are tested on the corresponding remaining unseen dataset of the last subject.
4.1. Channel selection on combined data
We applied the channel selection method of recursive channel elimination (RCE) introduced in Section 3 to a training dataset that was combined from the five AR datasets.
The estimation of the average generalization error for all 39 stages of the channel selection process with RCE was carried out using linear SVMs as classifiers, with the parameter C previously determined by 10-fold cross-validation.⁷ Details about the 10-fold cross-validation process for channel selection are described in Section 3.3 and Figure 4. Figure 5 shows the development of the estimated classification error for all 39 steps of the RCE.
For this combined dataset the test error was minimal (26.9%) when using data from 32 or more EEG channels, but further reduction down to 24 channels increased the test error only marginally. Reducing the number of channels to fewer than the best 17 channels leads to a strong increase of the test error.
Throughout the ranking in the table of Figure 5, artifact or task-irrelevant channels appear only in the last ranks (e.g., EOG, occipital channels, FT9, FT10, etc.). Direct comparison between Figures 1 and 5 reveals that the curve in Figure 1 shows smaller error rates. The performance of a classifier trained on the RCE channels of combined data is worse than the average performance of classifiers trained on the individual RCE channels of single-subject data.
4.2. Transfer of channel selection outcomes to new subjects
In this section we analyze whether there exists a generally good subgroup of EEG channels (i.e., a subgroup of channels that performs well for all subjects) for a fixed mental task and whether this subgroup can be determined by the RCE method. We describe different methods to obtain channel rankings, some of which include the data of more than one subject. However, these rankings are always tested on the data of one subject only. Table 1 provides an overview of all ranking modes.
Cross-subject modes
We iterate the following process. One subject is removed from the combined database. We perform RCE on the remaining data, which leads to a channel ranking.
We use this ranking in two different ways to obtain test errors via 10-fold cross-validation on the data of the removed subject.
(i) Best 8 (cross). The channel subset used for testing consists of the eight best-ranked channels. The resulting 8 best channels are plotted in Figure 6.
(ii) Best n (cross). The channel subset used for testing consists of the n best-ranked channels. The number n is chosen such that the expected cross-validation error on the four
⁷Estimating the parameter for each number of channels in the process of channel selection might improve the accuracy but was not performed.
[Figure 5 graph: estimated test error (roughly 0.25 to 0.5) plotted against the number n of best remaining channels (0 to 39).]
Channel ranking on the combined data, ranks 1 to 39: CP2, CP1, FC2, FCz, F1, C4, FC4, C2, F2, C1, C3, CPz, FC3, FT7, FC1, C6, CP4, P6, C5, O2, POz, F6, AFz, TP8, Cz, P1, CP3, P2, FT9, P5, FT10, TP7, FT8, Fp2, F5, O1, O9, EOG, O10.
Figure 5: RCE results for a combined dataset of all 5 subjects. The graph shows a test error estimation for the n best channels. The error values were estimated by 10-fold cross-validation. The table on the right shows the channel ranking performed on the combined data. Eight channels which are located over or close to the motor cortex (see Figure 2) are printed with grey background. The surface map visualizes this ranking. The 24 best-ranked electrodes were mapped to grey scale values. Bright areas of the surface map correspond to relevant channels (according to RCE) whereas dark areas show less-relevant electrodes.
Table 1: Ranking modes overview: explanation of the ranking modes used for the comparison shown in Figure 7. The rankings were calculated on different kinds of datasets: on data from single subjects or (for cross-subject tests) on combined datasets (4-fold cross-validation). Testing of the ranking modes was always performed on the data of one single subject.

Mode            | Ranking method     | Ranking based on | Description
Motor 8         | A priori knowledge | Single subject   | 8 channels over or close to motor cortex
Random 8        | (Random)           | Single subject   | 8 channels
Best n (single) | RCE                | Single subject   | n channels with highest rank that minimize CV error
Best 8 (single) | RCE                | Single subject   | 8 channels with highest rank
Best n (cross)  | RCE                | Four subjects    | n channels with highest rank that minimize CV error
Best 8 (cross)  | RCE                | Four subjects    | 8 channels with highest rank
subjects is minimized. Note that this choice does not depend on the data of the fifth test subject.
As this process is repeated for every subject that was left out, we can average the error values of the modes Best 8 (cross) and Best n (cross) over five repetitions.
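The leave-one-subject-out evaluation of the Best 8 (cross) mode can be sketched as follows. The function name, the `rank_channels` argument (a stand-in for RCE, returning channels best-first), and the data layout are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def best8_cross_errors(subject_data, rank_channels, channel_features, C=1.0):
    """Best 8 (cross): rank channels on the pooled data of all other
    subjects, then test the 8 best-ranked channels on the held-out
    subject via 10-fold cross-validation.
    subject_data: list of (X, y) pairs, one per subject."""
    errors = []
    for s, (X_test, y_test) in enumerate(subject_data):
        # Pool the data of all subjects except the held-out one.
        X_rest = np.vstack([X for i, (X, _) in enumerate(subject_data) if i != s])
        y_rest = np.concatenate([y for i, (_, y) in enumerate(subject_data) if i != s])
        ranking = rank_channels(X_rest, y_rest)
        cols = np.concatenate([channel_features[k] for k in ranking[:8]])
        acc = cross_val_score(SVC(kernel="linear", C=C),
                              X_test[:, cols], y_test, cv=10).mean()
        errors.append(1.0 - acc)
    return np.mean(errors)  # averaged over the held-out subjects
```

The held-out subject contributes nothing to the ranking, mirroring the mode definitions in Table 1.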
For comparison: single-subject modes
For the fixed mental task of motor activity and imagery, the EEG literature suggests the channels CP3, CP4, and adjacent electrodes (e.g., [3]). Our guess at a generally good subgroup of EEG channels is thus the electrode set FC3, FC4, C1, C2, C3, C4, CP3, CP4 (see electrodes marked in boldface in Figure 2). The corresponding test mode is referred to as Motor 8.
If no prior knowledge of a task and no channel selection were available, a random choice of channels would be the only option. For comparison reasons we include the mode Random 8. Its test error is the average of ten repetitions of choosing eight random channels, optimizing the regularization parameter C, and testing this random subset via 10-fold cross-validation on the data of one subject.
For the two modes Best 8 (single) and Best n (single), the RCE method was applied to the individual data of single subjects only. These modes used subgroups of the eight best channels and the n best channels (see above) for calculating the test error via 10-fold cross-validation. It can be expected that the ranking for data from single subjects leads to more accurate classification results and can reveal task-related artifact
[Figure 6 consists of six scalp maps labeled "Without subject A", "Without subject B", "Without subject C", "Without subject D", "Without subject E", and "With all subjects"; in each map the 8 best-ranked electrodes are marked in bold.]
Figure 6: The database consists of data from 5 subjects. The channels were ranked 5 times using the channel selection method recursive channel elimination (RCE), each time using the data of four subjects only. The electrode positions marked in bold are the 8 best-ranked ones and are consistently located over or close to the motor cortex, although the method was not provided with prior knowledge about the motor imagery task. This type of ranking is referred to as Best 8 (cross).
[Figure 7 bar chart: test error (0 to 0.4) of the six ranking modes Motor 8, Random 8, Best n (single), Best 8 (single), Best n (cross), and Best 8 (cross), for each test subject A to E and their average.]
Figure 7: Comparison of the test errors of six different ranking modes for single subjects (A to E) and the test errors of these modes averaged over the five subjects (Average). For each mode and subject, the regularization parameter C was estimated separately. All test errors were obtained using 10-fold CV. The first mode Motor 8 tests the classification error for 8 channels over or close to the motor cortex, whereas Random 8 is based on 8 randomly chosen channels. Modes Best n (single) and Best 8 (single) test channel sets whose rankings were calculated based on the specific subject only. Modes Best n (cross) and Best 8 (cross) test channel sets whose rankings were calculated based on all other subjects' data but did not incorporate data from the test subject.
channels [2] that might not be present in data from other subjects.
Figure 7 shows the results for the 6 modes. The rightmost block contains an average taken over subjects for each of the modes. From the average results we observe the following.
(i) The 8 motor channels are not optimal: Best 8 (single) performs much better.⁸
(ii) Mode Best 8 (cross) performs almost as well as the motor channel mode. Although we conclude that the RCE method fails to find an optimal channel subset, the results suggest that when transferring channel positions across subjects the expected performance is not much worse than the one using prior knowledge.
(iii) The subset of 8 random channels performs surprisingly well. This finding suggests that the structure of the data can successfully be captured by the SVM even if only a few channels close to the motor cortex are contained in the channel subset. However, all other modes show better error estimates.
⁸In Figure 1 the choice of motor channels results in a lower classification error than the error from the RCE method. This is due to the fact that the regularization parameter C (or ridge) was not optimized for a specific ranking as was done in this study.
(iv) The performance of the Best n (cross) mode is comparable to the results of the Best 8 (single) mode (23%); nevertheless, this comparison is unfair since on average 27 channels were used. The cross-validation error averaged over the five subjects is 26% for the choice of 27 random channels (not plotted in Figure 7).
(v) The best performing mode is Best n (single). On average it only uses n = 14 channels and yields an error as low as 21.8%.
5. CONCLUSION
The recursive channel elimination (RCE) method was applied to EEG channel selection in the context of signal classification for a brain-computer interface (BCI) system.
All experiments were based on data from five subjects recorded during a motor imagery task comprising imagined left and right hand movement.
For each individual subject we analyzed the performance of three different types of rankings: (i) a ranking including channels over the motor cortex only, (ii) a ranking obtained by RCE from the data of that subject, and (iii) a ranking obtained by RCE from the data of the other four subjects.
We obtained the best results with RCE rankings from single subjects. A comparison reveals that they outperform motor rankings (including prior knowledge about the task) by about 5% absolute error.
The transfer of RCE rankings from the data of multiple subjects to a new subject leads to a small decrease in performance. The difference to the performance of motor rankings turns out to be less than 2% on average.
We conclude that individual channel ranking is preferable over cross-subject ranking for the experimental paradigm investigated here.
However, for the first time, it could be shown that RCE can not only successfully be used to select channels for individual subjects, but that RCE rankings on the combined data of multiple subjects are consistently in agreement with the EEG literature on motor imagery tasks, and can still yield error rates as low as 17% on unseen subjects.
ACKNOWLEDGMENTS
The authors would like to thank Bernd Battes and Professor Dr. Kuno Kirschfeld for their help with the EEG recordings. Special thanks to Dr. Jason Weston for his help on feature selection topics. This work has been supported in part by DFG (AUMEX RO 1030/12), NIH, and the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. Thomas Navin Lal was supported by a grant from the Studienstiftung des deutschen Volkes.
REFERENCES
[1] E. E. Sutter, "The brain response interface: communication through visually-induced electrical brain responses," Journal of Microcomputer Applications, vol. 15, no. 1, pp. 31-45, 1992.
[2] T. N. Lal, M. Schröder, T. Hinterberger, et al., "Support vector channel selection in BCI," IEEE Trans. Biomed. Engineering, vol. 51, no. 6, pp. 1003-1010, 2004.
[3] G. Pfurtscheller, C. Neuper, A. Schlögl, and K. Lugger, "Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters," IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 316-325, 1998.
[4] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842-1857, 1999.
[5] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms," IEEE Trans. Audio Electroacoust., vol. 15, no. 2, pp. 70-73, 1967.
[6] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Upper Saddle River, NJ, USA, 1996.
[7] P. Pudil, F. J. Ferri, J. Novovicova, and J. Kittler, "Floating search methods for feature selection with nonmonotonic criterion functions," in Proc. 12th International Conference on Pattern Recognition (ICPR '94), vol. 2, pp. 279-283, Jerusalem, Israel, October 1994.
[8] M. Schröder, M. Bogdan, W. Rosenstiel, T. Hinterberger, and N. Birbaumer, "Automated EEG feature selection for brain computer interfaces," in Proc. 1st International IEEE EMBS Conference on Neural Engineering, pp. 626-629, Capri, Italy, March 2003.
[9] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
[10] B. Blankertz, G. Curio, and K. Müller, "Classifying single trial EEG: towards brain computer interfacing," in Advances in Neural Information Processing Systems, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., vol. 14, MIT Press, Cambridge, Mass, USA, 2001.
[11] B. Schölkopf and A. Smola, Learning with Kernels, MIT Press, Cambridge, Mass, USA, 2002.
[12] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.
[13] A. Schlögl, C. Keinrath, R. Scherer, and G. Pfurtscheller, "Information transfer of an EEG-based brain-computer interface," in Proc. 1st International IEEE EMBS Conference on Neural Engineering, pp. 641-644, Capri, Italy, March 2003.
Michael Schröder received his Diploma in computer science in 2000. Currently he is a Ph.D. student at the Department for Computer Engineering (Professor Rosenstiel) at the Eberhard-Karls-Universität Tübingen in Germany. His research interests include machine learning, brain-computer interface systems, and signal processing.
Thomas Navin Lal received his Diploma in mathematics in 2001 and spent one year with the machine learning group of Professor Dr. Thomas Hofmann at Brown University, Providence, RI. He is currently a Ph.D. student of Professor Dr. Bernhard Schölkopf at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany. He is a researcher at the PASCAL network of excellence and is currently supported by a grant from the Studienstiftung des deutschen Volkes.
Thilo Hinterberger received his Diploma in physics from the University of Ulm, Germany, and received his Ph.D. degree in physics from the University of Tübingen, Germany, in 1999, on the development of a brain-computer interface called the Thought Translation Device. He is currently a Research Associate with the Institute of Medical Psychology and Behavioral Neurobiology at the University of Tübingen, Germany. His primary research interests focus on the further development of brain-computer interfaces and their applications and also on the development of EEG classification methods and the investigation of neuropsychological mechanisms during the operation of a BCI using functional MRI. He is a Member of the Society for Psychophysiological Research and the Deutsche Physikalische Gesellschaft (DPG).
Martin Bogdan received the Engineer Diploma in signal engineering from the Fachhochschule Offenburg, Germany, in 1993, and the Engineer Diploma in industrial informatics and instrumentation from the Université Joseph Fourier Grenoble, France, in 1993. In 1998, he received the Ph.D. degree in computer science (computer engineering) from the University of Tübingen, Germany. In 1994, he joined the Department of Computer Engineering at the University of Tübingen, where, since 2000, he has headed the research group NeuroTeam. This research group deals mainly with signal processing based on artificial neural nets and machine learning focused on, but not limited to, biomedical applications.
N. Jeremy Hill graduated in experimental psychology at the University of Oxford, UK, in 1995. Until 2001 he was a Research Assistant, Programmer, and finally a doctoral student in the psychophysics laboratory of Dr. Bruce Henning in Oxford. He received the Ph.D. degree in 2002, for a doctoral thesis on psychophysical statistics entitled "Testing hypotheses about psychometric functions." Since then he has been part of Professor Bernhard Schölkopf's Department for Empirical Inference for Machine Learning and Perception at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, and now he focuses on brain-computer interface research.
Niels Birbaumer was born in 1945. He received his Ph.D. degree in 1969, in biological psychology, art history, and statistics, from the University of Vienna, Austria. From 1975 to 1993, he was a Full Professor of clinical and physiological psychology, University of Tübingen, Germany. From 1986 to 1988, he was a Full Professor of psychology, Pennsylvania State University, USA. Since 1993, he has been a Professor of medical psychology and behavioral neurobiology at the Faculty of Medicine, the University of Tübingen, and Professor of clinical psychophysiology, University of Padova, Italy. Since 2002, he has been the Director of the Center of Cognitive Neuroscience, University of Trento, Italy.
Wolfgang Rosenstiel is a Professor at the University of Tübingen and holds the Chair of Computer Engineering. He is also the Managing Director of the Wilhelm Schickard Institute at Tübingen University, and the Director of the Department for System Design in Microelectronics at the Computer Science Research Centre (FZI). He is on the Executive Board of the German Edacentrum. His research areas include artificial neural networks, signal processing, embedded systems, and computer architecture.
Bernhard Schölkopf received an M.S. degree in mathematics (University of London, 1992), a Diploma in physics (Eberhard-Karls-Universität Tübingen, 1994), and a Ph.D. degree in computer science (Technical University Berlin, 1997). He won the Lionel Cooper Memorial Prize of the University of London, the Annual Dissertation Prize of the German Association for Computer Science (GI), and the Prize for the Best Scientific Project at the German National Research Center for Computer Science (GMD). He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge, UK. He has taught at the Humboldt University and the Technical University Berlin. In July 2001, he was elected a Scientific Member of the Max Planck Society and Director at the MPI for Biological Cybernetics. In October 2002, he was appointed Honorary Professor for Machine Learning at the Technical University Berlin.
EURASIP Journal on Applied Signal Processing 2005:19, 3113-3121
© 2005 Hindawi Publishing Corporation
Determining Patterns in Neural Activity for Reaching Movements Using Nonnegative Matrix Factorization

Sung-Phil Kim
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Email: phil@cnel.ufl.edu

Yadunandana N. Rao
Motorola Inc., FL, USA
Email: yadu@cnel.ufl.edu

Deniz Erdogmus
Department of Computer Science and Biomedical Engineering, Oregon Health & Science University, Beaverton, OR 97006, USA
Email: derdogmus@ieee.org

Justin C. Sanchez
Department of Pediatrics, Division of Neurology, University of Florida, Gainesville, FL 32611, USA
Email: justin@cnel.ufl.edu

Miguel A. L. Nicolelis
Department of Neurobiology, Center for Neuroengineering, Duke University, Durham, NC 27710, USA
Emails: derdogmus@ieee.org; nicoleli@neuro.duke.edu

Jose C. Principe
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Email: principe@cnel.ufl.edu

Received 31 January 2004; Revised 23 March 2005
We propose the use of nonnegative matrix factorization (NMF) as a model-independent methodology to analyze neural activity. We demonstrate that, using this technique, it is possible to identify local spatiotemporal patterns of neural activity in the form of sparse basis vectors. In addition, the sparseness of these bases can help infer correlations between cortical firing patterns and behavior. We demonstrate the utility of this approach using neural recordings collected in a brain-machine interface (BMI) setting. The results indicate that, using the NMF analysis, it is possible to improve the performance of BMI models through appropriate pruning of inputs.
Keywords and phrases: brain-machine interfaces, nonnegative matrix factorization, spatiotemporal patterns, neural firing activity.
1. INTRODUCTION
Brain-machine interfaces (BMIs) are an emerging field that aims at directly transferring the subject's intent of movement to an external machine. Our goal is to engineer devices that are able to interpret neural activity originating in the motor cortex and generate accurate predictions of hand position. In the BMI experimental paradigm, hundreds of microelectrodes are implanted in the premotor, motor, and posterior parietal areas, and the corresponding neural activity is recorded synchronously with behavior (hand reaching and grasping movements). Spike detection and sorting algorithms are used to determine the firing times of single neurons. Typically, the spike-time information is summarized into bin counts using short windows (100 milliseconds in this paper). A number of laboratories, including our own, have demonstrated that linear and nonlinear adaptive system identification approaches using the bin count input
can lead to BMIs that effectively predict the hand position and grasping force of primates for different movement tasks [1, 2, 3, 4, 5, 6, 7, 8]. The adaptive methods studied thus far include moving average models, time-delay neural networks (TDNNs), the Kalman filter and extensions, recursive multilayer perceptrons (RMLPs), and mixtures of linear experts gated by hidden Markov models (HMMs).
BMIs open up an important avenue to study the spatiotemporal organization of spike trains and their relationships with behavior. Recently, our laboratory has investigated the sensitivity of neurons and cortical areas based on their role in the mapping learned by the RMLP and the Wiener filter [7]. We examined how each neuron contributes to the output of the models, and found consistent relationships between cortical regions and segments of the hand trajectory in a reaching movement. This analysis indicated that, during each reaching action, specific neurons from the posterior parietal, the premotor dorsal, and the primary motor regions sequentially became dominant in controlling the output of the models. However, this approach relies on determining a suitable model, because it explicitly uses the learned model to infer the dependencies.
In this paper, we propose a model-independent methodology to study spatiotemporal patterns between neuronal spikes and behavior utilizing nonnegative matrix factorization (NMF) [9, 10]. In its original applications, NMF was mainly used to provide an alternative method for determining sparse representations of images to improve recognition performance [10, 11]. d'Avella and Tresch have also proposed an extension of NMF to extract time-varying muscle synergies for the analysis of behavior patterns of a frog [12]. The nonnegativity constraints in NMF result in the unsupervised selection of sparse bases that can be linearly combined (encoded) to reconstruct the original data. Our hypothesis is that NMF can similarly yield sparse bases for analyzing neural firing activity, because of the intrinsic nonnegativity of the bin counts and the sparseness of spike trains.
The application of NMF to extract local features of neural spike counts follows the method of obtaining sparse bases to describe the local features of face images. The basis vectors provided by NMF and their temporal encoding patterns are examined to determine how the activities of specific neurons localize to each segment of the reaching trajectory. We will show that the results from this model-independent analysis of the neuronal activity are consistent with the previous observations from the model-based analysis.
2. NONNEGATIVE MATRIX FACTORIZATION
NMF is a procedure to decompose a nonnegative data matrix into the product of two nonnegative matrices: bases and encoding coefficients. The nonnegativity constraint leads to a parts-based representation, since only additive, not subtractive, combinations of the bases are allowed. An n × m nonnegative data matrix X, where each column is a sample vector, can be approximated by NMF as

X = WH + E, (1)

where E is the error, and W and H have dimensions n × r and r × m, respectively. W consists of a set of r basis vectors, while each column of H contains the encoding coefficients of every basis for the corresponding sample. The number of bases is selected to satisfy r(n + m) < nm so that the number of equations exceeds the number of unknowns.
This factorization can be described in terms of columns as

x_j ≈ W h_j,  for j = 1, ..., m, (2)

where x_j is the jth column of X and h_j is the jth column of H. Thus, each sample vector is a linear combination of the basis vectors in W weighted by h_j. The nonnegativity constraints on W and H allow only additive combinations of basis vectors to approximate x_j. This constraint allows the visualization of the basis vectors as parts of the original sample [10]. This is contrary to factorization by PCA, where negative basis vectors are allowed.
The decomposition of X into W and H can be determined by optimizing an error function between the original data matrix and the decomposition. Two possible cost functions used in the literature are the Frobenius norm of the error matrix, ‖X − WH‖²_F, and the Kullback-Leibler divergence D_KL(X ‖ WH). The nonnegativity constraint can be satisfied by using the multiplicative update rules discussed in [10] to minimize these cost functions. In this paper, we will employ the Frobenius norm measure, for which the multiplicative update rules that converge to a local minimum are given below:

H_j(k + 1) = H_j(k) (W^T X)_j / (W^T WH)_j,
W_i(k + 1) = W_i(k) (XH^T)_i / (WHH^T)_i, (3)

where A_ab denotes the element of a matrix A at the ath row and bth column. It has been proven in [9] that the Frobenius norm cost function is nonincreasing under this update rule.
3. FACTORIZATION OF THE NEURONAL ACTIVITY MATRIX
We will now apply the multiplicative update rules in (3) to the neuronal bin-count matrix (created from real neural recordings of a behaving primate). The goal is to determine nonnegative sparse bases for the neural activity, from which we wish to deduce the local spatial structure of the neural population firing activity. These bases also point out common population firing patterns corresponding to specific behavior. In addition, the resulting factorization yields a temporal encoding matrix that indicates how the instantaneous neural activity is optimally constructed from these localized representations. Since we are interested in the relationship between the neural activity and behavior, we would like to study the coupling between this temporal encoding pattern and the movement of the primate, as well as the contribution of the specific basis vectors, which represent neural populations.
Determination of Neural Firing Patterns Using NMF 3115
Table 1: Distribution of neurons and cortical regions.

          |                 Monkey-1                |   Monkey-2
Regions   | PP    | M1(area 1) | PMd   | M1(area 2) | M1    | PMd
Neurons   | 1-33  | 34-54      | 55-81 | 82-104     | 1-37  | 38-54
3.1. Data preparation
Synchronous, multichannel neuronal spike trains were collected at Duke University using two female owl monkeys (Aotus trivirgatus): Belle (monkey-1) and Carmen (monkey-2).¹
Microwire electrodes were implanted in cortical regions where motor associations are known [1, 13]. During the neural recording process, up to sixty-four electrodes were implanted in posterior parietal (PP)-area 1, primary motor (M1)-area 2, area 4, and premotor dorsal (PMd)-area 3, each receiving sixteen electrodes. From each electrode, one to four neurons can be discriminated. The firing times of individual neurons were determined using spike detection and sorting algorithms [14] and were recorded while the primate performed a 3D reaching task that consists of a reach to food followed by eating. The primate's hand position was also recorded using multiple fiber optic sensors (with a shared time clock) and digitized at a 200 Hz sampling rate [1]. These sensors were contained in a plastic strip whose bending and twisting modified the transmission of light through the sensors, allowing positions in 3D space to be recorded accurately. The neuronal firing times were binned in nonoverlapping windows of 100 milliseconds, representing the local firing rate for each neuron. In this recording session of approximately 20 minutes (12 000 bins), 104 neurons for monkey-1 and 54 neurons for monkey-2 could be discriminated (their distribution across cortical regions is provided in Table 1 from [13]), and there were 71 reaching actions for monkey-1 and 65 for monkey-2, respectively. These reaching movements consist of three natural segments shown in Figure 1.
Based on the analysis of Wessberg et al. [1], the instantaneous movement is correlated with the current and the past neural data up to 1 second (10 bins). Therefore, for each time instant, we form a bin-count vector by concatenating 10 bins of firing counts (which correspond to a 10-tap delay line in a linear filter) from every neuron. Hence, if x_j(i) represents the ith bin of neuron j, where i ∈ {1, ..., 12 000}, a bin-count vector at time instance i is represented by

x(i) = [x_1(i), x_1(i − 1), ..., x_1(i − 9), x_2(i), ..., x_n(i − 9)]^T,

where n is the number of neurons. Since we are interested in determining repeated spatiotemporal firing patterns during the reaching movements, only the bin counts from time instances where the primate's arm is moving are considered. There is a possibility that in the selected training set some neurons never fire. The rows corresponding to these neurons must be removed from the bin-count matrix, since they tend to cause
¹ All experimental procedures conformed to the National Academy Press Guide for the Care and Use of Laboratory Animals and were approved by the Duke University Animal Care and Use Committee.
Figure 1: Segmentation of the reaching trajectories: reach from rest to food, reach from food to mouth, and reach from mouth to rest positions (taken from [7]).
instability in the NMF algorithm. In addition, to prevent the error criterion from focusing too much on neurons that simply fire frequently (although the temporal structure of their activity might not be significant for the task), the bin counts in each row (i.e., for each neuron) of the data matrix are normalized to unit length in the 2-norm. In general, if n neurons are considered for a total of m time instances, the data matrix X has dimension (10n) × m. Since the entries of the data matrix are bin counts, they are guaranteed to be nonnegative. Accounting for the 71 or 65 movements, there are m = 2143 time instances for monkey-1 and m = 2521 for monkey-2.
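As an illustration, the embedding just described (lagged bin counts stacked per neuron, never-firing neurons dropped, rows normalized to unit 2-norm) might be implemented as follows; `build_data_matrix` is a hypothetical helper sketched for this paper's conventions, not code from the study:

```python
import numpy as np

def build_data_matrix(counts, taps=10):
    """counts: (n_neurons, n_bins) nonnegative spike bin counts.
    Column i of the result stacks x_j(i), x_j(i-1), ..., x_j(i-taps+1)
    for every active neuron j, as described in the text."""
    counts = counts[counts.sum(axis=1) > 0]        # drop never-firing neurons
    n, T = counts.shape
    X = np.empty((taps * n, T - taps + 1))
    for j in range(n):
        for d in range(taps):                      # delay d for neuron j
            X[j * taps + d] = counts[j, taps - 1 - d : T - d]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)     # unit 2-norm rows

counts = np.random.default_rng(0).poisson(0.5, size=(4, 30))
X = build_data_matrix(counts, taps=10)
assert X.shape == (40, 21) and np.all(X >= 0)
```

With taps = 10 and 91 active neurons this yields the 910 × m matrix used below; the toy call uses 4 neurons and 30 bins only for brevity.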
3.2. Analysis of the factorization process
In the application of NMF to a given neural firing matrix, there are several important issues that must be addressed: the selection of the number of bases, the uniqueness of the NMF solution, and understanding how NMF can find local structures of neural firing activity.
The choice of the number of bases can be addressed in the framework of model selection. A number of model selection techniques (e.g., cross-validation) can be utilized to find the optimal number of bases. In this paper, we choose to adopt a selection criterion recently developed for clustering. The criterion, called the index I, has been used to indicate cluster
validity [15]. This index has shown consistent performance in selecting the true number of clusters across various experimental settings. The index I is composed of three factors:

I(r) = ((1/r) · (E_1 / E_r) · D_r)^p, (4)

where E_r is the approximation error (Frobenius norm) for r bases, and D_r is the maximum Euclidean distance between bases, such that

D_r = max_{i,j=1,...,r} ‖w_i − w_j‖. (5)

The optimal r is the one that maximizes I(r). We will utilize this index to determine the optimal r for NMF, with p = 1.
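The index in (4)-(5) is simple to evaluate once an NMF fit is available for each candidate r. The sketch below assumes `W` holds the r basis vectors as columns and that the approximation errors E_1 and E_r have been computed separately (all names are illustrative):

```python
import numpy as np

def index_I(W, E1, Er, p=1.0):
    """I(r) = ((1/r) * (E1/Er) * D_r)^p, with D_r the largest pairwise
    Euclidean distance between the columns (bases) of W."""
    r = W.shape[1]
    D_r = max(np.linalg.norm(W[:, i] - W[:, j])
              for i in range(r) for j in range(r))
    return ((1.0 / r) * (E1 / Er) * D_r) ** p

# Toy check: two orthonormal bases, E1 = 2, Er = 1  ->  I = 0.5 * 2 * sqrt(2)
W = np.array([[1.0, 0.0], [0.0, 1.0]])
assert abs(index_I(W, E1=2.0, Er=1.0) - np.sqrt(2.0)) < 1e-12
```

In practice one would evaluate `index_I` over a range of candidate r values and keep the maximizer, as done for the case studies below.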
Donoho and Stodden have shown that a unique solution of NMF is possible under certain conditions [16]. They have shown, through a geometrical interpretation of NMF, that if the data are not strictly positive, there can be only one set of nonnegative bases which spans the data in the positive orthant. With an articulated set of images obeying three rules (a generative model, linear independence of generators, and factorial sampling), they showed that NMF identifies the generators, or parts, of images. If we consider our neuronal bin-count matrix, each row contains many zero entries (zero bin counts) even after removing nonfiring neurons, since most neurons do not fire continuously (i.e., once in every 100-millisecond window) during the entire training set. Therefore, our neuronal data are not strictly positive. This implies that the existence of a unique set of nonnegative bases for the neuronal bin-count matrix is warranted. The question still remains whether the NMF basis vectors can find the generative firing patterns of the neural population by meeting the three conditions mentioned above. Here, we discuss the neuronal bin-count data with respect to these conditions.
As stated previously, we have demonstrated through sensitivity analysis that specific neuronal subsets from the PP, PMd, and M1 regions were sequentially involved in deriving the output of the predictive models during reaching movements [7]. Hence, the bin-count data for the reaching movement will contain increasing firing activity of the specific neuronal subset on local partitions of the trajectory. Due to binning, it is possible that more than one firing pattern is associated with a single data sample. This analysis leads to a generative model for the binned data in which data samples are generated by linear combinations of the specific firing patterns with nonnegative coefficients. Also, these firing patterns will be linearly independent, since the neuronal subset in each firing pattern tends to modulate firing rates only for the local part of the trajectory. The third condition of factorial sampling can be approximately satisfied by the repetition of movements, in which the variability of a particular firing pattern is observed during the entire data set. However, a more rigorous analysis is necessary to support the argument that the set of firing patterns is complete in factorial terms. Therefore, we expect that the NMF solutions may be slightly variable, reflecting the ambiguity in the completeness of factorial sampling. This might be overcome by collecting more data for reaching movements, and will be pursued in future studies.
3.3. Case studies
The NMF algorithm is applied to the neuronal data matrix described above, prepared using ten taps, with n = 91 neurons for monkey-1 (after eliminating the neurons that do not fire through the entire training set) and n = 52 neurons for monkey-2. The NMF algorithm with 100 independent runs results in r = 5 bases for both the monkey-1 and monkey-2 datasets, for which the index I is maximized. The means and standard deviations of the normalized cost (the Frobenius norm of the error between the approximation and the given data matrix, divided by the Frobenius norm of the data) over the 100 runs are 0.8399 ± 0.001 for the monkey-1 data and 0.7348 ± 0.002 for the monkey-2 data. This implies (although it is not sufficient to prove) that the algorithm approximately converges to the same solution with different initial conditions.
In Figure 2, we show the resulting basis vectors (columns of W) for the bin counts (presented in matrix form, where columns are different neurons and rows are different delays), as well as their corresponding time-varying encoding coefficients (rows of H) superimposed on the reaching trajectory coordinates of three consecutive movements. Based on the assumption that the neuronal bin-count data approximately satisfy the three conditions for the identification of the generators, the NMF basis vectors determine the sequence of spatiotemporal firing patterns representing the firing modulation of the specific neuronal subsets during the course of the reaching movement. Alternatively, we can say that NMF discovers these latent firing patterns of the neural population by optimal linear approximation of the data with few bases [9]. For example, from the two basis vectors, each corresponding to one of the two primates, in the left panel of Figure 2, we observe that firings of the neurons in group-b are followed by firings of the neurons in group-a (the bright activity denoted by b occurs earlier in time than the activity denoted by a, since increasing values on the vertical axis of each basis indicate going further back in time). Thus, NMF effectively determines and summarizes this sparse firing pattern, which involves a group of neurons firing sequentially. Their relative average activity is also indicated by the relative magnitudes of the entries of this particular basis.
Using these time-synchronized neural activity and hand trajectory recordings, it is also possible to discover relationships between firing patterns and certain aspects of the movement. We can assess the repeatability of a certain firing pattern summarized by a basis vector by observing the time-varying activity of the corresponding encoding signal (the corresponding row of H). An increase in this coefficient corresponds to a larger emphasis on that basis in reconstructing the original neural activity data. In the right panel of Figure 2, we observe that all bases are activated regularly in time by their corresponding encoding signals (at different time instances and at different amplitudes). For example, the first basis for monkey-1 is periodically activated to the same amplitude, whereas the activation amplitude of
Figure 2: (a) The five bases for monkey-1 (top) and monkey-2 (bottom). (b) Their corresponding encoding signals (thick solid line) overlaid on the 3-dimensional coordinates of the reaching trajectory (dotted lines) for three consecutive representative reaching tasks (separated by the dashed lines). Note that the encoding signals are scaled to the same order of magnitude as the reaching trajectory for visualization purposes.
the third basis varies in every movement, which might indicate a change in the role of the corresponding neuronal firing pattern in executing that particular movement. The periodic activation of the encodings also indicates the bursting nature of the spatiotemporal repetitive patterns. Hence, the NMF bases tend to encode synchronous and bursting spatiotemporal patterns of neural firing activity.
From the NMF decomposition, we observe certain associations between the activities of neurons from different cortical regions and different segments of the reaching trajectory. In particular, an analysis of the monkey-1 data based on Figure 2 indicates that neurons in PP and M1 (array 1) repeat similar firing patterns during the reach from rest to food. This assessment is based on the observation that bases three, four, and five, which involve firing activities from neurons in these regions, are repeatedly activated by the increased amplitude of their respective encoding coefficients. Similarly, neurons in M1 (array 2) are repeatedly activated during the reach to and from the mouth (bases one and two). These observations are consistent with our previous analyses that were conducted through trained input-output models (such as the Wiener filter and RMLP) [7]. Table 2 compares the neurons that were observed to have the highest sensitivity in the trained models with the neurons that have the largest magnitudes in each NMF basis. This comparison is based on the monkey-1 dataset. We can see that the neurons from NMF are a subset of the neurons obtained from the sensitivity analysis. It is also worth stating that the NMF bases provide more information than the model-based sensitivity analysis, since they determine the synchronous spatiotemporal patterns while
Table 2: Comparison of important neurons (examined in the monkey-1 dataset).

Regions                                | PP                  | M1(area 1) | PMd  | M1(area 2)
Highly sensitive neurons (RMLP)        | 4, 5, 7, 22, 26, 29 | 38, 45     | None | 93, 94
Largest-magnitude neurons in NMF bases | 7, 29               | 45         | None | 93, 94
Table 3: Performance evaluation of the Wiener filter and the mixture of multiple models based on NMF.

                         | CC (x) | CC (y) | CC (z) | MSE (x) | MSE (y) | MSE (z)
Monkey-1  Wiener filter  | 0.5772 | 0.6712 | 0.7574 | 0.4855  | 0.3468  | 0.2460
Monkey-1  NMF mixture    | 0.7147 | 0.7078 | 0.8076 | 0.2711  | 0.2786  | 0.1627
Monkey-2  Wiener filter  | 0.3737 | 0.4304 | 0.6192 | 0.3050  | 0.7405  | 0.2882
Monkey-2  NMF mixture    | 0.4974 | 0.5041 | 0.6916 | 0.2354  | 0.5400  | 0.2112
the sensitivity analysis only determines individual important neurons. Finally, we would like to reiterate that the analysis presented here is based solely on the data, which means that it does not need a trained model to investigate the neural population organization.
3.4. Modeling improvement for BMI using NMF
We will demonstrate a simple example showing improved BMI performance in predicting hand positions by utilizing NMF. We will compare the performance of two systems: the Wiener filter directly applied to the original spike count data, and the mixture of multiple linear filters based on the NMF bases and encodings.
The straight Wiener filter is directly applied to the neural firing data to estimate the three coordinates of the primate's hand position. The Wiener filter has been a standard model for BMIs, and many other approaches have been compared with it [19]. With nine delays, the input dimensionality of the filter is 910 for monkey-1 or 510 for monkey-2 (discarding inactive (nonfiring) neural channels). We then add a bias to each input vector to estimate the y-intercept. The weights of the filter are estimated by the Wiener-Hopf equation as

W = R⁻¹ P, (6)

where R is a 911 × 911 (or 511 × 511 for monkey-2) input correlation matrix, and P is a 911 × 3 (or 511 × 3 for monkey-2) input-output cross-correlation matrix.
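The Wiener solution in (6), with the bias handled by augmenting the input, can be sketched as below. This is a generic least-squares implementation, not the authors' code; the small ridge term is our addition for numerical stability:

```python
import numpy as np

def wiener_fit(X, D, ridge=1e-6):
    """X: (m, p) input matrix (lagged bin counts); D: (m, 3) hand positions.
    Appends a bias column and solves W = R^{-1} P, where R is the input
    correlation matrix and P the input-output cross-correlation matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias for the y-intercept
    R = Xb.T @ Xb / Xb.shape[0]
    P = Xb.T @ D / Xb.shape[0]
    return np.linalg.solve(R + ridge * np.eye(R.shape[0]), P)

# Sanity check on synthetic noiseless linear data
rng = np.random.default_rng(0)
X = rng.random((500, 4))
W_true = rng.random((5, 3))                         # last row is the intercept
D = np.hstack([X, np.ones((500, 1))]) @ W_true
W = wiener_fit(X, D)
assert np.allclose(W, W_true, atol=1e-3)
```

For the BMI data, `X` would have 910 (or 510) columns, giving the 911 × 911 (or 511 × 511) correlation matrix quoted in the text.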
The mixture of multiple models employs the NMF encodings as mixing coefficients. An NMF basis is used as a window function for the corresponding local model. Therefore, each model sees a given input vector through a different window and uses the windowed input vector to produce its output. The NMF encodings are then used to combine each model's output to produce the final estimate of the desired hand position vector. This can be described by the following equation:

d̂_c(n) = Σ_{k=1}^{K} h_k(n) (z_k(n)^T g_{k,c} + b_{k,c}), (7)
where h_k(n) is the NMF encoding coefficient for the kth basis at the nth column (i.e., time index), g_{k,c} is the weight vector of the kth model for the cth coordinate (c ∈ {x, y, z}), and b_{k,c} is the y-intercept of the kth model for the cth coordinate. z_k(n) is the input vector windowed by the kth NMF basis. Its ith element is given by

z_{k,i}(n) = x_i(n) · w_{k,i}. (8)
Here, x_i(n) is the normalized firing count of neuron i at time instance n, and w_{k,i} is the ith element of the kth NMF basis. g_{k,c} and b_{k,c} can be estimated based on the MSE criterion by using a stochastic gradient algorithm such as the normalized least mean square (NLMS). The weight update rule of the NLMS for each model is then given by

g_{k,c}(n + 1) = g_{k,c}(n) + (μ / (ε + ‖z_k(n)‖²)) h_k(n) e_c(n) z_k(n),
b_{k,c}(n + 1) = b_{k,c}(n) + (μ / (ε + ‖z_k(n)‖²)) h_k(n) e_c(n), (9)

where μ is the learning rate and ε is the normalization factor. e_c(n) is the error between the cth coordinate of the desired response and the model output.
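A single training epoch of the mixture, combining (7)-(9), might look like the sketch below. This is our illustration, not the authors' code; `mu` and `eps` stand for the learning rate and normalization factor, and all array shapes are assumptions:

```python
import numpy as np

def nlms_mixture_epoch(X, D, W_nmf, H, G, b, mu=0.01, eps=1.0):
    """X: (p, m) normalized lagged bin counts; D: (m, 3) hand positions.
    W_nmf: (p, K) NMF bases; H: (K, m) encodings.
    G: (K, p, 3) local-model weights and b: (K, 3) intercepts, updated in place."""
    K = W_nmf.shape[1]
    for n in range(X.shape[1]):
        Z = W_nmf.T * X[:, n]                    # Z[k] = input windowed by basis k
        y = np.einsum('kp,kpc->kc', Z, G) + b    # each local model's output
        e = D[n] - H[:, n] @ y                   # error of the mixture output
        for k in range(K):
            step = mu / (eps + Z[k] @ Z[k]) * H[k, n]
            G[k] += step * np.outer(Z[k], e)     # weight update of (9)
            b[k] += step * e                     # bias update of (9)
    return G, b
```

Because the encoding h_k(n) scales both the mixture output and the step size, models whose bases are inactive at time n are left essentially untouched, which is what confines each local model to its own firing pattern.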
In the experiment, we divided the data samples into 1771 training samples and 372 test samples for the monkey-1 dataset, and 1739 and 782, respectively, for the monkey-2 dataset. The parameters are set as {μ, ε, K} = {0.01, 1, 5}. The entire training data set is presented 60 times, which is sufficient for the weights to converge. The performance of the models is evaluated on the test set using two measures: the correlation coefficient (CC) between the desired hand trajectory and the model output trajectory, and the mean squared error (MSE) normalized by the variance of the desired response. Table 3 presents the performance evaluation of the two systems for both the monkey-1 and monkey-2 datasets. It shows a significant improvement in generalization performance with the mixture of models based on the NMF factorization.
Note that the general performance of models for the
monkey-2 dataset is worse than that for the monkey-1
dataset. The reasons may come from many experimental variables. One of them may be the number of electrodes and the corresponding cortical areas: as seen in Table 1, only 32 electrodes were implanted in two areas for monkey-2, while 64 electrodes were implanted in four areas for monkey-1.
To quantify the performance difference between the Wiener filter and the mixture of multiple models, we can apply a statistical test based on the mean squared error (MSE) performance metric [17]. By modeling the performance difference in terms of the MSE over short-time windows as a normal random variable, one can apply the t-test to quantify significance. This t-test was applied to both modeling outputs for monkey-1 and monkey-2 with α = 0.01 or α = 0.05. For both datasets, the null hypothesis was rejected at both significance levels, resulting in p-values of 0.0023 for monkey-1 and 0.0007 for monkey-2, respectively. Therefore, the statistical test of the performance difference demonstrates that the mixture of multiple models based on NMF improves performance significantly compared to the standard Wiener filter.
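The test just described can be sketched as a paired t-statistic over per-window MSE differences. The window length below is our choice, since the text does not specify it, and the resulting statistic would be compared against a Student-t table to obtain the p-value:

```python
import numpy as np

def windowed_mse_t(err_a, err_b, win=40):
    """err_a, err_b: (m, 3) prediction errors of two models on the same test
    set. Returns the paired t-statistic on per-window MSE differences and the
    degrees of freedom; t > 0 means model b has the lower windowed MSE."""
    wmse = lambda e: np.array([np.mean(e[i:i + win] ** 2)
                               for i in range(0, len(e) - win + 1, win)])
    d = wmse(err_a) - wmse(err_b)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t, len(d) - 1

# Model b with clearly smaller errors should give a large positive t
rng = np.random.default_rng(0)
err_a = rng.normal(0.0, 1.0, size=(400, 3))
err_b = rng.normal(0.0, 0.5, size=(400, 3))
t, dof = windowed_mse_t(err_a, err_b)
assert dof == 9 and t > 2.0
```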
3.5. Discussions
The results presented in the previous case study are a representative example of a broader set of NMF experiments performed on this recording. The selection of the number of taps and the number of bases (r) depends on the particular stimulus or behavior associated with the neural data. Although we have used a model selection method originally developed for clustering, and did not provide full justification that this index is suitable for NMF, the main motivation is to demonstrate that the problem of selecting the number of bases can be addressed in the context of model selection. This will be pursued in future research.
The number of patterns that can be distinctly represented by NMF is limited by the number of bases. A very small number of bases will lead to the combination of multiple patterns into a single nonsparse basis vector. At the other extreme, a very large number of bases will result in the splitting of a pattern into two or more bases, which have similar encoding coefficient signals in time. In such situations, the bases under consideration can be combined into one basis.
It is intriguing that the mixture of models based on NMF generalizes better than the Wiener filter despite the fact that the mixture contains many more model parameters. However, each model in the mixture receives inputs processed by a sparse basis vector. Therefore, each model learns the mapping between only a particular subset of neurons and the hand trajectories, and the effective number of parameters for each model is much smaller than the total number of input variables. Moreover, further overfitting is avoided by combining the outputs of the local models through the sparse encodings of NMF.
4. CONCLUSIONS
Nonnegative matrix factorization is a relatively new tool for analyzing data structure when nonnegativity constraints are imposed. In BMIs, the neural inputs are processed by grouping the firings into bin counts. Since the bin counts are always nonnegative, we hypothesized that NMF would be appropriate for analyzing the neural activity. The experimental results and the analysis presented in this paper showed that we could find repeated patterns in neuronal activity that occurred in synchrony with the reaching behavior and were automatically and efficiently represented in a set of sparse bases. The sparseness of the bases indicates that only a small number of neurons exhibit repeated firing patterns that are influential in reconstructing the original neural activity matrix.
As presented in [10], NMF provides local bases of the objects, while principal component analysis (PCA) provides global bases. In our preliminary PCA experiments on the same data, we observed that PCA only found the most frequently firing neurons, which may not be related to the behavior. Therefore, NMF can find local representations of the neural firing data, and this property can make NMF more effective than PCA for BMIs, where firing activities from different cortical areas are collected.
Lee and Seung have claimed in their paper that the statistical independence among the encodings of independent component analysis (ICA) forces the bases to be holistic [10]. Hence, if local parts of the neural activity occur together at the same time, the complicated dependencies between the encodings would not be captured by the ICA algorithm. However, we have observed that the NMF encodings seem to be uncorrelated over the entire movement. Hence, ICA with some nonnegativity constraints (e.g., nonnegative ICA [18], the ICA model with a nonnegative basis [19], and nonnegative sparse coding [20]) may yield interesting encodings of the neural firing activities. Further studies will present the comparison between NMF and these constrained ICA algorithms applied to BMIs.
While NMF is found to be a useful tool for analyzing neural data to find repeatable activity patterns, there are still several issues when using NMF for neural data analysis. First, the method only detects patterns of activity, but it is known that the inactivity of a neuron can often indicate a response to a stimulus or cause a behavior. An analysis based on NMF will fail to identify such neurons. Second, the nonstationary characteristics of neural activity make it difficult for NMF to find fixed spatiotemporal firing patterns. Since the neural ensemble function tends to change over neuronal space and time, such that different spatiotemporal firing patterns may be involved in the same behavioral output, we may have to continuously adapt the NMF factors to track those changes. This motivates us to consider a recursive NMF algorithm, which would enable us to adapt the NMF factors online. This will be covered in a future study.
In our application of NMF, we demonstrated that the NMF learning algorithm resulted in a similar Frobenius norm of the error matrix over 100 runs obtained with different initial conditions. However, this does not necessarily mean that the resulting factors are similar with small variance. Therefore, we need to quantify the similarity of the NMF results under different initializations. An alternative is to employ other methods to obtain the global solution, such as genetic or simulated annealing algorithms. This will be presented in a follow-up report.
ACKNOWLEDGMENTS
The authors would like to thank Johan Wessberg for collecting the data used in this paper. This work was supported by DARPA project no. N66001-02-C-8022.
REFERENCES
[1] J. Wessberg, C. R. Stambaugh, J. D. Kralik, et al., "Real-time prediction of hand trajectory by ensembles of cortical neurons in primates," Nature, vol. 408, no. 6810, pp. 361–365, 2000.
[2] D. W. Moran and A. B. Schwartz, "Motor cortical activity during drawing movements: population representation during spiral tracing," Journal of Neurophysiology, vol. 82, no. 5, pp. 2693–2704, 1999.
[3] J. K. Chapin, K. A. Moxon, R. S. Markowitz, and M. A. L. Nicolelis, "Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex," Nature Neuroscience, vol. 2, no. 7, pp. 664–670, 1999.
[4] M. D. Serruya, N. G. Hatsopoulos, L. Paninski, M. R. Fellows, and J. P. Donoghue, "Brain-machine interface: instant neural control of a movement signal," Nature, vol. 416, no. 6877, pp. 141–142, 2002.
[5] J. C. Sanchez, S.-P. Kim, D. Erdogmus, et al., "Input-output mapping performance of linear and nonlinear models for estimating hand trajectories from cortical neuronal firing patterns," in Proc. 12th IEEE International Workshop on Neural Networks for Signal Processing, pp. 139–148, Martigny, Switzerland, September 2002.
[6] S. Darmanjian, S.-P. Kim, M. C. Nechyba, et al., "Bimodal brain-machine interface for motor control of robotic prosthetic," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '03), vol. 4, pp. 3612–3617, Las Vegas, Nev, USA, October 2003.
[7] J. C. Sanchez, D. Erdogmus, Y. N. Rao, et al., "Interpreting neural activity through linear and nonlinear models for brain machine interfaces," in Proc. 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 3, pp. 2160–2163, Cancun, Mexico, September 2003.
[8] J. M. Carmena, M. A. Lebedev, R. E. Crist, et al., "Learning to control a brain-machine interface for reaching and grasping by primates," PLoS Biology, vol. 1, no. 2, pp. 1–16, 2003.
[9] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., pp. 556–562, MIT Press, Cambridge, Mass, USA, 2001.
[10] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[11] D. Guillamet, M. Bressan, and J. Vitrià, "A weighted non-negative matrix factorization for local representations," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), vol. 1, pp. 942–947, Kauai, Hawaii, USA, December 2001.
[12] A. d'Avella and M. C. Tresch, "Modularity in the motor system: decomposition of muscle patterns as combinations of time-varying synergies," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., pp. 629–632, MIT Press, Cambridge, Mass, USA, 2002.
[13] J. C. Sanchez, "From cortical neural spike trains to behavior: modeling and analysis," Ph.D. dissertation, Department of Biomedical Engineering, University of Florida, Gainesville, Fla, USA, 2004.
[14] M. A. L. Nicolelis, A. A. Ghazanfar, B. M. Faggin, S. Votaw, and L. M. Oliveira, "Reconstructing the engram: simultaneous, multisite, many single neuron recordings," Neuron, vol. 18, no. 4, pp. 529–537, 1997.
[15] U. Maulik and S. Bandyopadhyay, "Performance evaluation of some clustering algorithms and validity indices," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 12, pp. 1650–1654, 2002.
[16] D. Donoho and V. Stodden, "When does non-negative matrix factorization give a correct decomposition into parts?" in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and B. Schölkopf, Eds., pp. 1141–1148, MIT Press, Cambridge, Mass, USA, 2004.
[17] S.-P. Kim, J. C. Sanchez, Y. N. Rao, et al., "A comparison of optimal MIMO linear and nonlinear models for brain-machine interfaces," submitted to Neural Computation, 2004.
[18] M. Plumbley, "Conditions for nonnegative independent component analysis," IEEE Signal Processing Lett., vol. 9, no. 6, pp.
177180, 2002.
[19] L. Parra, C. Spence, P. Sajda, A. Ziehe, and K.-R. M uller, Un-
mixing hyperspectral data, in Advances in Neural Informa-
tion Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R.
M uller, Eds., pp. 942948, MIT Press, Cambridge, Mass, USA,
2000.
[20] P. O. Hoyer, Non-negative sparse coding, in Proc. 12th
IEEE International Workshop on Neural Networks for Signal
Processing, pp. 557565, Martigny, Switzerland, September
2002.
Sung-Phil Kim was born in Seoul, South Korea. He received a B.S. degree from the Department of Nuclear Engineering, Seoul National University, Seoul, South Korea, in 1994. In 1998, he entered the Department of Electrical and Computer Engineering, University of Florida, in pursuit of a Master of Science degree. He joined the Computational NeuroEngineering Laboratory as a Research Assistant in 2000. He also received an M.S. degree in December 2000 from the Department of Electrical and Computer Engineering, University of Florida. From 2001, he continued to pursue a Ph.D. degree in the Department of Electrical and Computer Engineering, University of Florida, under the supervision of Dr. Jose C. Principe. In the Computational NeuroEngineering Laboratory, he has investigated decoding models and analytical methods for brain-machine interfaces.

Yadunandana N. Rao was born in Mysore, India. He received his B.E. degree in electronics and communication engineering from the University of Mysore, India, in August 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Florida, Gainesville, Fla, in 2000 and 2004, respectively. From May 2000 to January 2001, he worked as a Design Engineer at GE Medical Systems, Wis. Currently he is a Senior Engineer at Motorola, Fla. His research interests include adaptive signal processing theory, algorithms and analysis, neural networks for signal processing, and biomedical applications.
Determination of Neural Firing Patterns Using NMF 3121
Deniz Erdogmus received his B.S. degrees in electrical engineering and mathematics in 1997, and his M.S. degree in electrical engineering, with emphasis on systems and control, in 1999, all from the Middle East Technical University, Turkey. He received his Ph.D. degree in electrical engineering from the University of Florida, Gainesville, in 2002. Since 1999, he has been with the Computational NeuroEngineering Laboratory, University of Florida, working with Jose Principe. His current research interests include information-theoretic aspects of adaptive signal processing and machine learning, as well as their applications to problems in communications, biomedical signal processing, and controls. He is the recipient of the IEEE SPS 2003 Young Author Award, and is a Member of IEEE, Tau Beta Pi, and Eta Kappa Nu.

Justin C. Sanchez received a B.S. degree with highest honors in engineering science along with a minor in biomechanics from the University of Florida in 2000. From 1998 to 2000, he spent three years as a Research Assistant in the Department of Anesthesiology, University of Florida. In 2000, he joined the Department of Biomedical Engineering and the Computational NeuroEngineering Laboratory, the University of Florida. In the spring of 2004, he completed both his M.E. and Ph.D. degrees in biomedical signal processing, working on the development of modeling and analysis tools for brain-machine interfaces. He is currently a Research Assistant Professor in the Department of Pediatrics, Division of Neurology, the University of Florida. His neural engineering electrophysiology laboratory is currently developing neuroprosthetics for use in research and clinical settings.

Miguel A. L. Nicolelis was born in São Paulo, Brazil, in 1961. He received his M.D. and Ph.D. degrees from the University of São Paulo, Brazil, in 1984 and 1988, respectively. After postdoctoral work at Hahnemann University, Philadelphia, he joined Duke University, where he now codirects the Center for Neuroengineering and is a Professor of neurobiology, biomedical engineering, and psychological and brain sciences. His laboratory is interested in understanding the general computational principles underlying the dynamic interactions between populations of cortical and subcortical neurons involved in motor control and tactile perception.
Jose C. Principe is a Distinguished Professor of electrical and computer engineering and biomedical engineering at the University of Florida, where he teaches advanced signal processing, machine learning, and artificial neural networks (ANNs) modeling. He is a BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). His primary area of interest is processing of time-varying signals with adaptive neural models. The CNEL has been studying signal and pattern recognition principles based on information-theoretic criteria (entropy and mutual information). He is an IEEE Fellow. He is a Member of the ADCOM of the IEEE Signal Processing Society, a Member of the Board of Governors of the International Neural Network Society, and the Editor-in-Chief of the IEEE Transactions on Biomedical Engineering. He is a Member of the Advisory Board of the University of Florida Brain Institute. He has more than 90 publications in refereed journals, 10 book chapters, and 200 conference papers. He has directed 35 Ph.D. dissertations and 45 Master's theses. He recently wrote an interactive electronic book entitled Neural and Adaptive Systems: Fundamentals Through Simulation, published by John Wiley and Sons.
EURASIP Journal on Applied Signal Processing 2005:19, 3122–3127
© 2005 Hindawi Publishing Corporation

Finding Significant Correlates of Conscious Activity in Rhythmic EEG

Piotr J. Durka
Laboratory of Medical Physics, Institute of Experimental Physics, Warsaw University, ul. Hoża 69, 00-681 Warsaw, Poland
Email: durka@fuw.edu.pl

Received 28 January 2004; Revised 27 July 2004

One of the important issues in designing an EEG-based brain-computer interface is an exact delineation of the rhythms related to the intended or performed action. Traditionally, related bands were found by trial-and-error procedures seeking maximum reactivity. Even then, large values of ERD/ERS did not imply the statistical significance of the results. This paper presents a complete methodology, allowing for a high-resolution presentation of the whole time-frequency picture of event-related changes in the energy density of signals, revealing the microstructure of EEG rhythms, and determination of the time-frequency regions of energy changes which are related to the intentions in a statistically significant way.

Keywords and phrases: time-frequency, adaptive approximations, matching pursuit, ERD, ERS, multiple comparisons.
1. INTRODUCTION

Thinking of a brain-computer interface (BCI), one can imagine a device which would directly process all the brain's output, like in a perfect virtual reality machine [1]. Today's attempts are much more humble: we are basically at the level of controlling simple left/right motions. On the other hand, these approaches are more ambitious than direct connections to the peripheral nerves: we are trying to guess the intention of an action directly from the activity of the brain's cortex, recorded from the scalp (EEG).

Contemporary EEG-based BCI systems are based upon various phenomena like, for example, visual or P300 evoked potentials, slow cortical potentials, or sensorimotor cortex rhythms [2]. The most attractive path leads towards the detection of natural EEG features; for example, a normal intention of moving the right hand (or rather its reflection in EEG) would move the cursor to the right. Determination of such features in EEG is more difficult than using evoked or especially trained responses. Desynchronization of the μ rhythm is an example of a feature correlated not only with the actual movement, but also with its mere imagination.

All these approaches encounter obstacles common in the neurosciences: great intersubject variability and poor understanding of the underlying processes. Significant improvement can be brought by coherent basic research on the EEG representation of conscious actions. This paper presents two methodological aspects of such research.
(i) High-resolution parameterization and feature extraction from the EEG time series. Scalp electrodes gather signal from many neural populations, so the rhythms of interest are buried in a strong background. Owing to the high temporal resolution of EEG and the oscillatory character of most of its features, we can look for the relevant activities in the time-frequency plane.

(ii) Determination of significant correlates of conscious activities requires a dedicated statistical framework. Until recently, reporting the significance of changes in the time-frequency plane presented a serious problem.
2. TIME-FREQUENCY ENERGY DENSITY OF SIGNALS

Among the parameters used in today's BCI systems (like those designed at the Graz University of Technology [3]), event-related desynchronization and synchronization (ERD/ERS) phenomena play an important role. ERD and ERS are defined as the percentage of change of the average (across repetitions) power of a given rhythm, usually α/μ, β, and γ [4]. Estimation of the time course of the rhythm energy is crucial for the sensitivity of these parameters. But due to the intersubject variability, we cannot expect the rhythms to appear at the same frequencies for all subjects.

Therefore, a classical procedure was developed to find the reactive rhythms [4]. For each subject, the frequency range of interest was divided into 1 Hz intervals; in each of them the single trials (repetitions) were bandpass filtered, squared, and averaged, to obtain the estimate of the average band energy. Among these fixed bands, those revealing the largest changes related to the event were chosen. This naturally limits the
Significance of Changes of the Time-Frequency Energy Density of EEG 3123
Figure 1: Top: Wigner distribution ((A.5); vertical: frequency, horizontal: time) of the signal simulated as two short sines (bottom). We observe the autoterms $a^2$ and $b^2$ corresponding to the time and frequency spans of the sines, and the cross-term $2ab$ at time coordinates where no activity occurs in the signal.
frequency resolution to 1 Hz, not taking into account the accuracy of bandpass filtering of finite sequences.
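The classical band-wise procedure (bandpass filter each trial, square, average, express as percentage change) can be sketched as follows. The synthetic trials, the 11–13 Hz band, the FFT-mask filter, and all helper names are illustrative assumptions, not data or code from the paper.

```python
import numpy as np

def bandpass(trials, lo, hi, fs):
    """Crude FFT-mask bandpass of each trial (rows = repetitions)."""
    F = np.fft.rfft(trials, axis=1)
    freqs = np.fft.rfftfreq(trials.shape[1], d=1.0 / fs)
    F[:, (freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(F, n=trials.shape[1], axis=1)

def erd_ers_curve(trials, lo, hi, fs, ref):
    """Bandpass, square, average across trials, then express the average
    band energy as a percentage change relative to the reference epoch."""
    energy = (bandpass(trials, lo, hi, fs) ** 2).mean(axis=0)
    ref_level = energy[ref].mean()
    return 100.0 * (energy - ref_level) / ref_level

fs = 125.0                                        # sampling rate used in Section 3.1
rng = np.random.default_rng(0)
t = np.arange(int(20 * fs)) / fs
amp = np.where((t > 11) & (t < 13), 0.2, 1.0)     # a 12 Hz rhythm attenuated around 12 s
trials = np.array([amp * np.sin(2 * np.pi * 12 * t + rng.uniform(0, 2 * np.pi))
                   + 0.1 * rng.standard_normal(t.size) for _ in range(30)])
curve = erd_ers_curve(trials, 11.0, 13.0, fs, ref=slice(int(1 * fs), int(3 * fs)))
```

The attenuation of the simulated rhythm shows up as a strongly negative (ERD) excursion of `curve` around the 12th second.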
The whole problem is naturally embedded in the time-frequency space. Time-frequency density of signal energy, averaged across trials, provides all the information about the rhythms and the time course of their energy in one clear picture (Figure 2).
2.1. Time-frequency distributions of energy density

Because of the uncertainty principle, there are many alternative estimates of the time-frequency density of a signal's energy. Actually, the same problem (nonunique estimates) is present also in calculating the spectral power or bandpass filtering finite sequences, but in the quadratic time-frequency distributions we may say that the relevancy of the problem is squared. Fluctuations of power spectra, appearing at high resolutions, in the time-frequency distributions take the form of cross-terms. These false peaks occur in between the autoterms (which correspond to the actual signal's structures), and significantly blur the energy estimates (Figure 1). Their presence stems from the equation $(a + b)^2 = a^2 + b^2 + 2ab$. A quadratic representation of an unknown signal $s$, composed of two structures $a$ and $b$, contains autoterms corresponding to these structures ($a^2$ and $b^2$) as well as the cross-term $2ab$. For a signal more complex than a sum of two clear and separate structures (like the simplistic simulation in Figure 1), cross-terms are indistinguishable from the autoterms. Advanced mathematical methods are being developed for the reduction of this drawback [5]. While some of them give impressive results for particular signals, in general we are confronted with a tradeoff: higher resolution versus a more reliable (suppressed cross-terms) estimate.
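The cross-term effect of Figure 1 can be reproduced numerically. The discrete Wigner implementation below is a generic textbook variant written for this illustration, not code from the paper; the two same-frequency bursts are illustrative assumptions.

```python
import numpy as np

def wigner(x):
    """Discrete Wigner distribution W[time, frequency] of a real signal."""
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        mmax = min(n, N - 1 - n)                  # symmetric lag range around n
        r = np.zeros(N, dtype=complex)
        for m in range(-mmax, mmax + 1):
            r[m % N] = x[n + m] * np.conj(x[n - m])
        W[n] = np.fft.fft(r).real                 # real-valued for real signals
    return W

N = 128
t = np.arange(N)
x = np.zeros(N)
x[20:41] += np.sin(2 * np.pi * 0.25 * t[20:41])   # short sine "a"
x[88:109] += np.sin(2 * np.pi * 0.25 * t[88:109]) # short sine "b"
W = wigner(x)
# at the temporal midpoint (sample 64) the signal is exactly zero,
# yet |W| carries a large cross-term 2ab there
```

The marginal over frequency stays proportional to $|x[n]|^2$, so the cross-term oscillates and integrates out, which is exactly why it blurs the picture without carrying signal energy.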
2.2. Adaptive approximations

If we knew exactly the structures ($a$ and $b$) of which the signal is composed, we might explicitly omit the cross-term $2ab$, thus obtaining a clear time-frequency picture. In practice, this would require a reasonably sparse approximation of the signal in the form

$$s \approx \sum_{i=1}^{M} w_i g_i, \qquad (1)$$

where $g_i$ are known functions fitting well the actual signal's structures. This may be achieved only by choosing the functions $g_i$ for each analyzed signal separately.¹ The criterion of their choice is usually aimed at explaining the maximum part of the signal energy in a given number of iterations ($M$). However, the problem of choosing the optimal set of functions $g_i$ is intractable.² A suboptimal solution can be found by means of the matching pursuit (MP) algorithm [7]. But even this suboptimal solution is still quite computer-intensive,³ so the first practical applications were not possible before the mid-nineties [8]. The MP algorithm and construction of an estimate of the signal's time-frequency energy density, which is free of cross-terms, are described in the appendix. Functions $g_i$ are chosen from large and redundant collections of Gabor functions (sine-modulated Gaussians).

Advantages of this estimator in the context of event-related desynchronization and synchronization were discussed in [9, 10].
3. MICROSTRUCTURE OF THE EEG RHYTHMS

3.1. Experimental data

To present advantages of the presented methodology, the classical ERD/ERS experimental setup was modified to obtain relatively long epochs of EEG between the events.

A thirty-one-year-old right-handed subject was half lying in a dim room with open eyes. Movement of the thumb, detected by a microswitch, was performed approximately 5 seconds (at the subject's choice) after a quiet sound generated approximately every 20 seconds. The experiment was divided into 15-minute sessions, and the recorded EEG into 20-second-long epochs. After artifact rejection, 124 epochs were left for the analysis. EEG was registered from electrodes at positions selected from the 10–20 system. Figures 2–4 present results for the C4 electrode (contralateral to the hand performing movements) in the local average reference. The signal was down-sampled offline from 250 Hz to 125 Hz.

Figure 5 presents data from another subject, collected in a standard ERD/ERS experiment.
¹ Contrary to most of the approaches, where all the signals are represented via products with the same set of functions (e.g., a basis).

² Finding the subset of M functions which explains the largest ratio of the signal energy among all the other M-subsets of the highly redundant set requires checking all the possible M-subsets, which leads to a combinatorial explosion even for moderate sets of candidate functions. Problems of such computational complexity are termed NP-hard [6].

³ Recent results indicate possibilities of a significant decrease of computation times of bias-free MP decompositions.
Figure 2: Average time-frequency energy density of 124 trials (Section 3.1, energy cut above 2%, sqrt scale); darker area marks higher values of the energy density. Horizontal scale in seconds (0–20), vertical in Hz (0–40). Finger movement in the 12th second.
Figure 3: ERD/ERS map corresponding to the time between 3 and 19 seconds (vertical lines in Figure 2). Shades of gray are proportional to the percentage of change relative to the reference epoch (between 1 and 3 seconds in Figure 2); frequency scale 0–40 Hz, gray scale from −68% to 209%.
3.2. High-resolution picture of energy density

Time-frequency estimates of the signal energy density, including the MP estimate given by (A.5), contain no phase information, so they can be summed across the trials to give the average time-frequency density of energy.⁴ Figure 2 presents such an average for 124 repetitions of EEG synchronized to the finger movement, occurring in the 12th second. We easily observe that the μ rhythm concentrates around 12 Hz. We may also notice its decrease (desynchronization) around the time when the finger movement occurred, as well as some increased activity in 15–30 Hz near 12–13 seconds (β synchronization).
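Footnote 4's distinction (the average of energy densities differs from the energy density of the averaged signal) is easy to verify numerically for a non-phase-locked rhythm; the 10 Hz simulation below is an illustrative assumption, not data from the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur, f0 = 125, 2.0, 10.0
t = np.arange(int(fs * dur)) / fs
# induced (non-phase-locked) rhythm: the phase varies from trial to trial
trials = np.array([np.sin(2 * np.pi * f0 * t + rng.uniform(0, 2 * np.pi))
                   for _ in range(100)])

bin10 = int(f0 * dur)                             # FFT bin of the 10 Hz component
# average of the energy densities: the rhythm survives
avg_of_power = np.mean(np.abs(np.fft.rfft(trials, axis=1)) ** 2, axis=0)[bin10]
# energy density of the averaged signal: random phases cancel
power_of_avg = np.abs(np.fft.rfft(trials.mean(axis=0)))[bin10] ** 2
```

The averaged signal retains only phase-locked components (like classical evoked potentials), so `power_of_avg` is orders of magnitude below `avg_of_power`.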
In another experiment (Figure 5), the high-resolution estimate clearly revealed two very close but separate components of the μ rhythm with different time courses, an effect elusive to the previously applied methods.
3.3. High-resolution ERD and ERS

Speaking of the decrease in the μ rhythm in the previous section, we compared the activity near the 12th second (Figure 2) to the average level of the rhythm energy, or, more correctly, to a period before the movement, which should not be related to the event. To quantify this procedure, we must define the reference period, to which the energy changes will be related. It should be distant enough from the onset of the event, to avoid incorporating premovement correlates into the reference. To avoid border problems of estimates, it should also be removed from the very start of the analyzed epoch. In Figure 2 it was chosen between the 1st and the 3rd second.

⁴ Note that the average of the energy densities is in general different from the energy density of the averaged signal. The latter (averaged signal) reveals phase-locked phenomena like, for example, the classical evoked potential.

Classically, for each selected band, ERD/ERS were calculated as the percentage of power relative to the reference epoch (ERD corresponding to a decrease and ERS to an increase). Owing to the high-resolution estimate of the whole picture of energy density, we may calculate it for the whole relevant time-frequency region with maximum resolution.
The ERD/ERS map in Figure 3 was obtained as a ratio of each point's energy to the average energy of the reference epoch in the same frequency. In this plot we observe, like in Figure 2, a darker area (increase) corresponding to the postmovement β synchronization, and a white spot around the time of the movement, corresponding to the μ desynchronization. However, in the long premovement period there are still a lot of fluctuations, which naturally implies a question about the statistical significance of the observed changes.
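The per-frequency normalization just described amounts to the following sketch; the flat synthetic map with marked "desynchronization" and "synchronization" regions is an illustrative assumption.

```python
import numpy as np

def erd_ers_map(E, ref):
    """Percentage change of each time-frequency point relative to the
    average energy of the reference epoch at the same frequency.
    E: (frequencies x time) average energy density; ref: time slice."""
    ref_mean = E[:, ref].mean(axis=1, keepdims=True)
    return 100.0 * (E - ref_mean) / ref_mean

E = np.ones((40, 200))                 # flat background energy, rows = Hz
E[12, 110:130] = 0.2                   # drop at 12 Hz around the "movement"
E[18:30, 125:140] = 1.7                # post-movement increase at 18-30 Hz
m = erd_ers_map(E, slice(10, 30))      # reference: columns 10-30
```

Points in the reference epoch map to 0%, the attenuated region to −80% (ERD), and the enhanced region to +70% (ERS).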
4. STATISTICAL SIGNIFICANCE

The following steps constitute a fully automatic (hence objective and repeatable) and statistically correct procedure, which delineates and presents with high resolution the time-frequency regions of significant changes in the average energy density.

(1) Divide the time-frequency plane into resels (from resolution elements), for which the statistics are calculated (Section 4.1).

(2) Calculate pseudo-t statistics and p-values for the null hypothesis of no change in the given resel compared to the reference epoch in the same frequency (Section 4.2).

(3) Select a threshold for the null hypothesis corrected by multiple comparisons (Section 4.3).

(4) Display the energy changes calculated for maximum
Figure 4: ERD/ERS from Figure 3 displayed in regions revealing statistically significant changes in resampling pseudo-t tests (Section 4.2), corrected by a 5% false discovery rate (Section 4.3).
Figure 5: Average time-frequency energy density (2) of 57 trials from the C1 electrode (average reference), constructed for $g_{\gamma_i}$ longer than 250 milliseconds; presented from 5 to 15 Hz, finger movement in the 5th second. We observe two very close, but separate, rhythms with different time courses. The faster rhythm desynchronizes about 1.5 seconds before the movement, while the slower lasts until its very onset and desynchronizes in the 5th second.
resolution (Section 3.2) in windows corresponding to resels which indicated statistically significant changes.

These steps will be described in the following sections. Further details can be found in [10].
4.1. Integration of MP maps in resels

In choosing the dimensions of a resel, suitable for the statistical analyses, we turn to the theory of periodogram sampling [11]. For a statistically optimal sampling of the periodogram, the product of the frequency interval and signal length gives 1/2. This value was taken as the product of the resel's widths in time and frequency, their ratio being a free parameter.
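Numerically, the resel dimensions follow from the constraint $\Delta t \cdot \Delta\omega = 1/2$ with the time/frequency aspect ratio as the free parameter; this helper is an illustrative sketch, not the paper's code.

```python
import numpy as np

def resel_dims(ratio):
    """Time and frequency widths of a resel with dt * df = 1/2.
    `ratio` = dt / df is the free parameter mentioned in the text."""
    df = np.sqrt(0.5 / ratio)
    dt = ratio * df
    return dt, df

dt, df = resel_dims(ratio=2.0)   # e.g. resels twice as wide in time (s) as in frequency (Hz)
```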
Calculating the amount of energy in such relatively large resels simply as the value of the distribution (A.5) in its center, that is,

$$E_{\mathrm{point}}\left(t_i, \omega_i\right) = \sum_{n} \left|\left\langle R^n f, g_{\gamma_n}\right\rangle\right|^2 W g_{\gamma_n}\left(t_i, \omega_i\right), \qquad (2)$$

may not be representative of the amount of energy contained in a given resel. In such a case⁵ we use the exact solution:

$$E_{\mathrm{int}}\left(t_i, \omega_i\right) = \sum_{n} \left|\left\langle R^n f, g_{\gamma_n}\right\rangle\right|^2 \int_{t_i - \Delta t/2}^{t_i + \Delta t/2} \int_{\omega_i - \Delta\omega/2}^{\omega_i + \Delta\omega/2} W g_{\gamma_n}(t, \omega)\, dt\, d\omega. \qquad (3)$$
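On a finely sampled energy map, the integration in (3) reduces to block summation over the resel grid; this numeric stand-in uses an illustrative map shape and resel size.

```python
import numpy as np

def integrate_resels(E, nt, nf):
    """Sum a finely sampled energy map E (freq x time) over a grid of
    resels, nf x nt map samples per resel (numeric stand-in for (3))."""
    F, T = E.shape
    assert F % nf == 0 and T % nt == 0
    return E.reshape(F // nf, nf, T // nt, nt).sum(axis=(1, 3))

E = np.zeros((64, 128))
E[10, 40] = 1.0                        # a structure much narrower than a resel
R = integrate_resels(E, nt=16, nf=8)   # 8 x 8 grid of resels
```

The narrow structure contributes its full energy to exactly one resel, whereas a center-point estimate like (2) could miss it entirely (the point of footnote 5).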
4.2. Resampling the pseudo-t statistics

The values of energy of all the N repetitions (trials) in each questioned resel will be compared to the energies of resels within the corresponding frequency of the reference epoch. We denote the time indices $t_i$ of resels belonging to the reference epoch as $\{t_i,\ i \in \mathrm{ref}\}$ and their number contained in each frequency slice as $N_{\mathrm{ref}}$. For each resel at coordinates $\{t_i, \omega_i\}$ we will compare its energy averaged over N repetitions with the energy averaged over repetitions in resels from the reference epoch in the same frequency. Their difference can be written as

$$\Delta E\left(t_i, \omega_i\right) = \frac{1}{N} \sum_{k=1}^{N} E^k_{\mathrm{int}}\left(t_i, \omega_i\right) - \frac{1}{N N_{\mathrm{ref}}} \sum_{k=1}^{N} \sum_{j \in \mathrm{ref}} E^k_{\mathrm{int}}\left(t_j, \omega_i\right) = \overline{E}\left(t_i, \omega_i\right) - \overline{E}\left(t_{\mathrm{ref}}, \omega_i\right), \qquad (4)$$

where the superscript k denotes the kth repetition (out of N).
However, we want to account also for the different variances of $E^k$, revealing the variability of the N repetitions. Therefore we replace the simple difference of means (4) by the pseudo-t statistics:

$$t = \frac{\Delta E\left(t_i, \omega_i\right)}{s_{\Delta E}}, \qquad (5)$$

where $\Delta E$ is defined as in (4), and $s_{\Delta E}$ is the pooled variance of the reference epoch and the investigated resel. In spite of the central limit theorem, this magnitude tends to have a nonnormal distribution [10]. Therefore, we use resampling methods.

⁵ The difference between (2) and (3) is most significant for structures narrow in time or frequency relative to the dimensions of resels.
We estimate the distribution of t from (5), under the null hypothesis of no significant change, from the data in the reference epoch (for each frequency $N \cdot N_{\mathrm{ref}}$ values) by drawing with replacement two samples of sizes N and $N \cdot N_{\mathrm{ref}}$, and calculating, for each such replication, the statistics (5). This distribution is approximated once for each frequency. Then for each resel the actual value of (5) is compared to this distribution, yielding p for the null hypothesis.
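The resampling scheme can be sketched as follows. The pooled-variance form of the pseudo-t and the gamma-distributed synthetic energies are assumptions for illustration; the exact estimator is specified in [10].

```python
import numpy as np

def pseudo_t(a, b):
    """Difference of means over a pooled-variance scale (a pseudo-t)."""
    n, m = len(a), len(b)
    s2 = ((n - 1) * a.var(ddof=1) + (m - 1) * b.var(ddof=1)) / (n + m - 2)
    return (a.mean() - b.mean()) / np.sqrt(s2 * (1 / n + 1 / m))

def resampled_p(resel, ref_pool, n_rep=2000, rng=None):
    """p-value of a resel's pseudo-t against a null distribution built by
    drawing with replacement (samples of sizes N and N*N_ref) from the
    reference pool."""
    rng = rng or np.random.default_rng(0)
    t_obs = pseudo_t(resel, ref_pool)
    null = np.empty(n_rep)
    for i in range(n_rep):
        a = rng.choice(ref_pool, size=len(resel), replace=True)
        b = rng.choice(ref_pool, size=len(ref_pool), replace=True)
        null[i] = pseudo_t(a, b)
    return np.mean(np.abs(null) >= abs(t_obs))

rng = np.random.default_rng(2)
ref_pool = rng.gamma(2.0, 1.0, size=30 * 10)      # N=30 trials, N_ref=10 resels
changed = rng.gamma(2.0, 3.0, size=30)            # a resel with tripled mean energy
p = resampled_p(changed, ref_pool, rng=rng)
```

A resel whose mean energy clearly departs from the reference pool yields a p-value far below the significance level.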
The number of permutations giving values of (5) exceeding the observed value has a binomial distribution for $N_{\mathrm{rep}}$ repetitions with probability α.⁶ Its variance equals $N_{\mathrm{rep}}\, \alpha (1 - \alpha)$. The relative error of α will then be (cf. [12])

$$\frac{\sigma_\alpha}{\alpha} = \sqrt{\frac{1 - \alpha}{\alpha N_{\mathrm{rep}}}}. \qquad (6)$$

To keep this relative error at 10% for a significance level α = 5%, $N_{\mathrm{rep}} = 2000$ is enough. Unfortunately, due to the problem of multiple comparisons discussed in Section 4.3, we need to work with much smaller values of α. In this study $N_{\mathrm{rep}}$ was set to $2 \times 10^6$, which resulted in relatively large computation times.
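The paper's numeric example can be verified directly from (6):

```python
import numpy as np

def rel_error(alpha, n_rep):
    """Relative error of an estimated significance level alpha, eq. (6)."""
    return np.sqrt((1 - alpha) / (alpha * n_rep))

err = rel_error(0.05, 2000)   # about 0.0975, i.e. roughly the quoted 10%
```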
4.3. Adjustment for multiplicity

In the preceding section, we estimated the achieved significance levels p for the null hypotheses of no change of the average energy in each resel, compared to the reference region in the same frequency. Adjusting results for multiplicity is a very important issue in the case of such a large number of potentially correlated tests. As proposed in [10], it can be effectively achieved using the false discovery rate (FDR, [13]). It controls the ratio q of the number of the true null hypotheses rejected to all the rejected hypotheses. In our case this is the ratio of the number of resels, to which significant changes may be wrongly attributed, to the total number of resels revealing changes.

Let us denote the total number of performed tests, equal to the number of questioned resels, as m. If for $m_0$ of them the null hypothesis of no change is true, then [13] proves that the following procedure controls the FDR at the level $q \cdot (m_0/m) \le q$.

(1) Order the achieved significance levels $p_i$, approximated in the previous section for all the resels separately, in an ascending series: $p_1 \le p_2 \le \cdots \le p_m$.

(2) Find

$$k = \max\left\{ i :\ p_i \le \frac{i}{m \sum_{j=1}^{m} (1/j)}\, q \right\}. \qquad (7)$$

(3) Reject all hypotheses for which $p \le p_k$.
⁶ For brevity we omit the distinction between the exact value α, which would be estimated from all the possible repetitions, and the actually calculated one.
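Steps (1)–(3) of the multiplicity adjustment can be sketched as follows; the input p-values are illustrative.

```python
import numpy as np

def fdr_by(pvals, q=0.05):
    """Benjamini-Yekutieli procedure (7): find the largest p_k passing the
    corrected criterion; return a boolean mask of rejected hypotheses."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)                          # step (1): ascending series
    c = np.sum(1.0 / np.arange(1, m + 1))          # correction for dependent tests
    thresh = (np.arange(1, m + 1) / (m * c)) * q   # step (2): per-rank thresholds
    passed = p[order] <= thresh
    if not passed.any():
        return np.zeros(m, dtype=bool)
    k = np.max(np.nonzero(passed)[0])              # index of p_k in the sorted series
    return p <= p[order][k]                        # step (3): reject p <= p_k

pvals = [0.0001, 0.0002, 0.01, 0.2, 0.5, 0.9]
mask = fdr_by(pvals, q=0.05)
```

For these six tests only the three smallest p-values survive the correction.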
4.4. Display of the statistically significant ERD/ERS

Figure 4 gives the final picture of statistically significant changes in the time-frequency plane. It is constructed by displaying the high-resolution ERD/ERS map (Figure 3) only in the areas corresponding to the resels which revealed statistical significance in the procedure from Section 4. Desynchronization of the 12-Hz μ occurs around the time of the movement (12th second). Synchronization of 18–30 Hz β, occurring just after the movement, is divided in half by the harmonic of μ (24 Hz). In the long premovement epoch no significant changes are detected, which suggests the robustness and reliability of the whole procedure.
5. CONCLUSIONS

The presented procedure gives high-resolution and free-of-cross-terms estimates of the average time-frequency energy density of event-related EEG, revealing the microstructure of its rhythms. Time-frequency areas of significant changes are assessed via objective statistical procedures. This allows, for example, investigating the minimum number of repetitions required to delineate the reactive rhythms. Application of this methodology may bring a significant improvement in basic research on the event-related changes of EEG rhythms, as well as in per-subject customization of the ERD/ERS-based BCI.
6. REPRODUCIBLE RESEARCH

Software for calculating the MP decomposition (appendix), with complete source code in C and executables for GNU/Linux and MS Windows, plus an interactive display and averaging of the time-frequency maps of energy (in Java), is available at http://brain.fuw.edu.pl/durka/software/mp. Datasets used in Figures 4–5 and Matlab code for calculating maps and statistics like Figures 2–4 are available at http://brain.fuw.edu.pl/durka/tfstat/.
APPENDIX

MATCHING PURSUIT ALGORITHM

In each of the steps, a waveform $g_{\gamma_n}$ from the redundant dictionary D is matched to the signal $R^n f$, which is the residual left after subtracting the results of previous iterations:

$$R^0 f = f, \qquad R^n f = \left\langle R^n f, g_{\gamma_n} \right\rangle g_{\gamma_n} + R^{n+1} f, \qquad g_{\gamma_n} = \arg\max_{g_{\gamma_i} \in D} \left| \left\langle R^n f, g_{\gamma_i} \right\rangle \right|, \qquad (A.1)$$

where $\arg\max_{g_{\gamma_i} \in D}$ means the $g_{\gamma_i}$ giving the largest value of the product $|\langle R^n f, g_{\gamma_i} \rangle|$.
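The greedy iteration (A.1) over a small stochastic Gabor dictionary can be sketched as follows; the dictionary size, parameter ranges, and the synthetic two-atom signal are illustrative assumptions.

```python
import numpy as np

def gabor(N, u, w, s, phi):
    """Real Gabor atom (a sine-modulated Gaussian), normalized to unit energy."""
    t = np.arange(N)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.sin(2 * np.pi * w / N * (t - u) + phi)
    return g / np.linalg.norm(g)

def matching_pursuit(f, dictionary, M):
    """Greedy MP (A.1): at each step pick the atom with the largest product
    with the residual; return coefficients, atom indices, final residual."""
    residual = f.copy()
    coefs, atoms = [], []
    for _ in range(M):
        products = dictionary @ residual
        n = np.argmax(np.abs(products))
        coefs.append(products[n])
        atoms.append(n)
        residual = residual - products[n] * dictionary[n]
    return np.array(coefs), atoms, residual

N = 256
rng = np.random.default_rng(3)
# randomized ("stochastic") parameters: center u, frequency w, scale s, phase phi
params = [(rng.uniform(0, N), rng.uniform(5, N / 2), rng.uniform(8, 64), rng.uniform(0, np.pi))
          for _ in range(300)]
D = np.array([gabor(N, *p) for p in params])
f = 2.0 * D[0] + 1.0 * D[1]                       # signal built from two dictionary atoms
coefs, atoms, res = matching_pursuit(f, D, M=5)
```

Each iteration removes $|\langle R^n f, g_{\gamma_n}\rangle|^2$ from the residual energy, so the expansion satisfies the energy conservation underlying (A.5).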
Dictionaries (D) for time-frequency analysis of real signals are constructed from real Gabor functions:

$$g_\gamma(t) = K(\gamma)\, e^{-\pi \left((t-u)/s\right)^2} \sin\left( 2\pi \frac{\omega}{N} (t - u) + \phi \right). \qquad (A.2)$$

N is the size of the signal, $K(\gamma)$ is such that $\|g_\gamma\| = 1$, and $\gamma = \{u, \omega, s, \phi\}$ denotes the parameters of the dictionary's functions. For these parameters no particular sampling is a priori defined. In practical implementations we use subsets of the infinite space of possible dictionary's functions. However, any fixed scheme of subsampling this space introduces a statistical bias in the resulting parameterization. A bias-free solution using stochastic dictionaries, where the parameters of the dictionary's functions are randomized before each decomposition, was proposed in [14].

For a complete dictionary the procedure converges to f, in theory in an infinite number of steps [7], but in practice we use finite sums:

$$f \approx \sum_{n=0}^{M} \left\langle R^n f, g_{\gamma_n} \right\rangle g_{\gamma_n}. \qquad (A.3)$$
From this decomposition we can derive an estimate $E f(t, \omega)$ of the time-frequency energy density of the signal f, by choosing only the autoterms from the Wigner distribution

$$W f(t, \omega) = \int f\left(t + \frac{\tau}{2}\right) \overline{f\left(t - \frac{\tau}{2}\right)}\, e^{-i\omega\tau}\, d\tau, \qquad (A.4)$$

calculated for the expansion (A.3). This representation will be a priori free of cross-terms:

$$E f(t, \omega) = \sum_{n=0}^{M} \left| \left\langle R^n f, g_{\gamma_n} \right\rangle \right|^2 W g_{\gamma_n}(t, \omega). \qquad (A.5)$$
ACKNOWLEDGMENTS

Thanks to J. Żygierewicz and J. Ginter for the example datasets. This work was supported by Grant 4T11E02823 of the Committee for Scientific Research (Poland).
REFERENCES

[1] S. Lem, Summa Technologiae, Wydawnictwo Literackie, Kraków, Poland, 2nd edition, 1966.
[2] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, et al., "Brain-computer interface technology: a review of the first international meeting," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 164–173, 2000.
[3] G. Pfurtscheller, C. Neuper, C. Guger, et al., "Current trends in Graz brain-computer interface (BCI) research," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 216–219, 2000.
[4] G. Pfurtscheller, "EEG event-related desynchronization (ERD) and event-related electroencephalogram synchronization (ERS)," in Electroencephalography: Basic Principles, Clinical Applications and Related Fields, E. Niedermayer and F. Lopes Da Silva, Eds., pp. 958–965, Williams & Wilkins, Baltimore, Md, USA, 4th edition, 1999.
[5] W. J. Williams, "Recent advances in time-frequency representations: Some theoretical foundations," in Time Frequency and Wavelets in Biomedical Signal Processing, M. Akay, Ed., IEEE Press Series in Biomedical Engineering, pp. 3–43, IEEE Press, Piscataway, NJ, USA, 1997.
[6] D. Harel, Algorithmics: The Spirit of Computing, Addison-Wesley, Reading, Mass, USA, 2nd edition, 1992.
[7] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
[8] P. J. Durka and K. J. Blinowska, "Analysis of EEG transients by means of matching pursuit," Annals of Biomedical Engineering, vol. 23, no. 5, pp. 608–611, 1995.
[9] P. J. Durka, D. Ircha, C. Neuper, and G. Pfurtscheller, "Time-frequency microstructure of event-related electroencephalogram desynchronization and synchronization," Medical & Biological Engineering & Computing, vol. 39, no. 3, pp. 315–321, 2001.
[10] P. J. Durka, J. Żygierewicz, H. Klekowicz, J. Ginter, and K. J. Blinowska, "On the statistical significance of event-related EEG desynchronization and synchronization in the time-frequency plane," IEEE Trans. Biomed. Eng., vol. 51, no. 7, pp. 1167–1175, 2004.
[11] M. B. Priestley, Spectral Analysis and Time Series, Academic Press, New York, NY, USA, 1981.
[12] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, NY, USA, 1993.
[13] Y. Benjamini and D. Yekutieli, "The control of the false discovery rate in multiple testing under dependency," Ann. Statist., vol. 29, no. 4, pp. 1165–1188, 2001.
[14] P. J. Durka, D. Ircha, and K. J. Blinowska, "Stochastic time-frequency dictionaries for matching pursuit," IEEE Trans. Signal Processing, vol. 49, no. 3, pp. 507–510, 2001.
Piotr J. Durka received his M.S. and Ph.D. degrees in medical physics from Warsaw University, where he is currently an Assistant Professor. His research relates to the methodology of EEG analysis, mainly time-frequency signal processing. He introduced adaptive approximations (MP algorithm) to EEG analysis; after a decade of successful applications, he aims at the unification of advanced signal processing and traditional, visual analysis of EEG.
EURASIP Journal on Applied Signal Processing 2005:19, 3128–3140
© 2005 David A. Peterson et al.
Feature Selection and Blind Source Separation
in an EEG-Based Brain-Computer Interface
David A. Peterson
Department of Computer Science, Center for Biomedical Research in Music, Molecular,
Cellular, and Integrative Neurosciences Program, and Department of Psychology,
Colorado State University, Fort Collins, CO 80523, USA
Email: petersod@cs.colostate.edu
James N. Knight
Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
Email: nate@cs.colostate.edu
Michael J. Kirby
Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
Email: kirby@math.colostate.edu
Charles W. Anderson
Department of Computer Science and Molecular, Cellular, and Integrative Neurosciences Program,
Colorado State University, Fort Collins, CO 80523, USA
Email: anderson@cs.colostate.edu
Michael H. Thaut
Center for Biomedical Research in Music and Molecular, Cellular, and Integrative Neurosciences Program,
Colorado State University, Fort Collins, CO 80523, USA
Email: michael.thaut@colostate.edu
Received 1 February 2004; Revised 14 March 2005
Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as "yes" or "no" or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct "yes"/"no" BCI from a single session. Blind source separation (BSS) and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a direct, single-session BCI.
Keywords and phrases: electroencephalogram, brain-computer interface, feature selection, independent components analysis, support vector machine, genetic algorithm.
1. INTRODUCTION
1.1. EEG-based brain-computer interfaces
There is a fast-growing research and development effort underway to implement brain-computer interfaces (BCI) using the electroencephalogram (EEG) [52]. The overall goal is to provide people with a new channel for communication with the external environment. This is particularly important for patients who are in a "locked-in" state in which conventional motor output channels are compromised.
One simple, desirable BCI function would allow individuals without motor function to respond to questions with simple "yes" or "no" responses [35]. Yet most BCI research has used experiments that require an indirect mapping between what the subject does and the effect on an external
system. For example, subjects may be required to imagine left- or right-hand movement in order to use the BCI [3, 37, 39]. If they want to use the BCI to respond yes/no to questions, they have to remember that left-hand imagined movement corresponds to "yes," and right-hand imagined movement corresponds to "no." Other BCI research requires extensive subject biofeedback training in order for the subject to gain some degree of voluntary influence over EEG features such as slow cortical potentials [5] or 8–12 Hz rhythms [53]. For both the imagined movement and biofeedback scenarios, the mapping between what the subject does and the effect on the BCI is indirect. In the latter case, a single session is insufficient and the subject must undergo many weeks or months of training sessions.
A more direct approach would simply have the subject imagine "yes" or "no" and would not require extensive biofeedback training. While imagined movement and bidirectional influence over time- and frequency-domain amplitude can be readily detected and used as control signals in a BCI, the EEG activity associated with complex cognitive tasks such as imagining different words is much more poorly understood. Can advances in signal processing and pattern recognition methods enable us to distinguish whether a subject is imagining "yes" or "no" by the simultaneously recorded EEG? Furthermore, can that distinction be learned in a single recording session?
1.2. The EEG feature space
The EEG measures the scalp-projected electrical activity of the brain with millisecond resolution at up to over 200 electrode locations. Although most EEG-based BCI research uses far fewer electrodes, research into the role of the specific topographic distribution of the electrodes [54] suggests that dense electrode arrays may standardize and enhance the system's performance. Furthermore, advances in electrode and cap technology have made the time required to apply over 200 electrodes reasonable even for BCI patients. EEG analyses, including much of the EEG-based BCI research, make extensive use of the signal's corresponding frequency spectrum. The spectrum is usually divided into five canonical frequency bands. Thus, if one considers the power in each of these bands for each of 200 electrodes, each trial is described by 1000 features. If interelectrode features such as cross-correlation or coherence are considered, this number grows combinatorially. As in many such problems, a subset of features will often lead to better dissociation between trial types than the full set of features. However, the number of unique feature subsets for N features is 2^N, a space that cannot be exhaustively explored for N greater than about 25. This is but one reason why most EEG research uses only a very small number of features. A significant number of features are discarded, including features that might significantly improve the accuracy with which the signals can be classified.
1.3. Blind source separation of EEG
Given a set of observations, in our case a set of time series, blind source separation (BSS) methods such as independent component analysis (ICA) [22] attempt to find a (usually linear) transformation of the observations that results in a set of independent observations. Infomax [4] is an implementation of ICA that searches for a transformation that maximizes the information between the observations and the transformed signals. Bell and Sejnowski showed that a transformation maximizing the information is, in many cases, a good approximation to the transformation resulting in independent signals. ICA has been used extensively in analyses of brain imaging data, including EEG [26, 34], magnetoencephalogram (MEG) [47, 49], and functional magnetic resonance imaging (fMRI) [26]. Assumptions about how independent brain sources are mixed and map to the recorded scalp electrodes, and the corresponding relevance for BSS methods, are discussed extensively in [27].
Maximum noise fraction (MNF) is an alternative BSS approach for transforming the raw EEG data. It was initially introduced in the context of denoising multispectral satellite data [14]. Subsequently it has been extended to the denoising of time series [1], and it has been compared to principal components analysis and canonical correlation analysis in a BCI [2]. The basis of the MNF subspace approach is to construct a set of basis vectors that optimize the amount of noise (or, equivalently, signal) captured. Specifically, the maximum noise fraction basis maximizes the noise-to-signal (as well as the signal-to-noise) ratio of the transformed signal. Thus, the optimization criterion is based on the ratio of second-order statistical quantities. Furthermore, unlike ICA, the basis vectors have a natural ordering based on the signal-to-noise ratio. MNF is similar to the second-order blind identification (SOBI) algorithm and requires that the signals have different autocovariance structures. The requirement exists because of the second-order nature of the algorithm.
The relationship of MNF to ICA is a consequence of the fact that they both provide methods for solving the BSS problem [1, 21]. Initial results for the application of MNF to the analysis of EEG time series demonstrated MNF was simultaneously effective at eliminating noise and extracting what appeared to be observable phenomena such as eye blinks and line noise [28, 29]. It is interesting that ICA and MNF perform similarly given their disparate formulations. This suggests that under appropriate assumptions (see [1, 21, 28]) the mutual information criterion and the signal-to-noise ratio can be related quantities. However, in the instance that signals of interest are mixed such that they share the same subspace, the MNF approach provides a representation for the mixed and unmixed subspaces.
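Because the MNF criterion involves only second-order statistics, the transform reduces to a generalized eigenvalue problem. The following is a minimal sketch, not the implementation used in the studies cited above; it assumes the common device of estimating the noise covariance from first differences of the time series, which is one way to exploit the different-autocovariance requirement noted above:

```python
import numpy as np
from scipy.linalg import eigh

def mnf(X):
    """Maximum noise fraction transform (sketch).

    X : array of shape (channels, samples).
    Returns (noise_fractions, components, W). Since eigh returns
    eigenvalues in ascending order, components are ordered from
    lowest noise fraction (highest SNR) to highest.
    """
    X = X - X.mean(axis=1, keepdims=True)
    sigma = np.cov(X)                       # total covariance
    sigma_n = np.cov(np.diff(X, axis=1))    # noise covariance via first differences
    # Generalized symmetric eigenproblem: sigma_n w = lam * sigma w,
    # where lam is the noise fraction of the component w^T X.
    lam, W = eigh(sigma_n, sigma)
    return lam, W.T @ X, W

# Toy example: one smooth source mixed into 3 noisy channels.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 2000)
source = np.sin(2 * np.pi * 1.5 * t)
X = np.outer([1.0, 0.5, -0.8], source) + 0.3 * rng.standard_normal((3, t.size))
lam, components, W = mnf(X)
# components[0] should recover the sinusoid (lowest noise fraction)
```

The natural SNR ordering of the basis vectors mentioned in the text corresponds to sorting by the eigenvalues `lam`.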
1.4. Classification and the feature selection problem
The support vector machine (SVM) classifier [45, 48] learns a hyperplane that provides a maximal soft margin between the data classes in a higher-dimensional transform space determined by a choice of kernel function. Although SVMs can fail in problems with many nuisance features [19], they have demonstrated competitive classification performance in difficult domains as diverse as DNA microarray data [8], text categorization [25], and image classification [40]. They have also been successfully employed in EEG-based BCI research [6, 12, 32, 56]. In contrast to competing nonlinear classifiers such as multilayer perceptrons, SVMs often exhibit higher classification accuracy, are not susceptible to local optima, and can be trained much faster. Because we seek feature subsets that maximize classification accuracy, the feature subset search needs to be driven by how well the data can be classified using the corresponding feature subsets, the so-called "wrapper" approach to feature selection [30]. Thus the speed characteristic of SVMs is particularly important because we will train and test the classifiers for every feature subset we evaluate.
Our prior research with EEG datasets from a cognitive BCI [2] and a movement prediction BCI [12] demonstrated the benefit of feature selection for small and large feature spaces, respectively. There are many ways to implement the feature selection search [7, 16, 42]. One logical choice is a genetic algorithm (GA) [13, 20]. GAs provide a stochastic global search of the feature subset space, evaluating many points in the space in parallel. A population of feature subsets is evolved using crossover and mutation operations akin to natural selection. The evolution is guided by how well feature subsets can classify the trials. GAs have been successfully employed for feature selection in a wide variety of applications [15, 51, 55] including EEG-based BCI research [12, 56]. GAs often exhibit superior performance in domains with many features [46], do not get trapped in local optima as with gradient techniques, and make no assumptions about feature interactions or the lack thereof.
In summary, this paper evaluates a feature selection system for classifying trials in a novel, challenging BCI using spectral features from the original, and two BSS transformations of, scalp-recorded EEG. We hypothesized (1) that classification accuracy would be higher for the feature subsets found by the GA than for full feature sets and random feature subsets and (2) that the power spectra of the BSS transformations would provide feature subsets with higher classification accuracy than the power spectra of the original signals.
2. METHODS
2.1. Subjects
The subjects were 34 healthy, right-handed, fully informed consenting volunteers with no history of neurological or psychiatric conditions. The present paper is based on data from eight of the subjects who met certain criteria for behavioral measures and details of the EEG recording procedure. Specifically, we selected eight subjects that wore caps with physically linked mastoids for the reference. Other subjects wore a cap with mastoids digitally linked for the reference. Although the difference between physically and digitally linked mastoid reference is minor, it can be nontrivial depending on the relative impedances at the two mastoid electrodes [36]. Thus, to eliminate the possibility that the slight difference in caps could influence the questions at hand, we elected to consider only those subjects wearing the cap with physically linked mastoids. We also considered only those subjects
Figure 1: BCI task timeline. Subjects were asked to visualize the most recently presented word until the next word is displayed. The period of simultaneously recorded EEG used for subsequent analysis was 1000 milliseconds long, beginning 750 milliseconds after display offset and 500 milliseconds before the next display onset.
that exhibited reasonable inter-response intervals and a reasonably even distribution of "yes"/"no" responses in a separate, voluntarily decided premotor visualization version of the task (described in a separate forthcoming manuscript). The subjects were selected on these criteria only, before their EEG data was reviewed. The eight subjects were 19 ± 1 years of age and included five females.
2.2. BCI experiment procedure
On each of 100 trials subjects were shown one of the words "yes" or "no" on a computer display for 750 milliseconds and were instructed to visualize the word until the next word was displayed (see Figure 1). There were 50 "yes" trials and 50 "no" trials presented in random order with a maximum of three of the same stimulus in a row. Because in subsequent analyses we planned to ignore the first two trials due to experiment start-up transients, the first two trials were required to include exactly one of each type.
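A trial order satisfying these constraints can be generated, for instance, by rejection sampling. The sketch below only illustrates the constraints; it is not the stimulus-presentation code used in the experiment:

```python
import random

def make_trial_sequence(n_per_type=50, max_run=3, seed=None):
    """Random yes/no trial order: 50 of each type, no more than
    `max_run` identical stimuli in a row, and exactly one trial of
    each type in the first two positions (which are later discarded
    as start-up transients). Uses simple rejection sampling."""
    rng = random.Random(seed)
    while True:
        seq = ["yes"] * n_per_type + ["no"] * n_per_type
        rng.shuffle(seq)
        if seq[0] == seq[1]:          # first two must be one of each
            continue
        run, ok = 1, True
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if cur == prev else 1
            if run > max_run:         # reject runs of 4+ identical stimuli
                ok = False
                break
        if ok:
            return seq

seq = make_trial_sequence(seed=1)
```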
2.3. EEG recording and feature composition
The EEG was continuously recorded with a 32-electrode cap (QuikCap, Neuroscan, Inc.), a pass band of 1–100 Hz, and sampled at 1 kHz. Although much higher than the 200 Hz required by Nyquist, we typically sample at 1 kHz for the mere convenience that in subsequent time-domain analyses and plots, samples are equivalent to milliseconds. Electrodes FC4 and FCZ were excluded because of sporadic technical problems with the corresponding channels in the amplifier. The remaining 30 electrodes used in subsequent analysis included bipolar VEOG and HEOG electrodes commonly used to monitor blinks and eye movement artifacts. All other electrodes were referenced to physically linked mastoids. We did not employ any artifact removal or mitigation in the present study, as we sought to measure performance without the added help or complexity of artifact mitigation techniques.
The BSS methods were applied to the continuously recorded EEG data from the beginning of the first epoch to the end of the last. The majority of the continuous record represented task-related activity because the intertrial period was only approximately 30 milliseconds. We used the Matlab implementation of Infomax available as part of the EEGLAB
software¹ [10]. The EEGLAB software first spheres the data, which decorrelates the channels. This simplifies the ICA procedure to finding a rotation matrix, which has fewer degrees of freedom [23]. Except for the convergence criteria, all of the default parameter values for EEGLAB's Infomax algorithm were used. Initially, extended Infomax, which allows for sub-Gaussian as well as super-Gaussian source distributions, was used. No sub-Gaussian sources were extracted on the first two subjects, so the standard Infomax approach was used on all of the subject data. An initial transformation matrix was found with a tolerance of 0.1. The algorithm was then rerun with this transformation matrix and a tolerance of 0.001.
To investigate whether comparing Infomax ICA and the MNF method would be of empirical value, a simple test was performed on the data set for several subjects. Both transforms were applied to each subject's data and the resulting components were compared. The cross-correlation for all Infomax-MNF component pairs was computed, and the optimal matching was found. This matching paired the components so that the maximal cross-correlation was achieved. Had the components produced been the same, the cross-correlation measure would have been 100%. Cross-correlations of 60–70% were found in the tests performed, and so we decided the two transforms were sufficiently dissimilar to warrant the evaluation of both in the study.
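The text does not specify how the optimal matching was computed; one standard way to pair two component sets so that the total absolute cross-correlation is maximal is the Hungarian algorithm, sketched here on toy data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(A, B):
    """Pair each row of A with a distinct row of B so that the total
    absolute correlation is maximal (Hungarian algorithm).
    A, B : arrays of shape (components, samples)."""
    Az = (A - A.mean(1, keepdims=True)) / A.std(1, keepdims=True)
    Bz = (B - B.mean(1, keepdims=True)) / B.std(1, keepdims=True)
    C = np.abs(Az @ Bz.T) / A.shape[1]      # |correlation| of each pair
    rows, cols = linear_sum_assignment(-C)  # maximize total |corr|
    return list(zip(rows, cols)), C[rows, cols]

# Toy check: B is a scaled, sign-flipped, permuted copy of A, so the
# recovered matching should invert the permutation with |corr| near 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 1000))
B = -1.5 * A[[2, 0, 3, 1]]
pairs, corrs = match_components(A, B)
```

Identical component sets would give matched correlations of 1.0 ("100%"); the 60–70% values reported above indicate substantially different decompositions.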
Each of the original, Infomax, and MNF-transformed data were epoched such that the one-second period beginning 750 milliseconds after stimulus offset was used for subsequent analysis. Because iconic memory is generally thought to last about 500 milliseconds, this choice of temporal window should minimize the influence of iconic memory and place relatively more weight on active visualization processes. We then computed spectral power for each channel (component) and each trial (epoch) using Welch's periodogram method, which uses the average spectra from overlapping windows of the epoch. We computed averaged spectral power in the delta (2–4 Hz), theta (4–8 Hz), lower alpha (8–10 Hz), upper alpha (10–12 Hz), beta (12–35 Hz), and gamma (35–50 Hz) frequency bands. Thus, the full feature set contains 30 electrodes × 6 spectral bands each, for a total of 180 features. The first and second trials were excluded to reduce the transient effects of the start of the task. Thus, all subsequent analyses use 49 trials of each type ("yes", "no") for each subject. All reported results are for individual subjects.
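Band-power features of this kind can be computed along the following lines. The Welch window length is not stated in the text, so the 512-sample segments below are an assumption:

```python
import numpy as np
from scipy.signal import welch

# The six canonical bands used in the study (Hz).
BANDS = {"delta": (2, 4), "theta": (4, 8), "lower alpha": (8, 10),
         "upper alpha": (10, 12), "beta": (12, 35), "gamma": (35, 50)}

def band_powers(epoch, fs=1000.0, nperseg=512):
    """Average power per band for one channel's epoch, using Welch's
    method (mean spectrum over overlapping windows)."""
    f, pxx = welch(epoch, fs=fs, nperseg=nperseg)
    return {name: pxx[(f >= lo) & (f < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# Toy epoch: 1 s of a 6 Hz (theta) rhythm in white noise at 1 kHz.
rng = np.random.default_rng(0)
t = np.arange(1000) / 1000.0
epoch = np.sin(2 * np.pi * 6.0 * t) + 0.1 * rng.standard_normal(1000)
bp = band_powers(epoch)
# the 6 Hz rhythm should dominate the theta band
```

Applying this per channel (or BSS component) yields the 30 × 6 = 180 features per trial described above.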
2.4. Classification
In the present report, we sought subsets from a very large feature set that would maximize our ability to distinguish "yes" from "no" trials. The distinction was tested with a support vector machine (SVM) classifier and an oversampled variant of 10-fold cross-validation.
As discussed in the introduction, we chose a support vector machine (SVM) classifier because of its record of very good classification performance in challenging problem domains and its speed of training. We used a soft margin SVM² with a radial basis function (RBF) kernel with γ = 0.1. The SVM was trained with regularization parameter ν = 0.8, which places an upper bound on the fraction of error examples and a lower bound on the fraction of support vectors [44].

¹Available from the Swartz Center for Computational Neuroscience, University of California, San Diego, http://www.sccn.ucsd.edu/eeglab/index.html.
Given $m$ training examples $X = \{x_1, \ldots, x_m\} \subset \mathbb{R}^N$ and their corresponding class labels $Y = \{y_1, \ldots, y_m\} \subset \{-1, 1\}$, the SVM training produces nonnegative Lagrange multipliers $\alpha_i$ that form a linear decision boundary:

$$f(x) = \sum_{i=1}^{m} y_i\, \alpha_i\, k\left(x, x_i\right) \qquad (1)$$

in the feature space³ defined by the Gaussian kernel (of width inversely proportional to $\gamma$):

$$k\left(x, x_i\right) = \exp\left(-\gamma \left\|x - x_i\right\|^{2}\right). \qquad (2)$$

On each feature subset evaluation, we trained and tested the SVM on one full run of stratified 10-fold cross-validation, randomly selecting with replacement 10% of the trials on each fold for testing.
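For illustration, the classifier configuration described above (kernel width 0.1 and regularization 0.8, interpreted here as the γ and ν parameters of a ν-SVM, an assumption based on the stated bounds on error examples and support vectors) can be sketched with scikit-learn's NuSVC on synthetic stand-in features; this is not the OSU SVM Toolbox pipeline used in the study:

```python
import numpy as np
from sklearn.svm import NuSVC

# Synthetic stand-in for the 180-dimensional band-power features:
# 49 "yes" (+1) and 49 "no" (-1) trials from well-separated Gaussians.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, (49, 10)),
               rng.normal(-1.0, 1.0, (49, 10))])
y = np.array([1] * 49 + [-1] * 49)

# nu bounds the fraction of margin errors from above and the fraction
# of support vectors from below; gamma sets the RBF kernel width.
clf = NuSVC(nu=0.8, kernel="rbf", gamma=0.1)

# One fold in the style described: hold out a random ~10% of trials.
idx = rng.permutation(len(y))
test, train = idx[:10], idx[10:]
clf.fit(X[train], y[train])
acc = clf.score(X[test], y[test])
```

Repeating the fit over many random train/test partitions gives the oversampled cross-validation estimate used to score each feature subset.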
2.5. Feature selection
We used a genetic algorithm (GA) to search the space of feature subsets in a wrapper fashion (see Figure 2). Individuals in the GA were simply bit strings of length 180, with a 1 indicating the feature was included in the subset and 0 indicating it was not. Our Matlab GA implementation was based on Goldberg's original simple GA [13], using roulette-wheel selection and 1-point crossover. We used conventional values for the probability of crossover (0.6) and that of mutation (1/(4 × D), where D = number of features, or 0.0014). We evolved a population of 200 individuals over 50 generations. Each individual's fitness measure was determined by the corresponding subset's mean classification accuracy.
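The GA just described can be sketched as follows. The fitness function here is a cheap stand-in for the SVM cross-validation accuracy used in the study, and the population size and generation count are scaled down from the 200/50 reported above:

```python
import random

def ga_feature_search(fitness, n_features, pop_size=30, generations=40,
                      p_crossover=0.6, seed=0):
    """Simple GA over feature-subset bit strings: roulette-wheel
    selection, 1-point crossover, and bitwise mutation with
    probability 1/(4*n_features), tracking the best subset seen."""
    rng = random.Random(seed)
    p_mut = 1.0 / (4 * n_features)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        total = sum(scores)

        def roulette():
            r, acc = rng.uniform(0, total), 0.0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette(), roulette()
            if rng.random() < p_crossover:   # 1-point crossover
                cut = rng.randrange(1, n_features)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for ind in (a, b):               # bitwise mutation
                nxt.append(tuple(1 - g if rng.random() < p_mut else g
                                 for g in ind))
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best

# Stand-in fitness: features 0-4 are informative, the rest nuisance;
# in the study this would be the SVM cross-validation accuracy.
def toy_fitness(bits):
    return 0.5 + 0.08 * sum(bits[:5]) - 0.005 * sum(bits[5:])

best = ga_feature_search(toy_fitness, n_features=20)
```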
We instrumented the GA with a mechanism for maintaining information about the cumulative population, that is, all individuals evaluated thus far. Thus, individuals that were evaluated more than once develop a list of evaluation measures (classification accuracies). This took advantage of the inherent resampling that occurs in the GA because relatively fit individuals are more likely to live on and be reevaluated in later generations than unfit individuals. Such resampling, with different partitions of the trials into training/test sets on each new evaluation, reduces the risk of overfitting due to selection bias. The empirical effect of this oversampled variant of cross-validation and its role in feature selection search is illustrated in the first part of Section 3.

²The SVM was implemented with version 3.00 of the OSU SVM Toolbox for Matlab [33], which is based on version 2.33 of Dr. Chih-Jen Lin's LIBSVM.

³Here "feature space" refers to the space induced by the RBF kernel, not to be confused with the feature space, and implicit space of feature subsets, referred to elsewhere in the manuscript.
Figure 2: Feature selection system architecture. Three feature families were composed with parallel and/or series execution of signal transformations. Feature subsets are then evaluated with a support vector machine (SVM) classifier and the space of possible feature subsets searched by a genetic algorithm (GA) guided by the classification accuracy of the feature subsets. (a) Feature composition. (b) Feature selection. (Adapted from [12, Figure 1].)
All subsequent reports of classification accuracy use the mean of the 10 best feature subsets that were subjected to at least five sample evaluations each.
3. RESULTS
3.1. Fitness evolution and overfitting at the feature selection level
Figure 3 shows how the fitness of feature subsets evolves over generations of the GA. In these and subsequent figures, the chance level of classification accuracy (50%) is shown with a dotted line. Note that even at the first generation of randomly selected feature subsets, the average performance of the population is slightly above chance at 54%. This suggests that, on average, randomly chosen feature subsets provide some small discriminatory information to the classifier. The approximately 70% accuracy maximum mean fitness in the first generation of the GA represents a single sampling of the 10-fold cross-validation. Thus, there exists a set of 10 randomly chosen training/test trial partitions for which one of the 200 initial, randomly chosen feature subsets gave 70% classification accuracy. However, such results need to be assessed with caution, as illustrated in the right panel of Figure 3. Further sampling for a given feature subset (i.e., repetitions of a full 10-fold cross-validation) gives a more accurate picture of that feature subset's ability to dissociate the "yes" and "no" trials.
3.2. The benefit of feature selection
Figure 4 shows how classification accuracy is improved when comparing feature subsets selected by the GA with full feature sets. For every BSS transformation (original, Infomax, and MNF), every subject's "yes"/"no" visualizations are better distinguished with feature subsets than with the whole feature set.
3.3. The benefit of BSS transformations
Figure 5 shows for each subject how the classification accuracies compare for the original signals and the two BSS transformations. For every subject, at least one of the BSS transformations leads to better classification accuracy than the original signals. Spectra of Infomax and MNF transformations performed statistically significantly better than the spectra of the original signals for every subject except subject 1, and MNF for subject 5 (Wilcoxon rank-sum test, alpha = 0.05). The relative performance of the three transformations does not appear to be an artifact of random processes in the GA because it holds across two entirely separate runs of the GA.
3.4. Intersubject variability in good feature subsets
Figure 6 shows the features selected for the feature subsets that provided the highest classification accuracy. For both subjects, the features include a diverse mix of electrodes and frequency bands. Although spatial trends emerge (e.g., the full power spectrum was included for electrodes FC3 and CZ), no single frequency band was included across all electrodes. Also, there appears to be some consistency between subjects in terms of the selected features. Subject 1's best feature subset included 106 features and subject 6's best feature subset included 91 features. The two subjects' best subsets had 57 features in common, including broadband features from central and left frontocentral scalp regions.
3.5. Feature values corresponding to the "yes" and "no" trials
Figure 7 shows the median values of the features across the 49 trials of each type for subject 6. Although a spatiospectral pattern of differences is shown in the lower part of the figure, none of the individual features exhibited significant differences between the two conditions. A few were significant at the p < 0.05 level (0.02-0.03), but certainly not after adjusting for multiple comparisons. Some of the features with notable differences between "yes" and "no" were included in subject 6's best feature subset (e.g., multiple bands from CZ, FZ, and FC3). However, a number of such features were not included in subject 6's best feature subset (e.g., delta band power in P3, F7, FP2, and F8; see Figures 6a and 7c).
Figure 3: Feature subset evolution and overfitting. (a) Mean fitness of all individuals in the cumulative population as of that generation; "avg" is the average and "max" the maximum mean fitness. Data shown is for subject 6, Infomax transformation. Note that the maximum mean fitness in the cumulative population does not monotonically increase because repeated sampling of a particularly fit individual may reduce that individual's mean fitness value (see (b)). (b) Mean fitness of the best individual in the population for each of several different sampling values. Each sample is the mean classification accuracy from a full 10-fold cross-validation run, which uses 10 randomly selected train/test partitions of the trials for that subject. The generally decreasing function reflects overfitting at the feature selection level, whereby so many feature subset evaluations occur that the system finds train/test partitions of the trials that lead to higher-than-average fitness for a specific feature subset. Additional sampling of how well that feature subset classifies the data increases confidence that the oversampled result is not simply due to 10 fortuitous partitions of the trials.
4. DISCUSSION
4.1. Feature selection in the EEG-based BCI
We implemented a feature selection system for optimizing classification in a novel, direct EEG-based BCI. For all three representations of the signals (original, Infomax, and MNF) and for all subjects, the GA-based search of the feature subset space leads to higher classification rates than both the full feature sets and randomly selected subsets. This indicates that choosing feature subsets can improve corresponding classification in an EEG-based BCI. This also indicates that it is not simply smaller feature sets that lead to improved classification, but the selection of specific good feature subsets. Also, classification accuracy improves over generations of the GA's feature subset search, indicating that the GA's iterative search process leads to improved solutions. We ran the GA for over 700 generations for one subject's Infomax data, and the resultant feature subsets demonstrated more than a 14% increase in classification accuracy over that obtained after just 50 generations. Although this suggests an extensive search of the feature subset space may be beneficial, the roughly one week of additional computational time may be inappropriate for some BCI research settings.
Note that, as mentioned in the introduction, there are many ways to conduct the feature subset search, and the GA is only one family of such search methods. Sequential forward (or backward) search (SFS) methods add features one at a time but can suffer from nesting, wherein optimal subsets are missed because previously good features are no longer jointly good with other newer features and cannot be removed. The same limitation applies to backward versions of SFS that subtract single features from a full feature set. Floating versions of these methods, sequential forward floating search (SFFS) and sequential backward floating search
Figure 4: Feature subsets outperform the whole feature set across feature classes and subjects. "All" refers to the full set of all features, and "subset" refers to the feature subsets found by the GA. Each line connects the mean classification accuracies for both cases for a single subject for each of the (a) original, (b) Infomax, and (c) MNF transformations.
(SBFS) [41], mitigate the nesting problem by variably adding and taking away previously added features. In principle, both GAs and the floating methods allow for complex feature-feature interactions. However, their migration through the subset space can differ substantially. Depending on how they are implemented, sequential methods can implicitly assume a certain ordering to the features, whereas GAs do not make that assumption. Similarly, SFFS/SBFS are not as global in their search as a GA. The floating search methods cannot jump from one subset to a very different subset in a single step, as is inherent in typical GA implementations. Whether or to what extent these differences affect the efficacy of the search methods depends on the problem domain and needs to be evaluated empirically. A few investigators have compared the floating search methods SFFS/SBFS to GAs for feature selection [11, 24, 31]. Kudo and Sklansky have demonstrated that GAs outperform SFFS and SBFS when the number of features is greater than about 50 [31]. Another class of feature selection methods is known as embedded methods. In the embedded approach, the process of selecting features is embedded in the use of the classifier. One example is recursive feature elimination (RFE) [17, 50], which has recently been used in an EEG-based BCI [32]. RFE takes advantage of the feature ranking inherent in using a linear SVM. However, as with other embedded approaches to feature selection, it lacks the flexibility of wrapper methods because, by definition, the feature subset search cannot be separated from the choice of classifier. Feature selection research with EEG has only recently begun, and a comparison of feature selection methods with EEG needs to be conducted.
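To make the floating idea concrete, a minimal sequential forward floating search can be sketched as follows. This is an illustrative sketch, not the implementation benchmarked in [31, 41]; the `score` callback is a hypothetical stand-in for the wrapper's cross-validated classification accuracy.

```python
def sffs(features, score, k_max):
    """Sequential forward floating search (SFFS), a minimal sketch.

    `score` maps a frozenset of feature indices to a scalar (e.g.,
    cross-validated classification accuracy); higher is better.
    """
    subset = []            # current feature subset, in insertion order
    best = {}              # best score observed for each subset size
    while len(subset) < k_max:
        # Forward step: add the single feature that helps most.
        s, f = max((score(frozenset(subset + [f])), f)
                   for f in features if f not in subset)
        subset.append(f)
        best[len(subset)] = max(s, best.get(len(subset), float("-inf")))
        # Floating (backward) steps: drop a previously added feature
        # whenever doing so beats the best subset of that smaller size,
        # which is what mitigates the nesting problem of plain SFS.
        while len(subset) > 2:
            s, g = max((score(frozenset(set(subset) - {g})), g)
                       for g in subset)
            if s > best.get(len(subset) - 1, float("-inf")):
                subset.remove(g)
                best[len(subset)] = s
            else:
                break
    return subset
```

With a toy score that rewards one particular pair of features, the search recovers that pair while remaining free to revisit features it added earlier.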
We also demonstrated and addressed the issue of overfitting at the level of feature selection. The sensitivity of any single feature subset's performance to the specific set of 10 train/test trial partitions is a testament to the well-known but often overlooked trial-to-trial variability of the EEG. It is also an empirical illustration of overfitting resulting from extensive search of the feature subset space, also known as selection bias [43]. Our feature subset search conducts many feature subset evaluations (e.g., 200 individuals over 50 generations = 10,000 evaluations) and there are many ways to randomly choose a partition of training/test trials. Thus, there exist 10 random training/test partitions of the trials for which specific feature subsets will do much better than average if evaluated over other sets of 10 random train/test partitions. Fundamentally, as more points in the feature subset space are tested, the risk of finding fortuitous sets of train/test partitions increases, so greater partition sampling is required. In the case of a GA-based feature selection algorithm, we could make the partition sampling dynamic by, for example, increasing the amount of resampling as the GA progresses through generations of evolution. However, increasing the data partition sampling over the course of the feature subset search would of course slow down the system as the search progresses. Nevertheless, the GA's inherent resampling and the
Feature Selection and BSS in EEG-Based BCI 3135

Figure 5: The benefit of the BSS transformations and the replicability of their relative value between GA runs. (a) Mean classification accuracy of the 10 best feature subsets with at least 5 sample evaluations, per subject (1–8), for the original, Infomax, and MNF features. (b) The performance results for the three transformations for subject 5 over two separate runs of the GA. [Both panels plot classification accuracy (%) on a 45–70 scale.]

Figure 6: Features selected in a good subset of the original spectral features and their overlap between two subjects. (a) Subject 6, (b) subject 1. White indicates the feature was not selected, grey indicates that the feature was selected for that subject only, and black indicates the feature was selected for both subjects. [Panels show scalp maps over the recording electrodes.]
ease with which such resampling could be implemented in a
GA provide yet another reason to use a GA for the feature
subset search in extremely noisy domains such as EEG.
How best to address the overfitting issue remains an active line of research. There are numerous data partitioning and resampling methods, such as leave-one-out or the bootstrap. Although we partially mitigated the issue by using an oversampled variant of cross-validation, a more principled approach needs to be developed for highly noisy, underdetermined problem domains. Although one should use as test data trials unseen during the feature subset search [43], this further exacerbates the problem of having so few trials, as is typically the case with single-session EEG experiments. The current experiment had roughly 50 trials per condition per subject. Although experimental sessions with many more trials per condition raise concerns about habituation and arousal, the benefits for evaluating classifiers and associated feature selection may outweigh the disadvantages. In cases such as the present study with a limited number of trials, oversampling methods such as the bootstrap or the resampling GA variant we used may provide a reasonable alternative to the full, nested cross-validation implied by separate classifier model selection and feature subset search.
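The oversampled evaluation described above can be sketched as a small helper. All names here are hypothetical; `evaluate` stands in for one train/test cycle of the classifier on a candidate feature subset, and a larger `n_partitions` buys a more stable fitness estimate at the cost of extra classifier fits.

```python
import random
import statistics

def resampled_score(n_trials, evaluate, n_partitions=10,
                    test_fraction=0.3, seed=0):
    """Score a feature subset over many random train/test partitions.

    `evaluate(train_idx, test_idx)` trains a classifier on the training
    trials and returns its accuracy on the held-out trials.  Averaging
    over many random partitions damps the trial-to-trial variability
    of EEG that makes any single partition's accuracy unreliable.
    """
    rng = random.Random(seed)
    idx = list(range(n_trials))
    n_test = max(1, int(test_fraction * n_trials))
    scores = []
    for _ in range(n_partitions):
        rng.shuffle(idx)
        # Disjoint train and test trials for this partition.
        scores.append(evaluate(sorted(idx[n_test:]), sorted(idx[:n_test])))
    return statistics.mean(scores), statistics.stdev(scores)
```

The returned standard deviation is itself informative: a subset whose accuracy varies widely across partitions is a candidate for the selection bias discussed above.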
4.2. The classifier and subset search parameter space
We used only nonlinear SVMs in this study. A theoretical ad-
vantage over linear SVMs is that they can capture nonlin-
ear relationships between features and the classes. However,
Figure 7: Median feature values for the two kinds of trials. (a) "Yes", (b) "no", and (c) difference values for subject 6, original spectral features. Bars on the right show normalized spectral power (or power difference, for yes − no). [Scale bars: 0–4 for (a) and (b); −0.5 to 0.5 for (c).]
nonlinear classifiers have the disadvantage that the classifier's weights do not provide a simple proxy measure of the input features' importance, as is the case with the linear SVM formulation. We also used only one setting of SVM parameters in this study. The optimal width of the Gaussian SVM kernel, σ, in particular, is known to be sensitive to the classifier's input dimensionality (number of features). Although we could have varied σ as a function of the subset size, we explicitly chose not to. If we had varied σ in a principled way (e.g., larger σ for fewer features), the exact formulation would be arbitrary. If we had conducted SVM model selection and optimized σ empirically, it would have introduced another loop of cross-validation in addition to that used to train and test the SVM for every subset evaluation. This would not only be substantially more computationally demanding, but would also exacerbate the risk of overfitting or reduce the amount of trials available for training/testing. In either case, allowing σ to vary would introduce another variable, and we would not know whether differences in performance between feature subsets should be attributed to the subsets themselves or to their correspondingly tuned classifier parameters. Although the relative performance of the full versus partial feature subsets is sensitive to σ, we expect that the relationship found in the present study would remain, because feature selection usually improves classification accuracy in EEG-based BCIs. Note also that the relative performance of feature selection using the original versus BSS-based features was based on a consistent application of σ, and the subsets contained roughly equivalent numbers of features.
We also used only one setting of GA parameters in this study. In general, one would expect that classification accuracy and the feature selection process are sensitive to the parameters used in the SVM and GA. In fact, especially in wrapper approaches to feature selection, the classifier's optimal parameters and the optimal feature selection search algorithm parameters will not be independent. In other words, the optimal SVM model parameters will be sensitive to the specific feature subset, and vice versa. Thus, it may be suboptimal to conduct the model selection separately from the feature selection. Instead, the SVM model selection process and the feature subset search should be conducted simultaneously rather than sequentially. We have recently demonstrated this empirically with DNA microarray data [38], a domain with noise characteristics and input dimensionality not unlike those of EEG features. Although the SVM parameters could be encoded into a bit string and optimized with a GA in conjunction with the feature subset, the two optimization problems are qualitatively different and should probably be conducted with separate mechanisms. This remains a question for further research.
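One possible bit-string encoding of the joint problem can be sketched under the assumption of a discrete grid of candidate kernel widths. All names here are hypothetical; the study itself deliberately did not evolve the SVM parameters.

```python
def decode(chromosome, n_features, gamma_grid):
    """Decode one GA bit string into (feature subset, kernel width).

    A hypothetical joint encoding: the first n_features bits mask the
    features, and the trailing bits index a grid of candidate Gaussian
    kernel widths, so classifier model selection and feature selection
    would evolve within the same chromosome.
    """
    subset = [i for i, bit in enumerate(chromosome[:n_features]) if bit]
    tail = chromosome[n_features:]
    idx = sum(bit << k for k, bit in enumerate(tail)) % len(gamma_grid)
    return subset, gamma_grid[idx]
```

Crossover then recombines feature masks and kernel widths together, which is one way the two qualitatively different optimizations can interfere with each other, as noted above.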
4.3. BSS in EEG-based BCI
Our results showed that the power spectra of the BSS transformations provided feature subsets with higher classification accuracy than the power spectra of the original EEG signals. This improvement held for seven out of eight subjects and was consistent across independent runs of the GA. The results suggest that BSS transformations of EEG signals provide features with stronger dissociating power than features based on spectral power of the original EEG signals. Infomax and MNF differed only slightly, but both provided a marked improvement in classification accuracy over spectral transformations of the original signals. This suggests that the use of a BSS method may be more important than the choice of a specific BSS method, although further tests with other BSS methods and other datasets would be required to substantiate that interpretation.
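The MNF construction [14, 21] reduces to a generalized eigenproblem once a noise covariance is estimated; a common heuristic estimates it from first temporal differences. The following numpy sketch illustrates that pipeline through to band-power features. It is not the authors' implementation, and the difference-based noise estimate is an assumption.

```python
import numpy as np

def mnf_transform(X):
    """Maximum noise fraction (MNF) transform, a numpy sketch.

    X is channels x samples.  The noise covariance is estimated from
    the first temporal difference of the signals; components are
    ordered from largest to smallest signal fraction.
    """
    X = X - X.mean(axis=1, keepdims=True)
    S = X @ X.T / X.shape[1]              # signal-plus-noise covariance
    D = np.diff(X, axis=1)
    N = D @ D.T / D.shape[1]              # noise covariance estimate
    # Solve the generalized eigenproblem S w = lambda N w.
    vals, vecs = np.linalg.eig(np.linalg.solve(N, S))
    order = np.argsort(vals.real)[::-1]   # cleanest components first
    W = vecs.real[:, order]
    return W.T @ X                        # MNF components

def band_power(components, fs, lo, hi):
    """Mean spectral power of each component in the [lo, hi] Hz band."""
    freqs = np.fft.rfftfreq(components.shape[1], d=1.0 / fs)
    spec = np.abs(np.fft.rfft(components, axis=1)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return spec[:, band].mean(axis=1)
```

Band powers of the transformed components (rather than of the raw channels) would then serve as the candidate features entering the subset search.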
In some EEG research using ICA, the investigator evaluates independent components manually. This can be considered a manual form of feature selection. However, as with the filter approach to feature selection, the features are not selected based on their impact on the accuracy of the final classifier in which they are used. Rather, they are selected based on characteristics such as their scalp topography, the morphology of their time course, or the variance of the original signal for which they account. In some cases, the decision about which features to keep is subjective. In the present study we explicitly chose not to take this approach. Instead, we used the wrapper approach to search the full feature set based exclusively upon the components' contribution to classification. Of course, this does not preclude the possibility that preceding automated feature selection with a manual filter approach to feature selection would improve overall performance. Many domains benefit from the joint application of manual and automated approaches, including methods that do and do not leverage domain-specific knowledge.
4.3.1. Good feature subsets
Subjects' best feature subsets included many features from the full feature set. We believe that this may be at least partially the result of crossover in the GA, whereby new individuals will tend toward having approximately half of the features selected. The fitness function used by the GA to search the space of feature subsets used only those subsets' classification accuracy. We did not use selective pressure to reduce the number of features in selected subsets. However, this could be easily implemented by simply biasing the fitness function with a term that weights the cardinality of the subsets under consideration. If there exist many feature subsets of low cardinality that perform roughly as well as subsets with higher cardinality, then one would generally prefer the low-cardinality solutions, because subsets with fewer features would, in general, be easier to analyze and interpret.
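Such a cardinality-biased fitness could be as simple as the following sketch, where `penalty` is a hypothetical weight (the study effectively used `penalty=0`).

```python
def fitness(accuracy, subset_size, n_features, penalty=0.05):
    """GA fitness biased toward small subsets, a sketch.

    accuracy is the subset's cross-validated classification accuracy
    in [0, 1]; penalty weights the fraction of features used.  With
    penalty=0 this reduces to the accuracy-only fitness of the study.
    """
    return accuracy - penalty * (subset_size / n_features)
```

Among subsets of equal accuracy, the GA would then favor the one using fewer features, yielding more interpretable solutions.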
Good feature subsets included a disproportionately high representation of left frontocentral electrodes. This topography is consistent with a role for language production, including subvocal verbal rehearsal. It suggests that the cortical networks involved in rehearsing words may exhibit dissociable patterns of activity for different words. The spatial information in the EEG scalp topography is insufficient to determine whether the networks used for rehearsing the two words had differentiable anatomical substrates. However, such differences may be detectable with dipole analysis of high-density EEG and/or functional neuroimaging.
We compared subjects' good subsets of spectral power based on original EEG signals. Of the two subjects whose best feature subsets we analyzed, approximately 60% of the included features were common to both subjects. The common features included several spectral bands in left frontocentral electrodes. We did not compare subjects' good subsets using BSS-transformed EEG. One disadvantage of the BSS methods is that, because they are usually used to transform full continuous EEG recordings on a per-subject basis, there is no immediately apparent way to match one subject's components with another subject's components. Although this can be attempted manually, the process can be subjective and problematic. Often only some of the components have similar topographies and/or time courses between subjects, and the degree of similarity can be quite variable. Thus it may be difficult to compare selected features among different subjects when the features are based on BSS transformations of the original EEG signals.
The pattern of actual feature values was very similar for the "yes" and "no" trials. Because both conditions involved the same type of task, it is reasonable to assume that the associated brain activity would be similar at the level of scalp-recorded EEG. None of the individual features differed significantly between the two conditions. Although some of the features with the highest amplitude differences between "yes" and "no" were included in the best (most dissociating) feature subsets, other such features were not. At the current point in this research, we cannot conclude whether this is because certain features were not considered in the GA-based search, or because the interactions of certain features do better than those single features. The former interpretation could be ruled out by adding a simple per-feature test to the GA's search of the feature subset space. Note that single features can have identical means (indeed, even identical distributions) for "yes" and "no" trials, yet contribute to a feature subset's ability to dissociate the two trial types because of class-conditional interdependencies between the features. Per-feature statistical tests, and some feature selection methods, for that matter, assume the features are independent, ignoring any interactions among the features. Such assumptions are generally too limiting for complex, high-dimensional domains such as EEG. Besides, even when the features are independent, there are cases when the d best features are not the same as the best d features [9, 18].
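A toy construction makes the point: two features can have identical class-conditional marginals yet jointly separate the classes perfectly, so any per-feature test misses them. (Hypothetical example, not the study's EEG features.)

```python
import random

def sample(label, rng):
    """One trial with two features whose marginals match across classes.

    Each feature is +/-1 with probability 1/2 in both classes; only the
    product of the two features carries the class: +1 for "yes", -1 for
    "no".  A per-feature test therefore sees nothing, while the pair
    dissociates the classes perfectly.
    """
    x1 = rng.choice([-1.0, 1.0])
    x2 = x1 if label == "yes" else -x1
    return x1, x2

rng = random.Random(0)
yes = [sample("yes", rng) for _ in range(1000)]
no = [sample("no", rng) for _ in range(1000)]
mean = lambda v: sum(v) / len(v)
# Per-feature means are indistinguishable (both near zero), yet the
# joint feature x1*x2 separates the classes exactly.
assert all(x1 * x2 == 1.0 for x1, x2 in yes)
assert all(x1 * x2 == -1.0 for x1, x2 in no)
```

This is exactly the class-conditional interdependency that subset-level (wrapper) evaluation can exploit and per-feature tests cannot.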
4.4. BCI application relevance
Our BCI task design provides a native interface for a patient without any motor control to directly respond "yes" or "no" to questions [35]. The paradigm provides a good model for a BCI setting in which the caregiver initiates dialog with the patient. Furthermore, it avoids the indirect mappings and extensive biofeedback training required in other BCI designs. However, this direct task design has some clear limitations. First, we do not have any control over what the subject is doing when they are supposed to be visualizing the word. The subjects could have been daydreaming on some trials or, perhaps even worse, still visualizing the word from an earlier trial. Of course this would degrade classification accuracy and may be a more severe problem for neurologically impaired patients compared to the healthy, albeit perhaps less motivated, volunteers we used. Second, even if subjects are performing the task as instructed, different subjects may use different cognitive processes with correspondingly different neural substrates. For example, subjects that maintain visualizations close to the original percept will recruit relatively more early visual system activity (e.g., in occipital/temporal areas), whereas subjects that maintain the word in a form of working memory will probably recruit the frontotemporal components of the phonological loop. These two systems involve cortico-cortical and thalamocortical loops producing different changes in oscillatory electrophysiology, usually manifest as changes in gamma and theta/alpha bands,
respectively. Thus, the spectral and topographic features that best distinguish the yes/no responses will most likely vary per subject. Indeed, this is one of the biggest motivations for taking a feature selection approach to EEG-based BCIs and conducting the feature selection search on a strictly per-subject basis, as we did in the present study. Third, and perhaps most notably, the classification accuracy is far below that obtained in studies using indirect approaches. Nothing about our approach precludes having more than one session and therefore many more trials with which to learn good feature subsets and improve classification accuracy. Also, although indirect approaches will probably continue to provide high classification accuracy (and therefore a generally higher bit rate) for the near future, advances in basic cognitive psychology and cognitive neuroscience may provide more clues about what might be good EEG features to use to distinguish direct commands such as visualizing or imagining yes/no or on/off responses. In the meantime, BSS transformations and feature selection may provide moderate classification performance in direct BCIs and even help inform basic scientists about the EEG features on which to focus their research.
Our approach to feature selection is amenable to the development of on-line BCI applications. One could use the full system, including the GA, to learn off-line the best feature subset for a given subject and task, then use the trained SVM with that feature subset, and without the GA, in an on-line setting. Dynamic adjustments to the optimal feature subset can be continuously identified off-line and reincorporated into the on-line system. Also, as suggested in the results, the best feature subset may include features from only a small subset of electrodes. The potentially much smaller number of electrodes could be applied to the subject, reducing application time and the risk of problematic electrodes for easier on-line use of the BCI. Although we intentionally used a design without biofeedback, one could supplement this design with feedback. Other groups have found that incorporation of feedback can be used to increase classification accuracy. Feature selection could provide guidance on which features are most significant for dissociating classes of EEG trials, and therefore one source of guidance for the choice of information to use in the feedback signals provided to the subject.
5. CONCLUSION
Signal processing and machine learning can be used to enhance classification accuracy in BCIs where a priori information about dissociable brain activity patterns does not exist. In particular, blind source separation of the EEG signals prior to their spectral power transformation leads to increased classification accuracy. Also, even sophisticated classifiers like a support vector machine can benefit from the use of specific feature subsets rather than the full set of possible features. Although the search for feature subsets exacerbates the risk that the classifier will overfit the trials used to train the BCI, a variety of methods exist for mitigating that risk and can be assessed over the course of the feature subset search. Feature selection is a particularly promising line of investigation for signal processing in BCIs because it can be used off-line to find the subject-specific features that can be used for optimal on-line performance.
ACKNOWLEDGMENTS
The authors thank three anonymous reviewers for many
helpful comments on the original manuscript, Dr. Carol
Seger for use of Psychology Department EEG Laboratory re-
sources, and Darcie Moore for assistance with data collec-
tion. Partial support provided by Colorado Commission on
Higher Education Center of Excellence Grant to Michael H.
Thaut and National Science Foundation Grant 0208958 to
Charles W. Anderson and Michael J. Kirby.
REFERENCES
[1] M. G. Anderle and M. J. Kirby, An application of the maximum noise fraction method to filtering noisy time-series, in Proc. 5th International Conference on Mathematics in Signal Processing, University of Warwick, Coventry, UK, 2001.
[2] C. W. Anderson and M. J. Kirby, EEG subspace representations and feature selection for brain-computer interfaces, in Proc. 1st IEEE Conference on Computer Vision and Pattern Recognition Workshop for Human Computer Interaction (CVPRHCI 03), vol. 5, Madison, Wis, USA, June 2003.
[3] F. Babiloni, F. Cincotti, L. Lazzarini, et al., Linear classification of low-resolution EEG patterns produced by imagined hand movements, IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 186–188, 2000.
[4] A. J. Bell and T. J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
[5] N. Birbaumer, A. Kübler, N. Ghanayim, et al., The thought translation device (TTD) for completely paralyzed patients, IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 190–193, 2000.
[6] B. Blankertz, G. Curio, and K.-R. Müller, Classifying single trial EEG: towards brain computer interfacing, in Neural Information Processing Systems (NIPS 01), T. G. Diettrich, S. Becker, and Z. Ghahramani, Eds., vol. 14, pp. 157–164, Vancouver, BC, Canada, December 2001.
[7] A. L. Blum and P. Langley, Selection of relevant features and examples in machine learning, Artificial Intelligence, vol. 97, no. 1-2, pp. 245–271, 1997.
[8] M. P. S. Brown, W. N. Grundy, D. Lin, et al., Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences of the United States of America, vol. 97, no. 1, pp. 262–267, 2000.
[9] T. M. Cover, The best two independent measurements are not the two best, IEEE Trans. Syst., Man, Cybern., vol. 4, no. 1, pp. 116–117, 1974.
[10] A. Delorme and S. Makeig, EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis, Journal of Neuroscience Methods, vol. 134, no. 1, pp. 9–21, 2004.
[11] F. Ferri, P. Pudil, M. Hatef, and J. Kittler, Comparative study of techniques for large-scale feature selection, in Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies, and Hybrid Systems, E. S. Gelsema and L. N. Kanal, Eds., pp. 403–413, Vlieland, The Netherlands, June 1994.
[12] D. Garrett, D. A. Peterson, C. W. Anderson, and M. H. Thaut, Comparison of linear, nonlinear, and feature selection methods for EEG signal classification, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, pp. 141–144, 2003.
[13] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
[14] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, A transformation for ordering multispectral data in terms of image quality with implications for noise removal, IEEE Trans. Geosci. Remote Sensing, vol. 26, no. 1, pp. 65–74, 1988.
[15] C. Guerra-Salcedo and D. Whitley, Genetic approach to feature selection for ensemble creation, in Proc. Genetic and Evolutionary Computation Conference (GECCO 99), pp. 236–243, Orlando, Fla, USA, July 1999.
[16] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1157–1182, 2003.
[17] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002.
[18] D. J. Hand, Discrimination and Classification, John Wiley & Sons, New York, NY, USA, 1981.
[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001.
[20] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, Mich, USA, 1975.
[21] D. R. Hundley, M. J. Kirby, and M. Anderle, Blind source separation using the maximum signal fraction approach, Signal Processing, vol. 82, no. 10, pp. 1505–1508, 2002.
[22] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, NY, USA, 2001.
[23] A. Hyvärinen and E. Oja, Independent component analysis: algorithms and applications, Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.
[24] A. Jain and D. Zongker, Feature selection: evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 2, pp. 153–158, 1997.
[25] T. Joachims, Text categorization with support vector machines, in Proc. 10th European Conference on Machine Learning (ECML 98), pp. 137–142, Chemnitz, Germany, April 1998.
[26] T.-P. Jung, S. Makeig, M. J. McKeown, A. J. Bell, T.-W. Lee, and T. J. Sejnowski, Imaging brain dynamics using independent component analysis, Proc. IEEE, vol. 89, no. 7, pp. 1107–1122, 2001.
[27] T.-P. Jung, S. Makeig, M. Westerfield, J. Townsend, E. Courchesne, and T. J. Sejnowski, Analysis and visualization of single-trial event-related potentials, Human Brain Mapping, vol. 14, no. 3, pp. 166–185, 2001.
[28] M. J. Kirby and C. W. Anderson, Geometric analysis for the characterization of nonstationary time-series, in Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, E. Kaplan, J. Marsden, and K. R. Sreenivasan, Eds., chapter 8, Springer Applied Mathematical Sciences Series, Springer, New York, NY, USA, pp. 263–292, March 2003.
[29] J. N. Knight, Signal Fraction Analysis and Artifact Removal in EEG, Department of Computer Science, Colorado State University, Fort Collins, Colo, USA, 2003.
[30] R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[31] M. Kudo and J. Sklansky, Comparison of algorithms that select features for pattern classifiers, Pattern Recognition, vol. 33, no. 1, pp. 25–41, 2000.
[32] T. N. Lal, M. Schröder, T. Hinterberger, et al., Support vector channel selection in BCI, IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003–1010, 2004.
[33] J. Ma, Y. Zhao, and S. Ahalt, OSU SVM Classifier Matlab Toolbox, Ohio State University, Columbus, Ohio, USA, 2002.
[34] S. Makeig, M. Westerfield, T.-P. Jung, et al., Functionally independent components of the late positive event-related potential during visual spatial attention, The Journal of Neuroscience, vol. 19, no. 7, pp. 2665–2680, 1999.
[35] L. A. Miner, D. J. McFarland, and J. R. Wolpaw, Answering questions with an electroencephalogram-based brain-computer interface, Archives of Physical Medicine and Rehabilitation, vol. 79, no. 9, pp. 1029–1033, 1998.
[36] P. L. Nunez, R. Srinivasan, A. F. Westdorp, et al., EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales, Electroencephalography and Clinical Neurophysiology, vol. 103, no. 5, pp. 499–515, 1997.
[37] W. D. Penny, S. J. Roberts, E. A. Curran, and M. J. Stokes, EEG-based communication: a pattern recognition approach, IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 214–215, 2000.
[38] D. A. Peterson and M. H. Thaut, Model and feature selection in microarray classification, in Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 04), pp. 56–60, La Jolla, Calif, USA, October 2004.
[39] G. Pfurtscheller, C. Neuper, C. Guger, et al., Current trends in Graz brain-computer interface (BCI) research, IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 216–219, 2000.
[40] M. Pontil and A. Verri, Support vector machines for 3D object recognition, IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 6, pp. 637–646, 1998.
[41] P. Pudil, J. Novovicova, and J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters, vol. 15, no. 11, pp. 1119–1125, 1994.
[42] B. Raman and T. R. Ioerger, Enhancing learning using feature and example selection, Tech. Rep., Department of Computer Science, Texas A&M University, College Station, Tex, USA.
[43] J. Reunanen, Overfitting in making comparisons between variable selection methods, Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1371–1382, 2003.
[44] A. J. Smola, B. Schölkopf, R. C. Williamson, and P. L. Bartlett, New support vector algorithms, Neural Computation, vol. 12, no. 5, pp. 1207–1245, 2000.
[45] B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Mass, USA, 1999.
[46] W. Siedlecki and J. Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognition Letters, vol. 10, no. 5, pp. 335–347, 1989.
[47] A. C. Tang and B. A. Pearlmutter, Independent components of magnetoencephalography: localization and single-trial response onset detection, in Magnetic Source Imaging of the Human Brain, L. Kaufman and Z. L. Lu, Eds., pp. 159–201, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2003.
[48] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
[49] R. Vigário, J. Särelä, V. Jousmäki, M. Hämäläinen, and E. Oja, Independent component approach to the analysis of EEG and MEG recordings, IEEE Trans. Biomed. Eng., vol. 47, no. 5, pp. 589–593, 2000.
[50] J. Weston, A. Elisseeff, B. Schölkopf, and M. E. Tipping, Use of the zero-norm with linear models and kernel methods, Journal of Machine Learning Research, vol. 3, no. 7-8, pp. 1439–1461, 2003.
[51] L. D. Whitley, J. R. Beveridge, C. Guerra-Salcedo, and C. R. Graves, Messy genetic algorithms for subset feature selection, in Proc. 7th International Conference on Genetic Algorithms (ICGA 97), T. Baeck, Ed., pp. 568–575, Morgan Kaufmann, East Lansing, Mich, USA, July 1997.
[52] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, Brain-computer interfaces for communication and control, Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[53] J. R. Wolpaw, D. J. McFarland, G. W. Neat, and C. A. Forneris, An EEG-based brain-computer interface for cursor control, Electroencephalography and Clinical Neurophysiology, vol. 78, no. 3, pp. 252–259, 1991.
[54] J. R. Wolpaw, D. J. McFarland, and T. M. Vaughan, Brain-computer interface research at the Wadsworth Center, IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 222–226, 2000.
[55] J. Yang and V. Honavar, Feature subset selection using a genetic algorithm, in Feature Extraction, Construction and Selection: A Data Mining Perspective, H. Liu and H. Motoda, Eds., pp. 117–136, Kluwer Academic, Boston, Mass, USA, 1998.
[56] E. Yom-Tov and G. F. Inbar, Feature selection for the classification of movements from single movement-related potentials, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 10, no. 3, pp. 170–177, 2002.
David A. Peterson is a Ph.D. candidate in the Computer Science Department at Colorado State University (CSU) and part of the Cognitive Neuroscience Group affiliated with CSU's Center for Biomedical Research in Music. He received a B.S. degree in electrical engineering and a B.S. degree in finance from the University of Colorado at Boulder. He did business data network consulting for Accenture (previously Andersen Consulting) prior to returning to academia. His research is on biomedical applications of machine learning, with an emphasis on classification and feature selection. He has published research in areas as diverse as mammalian taste coding, brain oscillations associated with working memory, and the interaction of model and feature selection in microarray classification. His current interests are in cognitive, EEG-based brain-computer interfaces and the influence of rhythmic musical structure on the electrophysiology of verbal learning.
James N. Knight is currently a Ph.D. stu-
dent at Colorado State University. He re-
ceived his M.S. degree in computer science
from Colorado State University and his
B.S. degree in math and computer science
from Oklahoma State University. His re-
search areas include signal processing, rein-
forcement learning, high-dimensional data
modeling, and the application of Markov
chain Monte Carlo methods to problems in
surface chemistry.
Michael J. Kirby received the B.S. degree in
mathematics from MIT (1984), the M.S. de-
gree (1986) and Ph.D. degree (1988) both
from the Division of Applied Mathematics,
Brown University. He joined Colorado State
University in 1989 where he is currently a
Professor of mathematics and computer sci-
ence. He was an Alexander Von Humboldt
Fellow (19891991) at the Institute for In-
formation Processing, University of Tuebin-
gen, Germany, and received an Engineering and Physical Sciences
Research Council (EPSRC) Visiting Research Fellowship (1996). He
received an IBM Faculty Award (2002) and the Colorado State Uni-
versity, College of Natural Sciences Award for Graduate Student Ed-
ucation (2002). His interests are in the area of geometric methods for
modeling large data sets including algorithms for the representa-
tion of data on manifolds and data-driven dimension estimation.
He has published widely in this area, including the textbook Geometric Data Analysis (Wiley & Sons, 2001).
Charles W. Anderson received the B.S. de-
gree in computer science in 1978 from the
University of Nebraska, and the M.S. and
Ph.D. degrees in computer science in 1982
and 1986, respectively, from the Univer-
sity of Massachusetts, Amherst. From 1986
through 1990, he was a Senior Member of
Technical Staff at GTE Labs in Waltham,
Mass. He is now an Associate Professor in
the Department of Computer Science at
Colorado State University in Fort Collins, Colo. His research in-
terests are in neural networks for signal processing and control.
Specifically, he is currently working with medical signals and im-
ages and with reinforcement learning methods for the control of
heating and cooling systems. Additional information can be found
at http://www.cs.colostate.edu/anderson.
Michael H. Thaut is a Professor of neurosciences and the Chair
of the Department of Music, Theatre, and Dance at Colorado
State University. He is also the Head of the Center for Biomedi-
cal Research in Music. His research focuses on rhythm perception
and production and its application to movement rehabilitation in
trauma, stroke, and Parkinson's patients. Recent expansion of his
research agenda includes applications of the rhythmic structure of
music to cognitive rehabilitation in multiple sclerosis. He received
his Ph.D. degree in music from Michigan State University and holds
degrees in music from the Mozarteum in Salzburg, Austria, and
psychology from Muenster University in Germany. He has served
as a Visiting Professor of kinesthesiology at the University of Michi-
gan, a Visiting Scientist at Duesseldorf University Medical School,
and a Visiting Professor at Heidelberg University. The author and
coauthor of primary textbooks in music therapy, his works have ap-
peared in English, German, Italian, Spanish, Korean, and Japanese.
EURASIP Journal on Applied Signal Processing 2005:19, 3141–3151
© 2005 Hindawi Publishing Corporation
A Time-Frequency Approach to Feature Extraction
for a Brain-Computer Interface with a Comparative
Analysis of Performance Measures
Damien Coyle
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering,
University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: dh.coyle@ulster.ac.uk
Girijesh Prasad
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering,
University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: g.prasad@ulster.ac.uk
T. M. McGinnity
Intelligent Systems Engineering Laboratory, School of Computing and Intelligent Systems, Faculty of Engineering,
University of Ulster, Magee Campus, Derry BT48 7JL, UK
Email: tm.mcginnity@ulster.ac.uk
Received 2 February 2004; Revised 4 October 2004
The paper presents an investigation into a time-frequency (TF) method for extracting features from the electroencephalogram (EEG) recorded from subjects performing imagination of left- and right-hand movements. The feature extraction procedure (FEP) extracts frequency-domain information to form features, whilst time-frequency resolution is attained by localising the fast Fourier transforms (FFTs) of the signals to specific windows localised in time. All features are extracted at the rate of the signal sampling interval from a main feature extraction (FE) window through which all data passes. Subject-specific frequency bands are selected for optimal feature extraction, and intraclass variations are reduced by smoothing the spectra for each signal by an interpolation (IP) process. The TF features are classified using linear discriminant analysis (LDA). The FE window has potential advantages for the FEP to be applied in an online brain-computer interface (BCI). The approach achieves good performance when quantified by classification accuracy (CA) rate, information transfer (IT) rate, and mutual information (MI). The information that these performance measures provide about a BCI system is analysed, and the importance of this is demonstrated through the results.
Keywords and phrases: brain-computer interface, neuromuscular disorders, electroencephalogram, time-frequency methods, linear classification.
1. INTRODUCTION
Nearly two million people in the United States [1] are affected by neuromuscular disorders. A conservative estimate of the overall prevalence is that 1 in 3500 of the world's population may be expected to have a disabling inherited neuromuscular disorder presenting in childhood or in later life [2]. In many cases those affected may have no control over muscles that would normally be used for communication. BCI technology is a developing technology but has the potential to contribute to the improvement of living standards for these people by offering an alternative communication channel which does not depend on the peripheral nerves or muscles [3]. A BCI replaces the use of nerves and muscles and the movements they produce with electrophysiological signals in conjunction with the hardware and software that translate those signals into actions [1].
A BCI involves extracting information from the highly complex EEG. This is usually achieved by extracting features from EEG signals recorded from subjects performing specific mental tasks. A class of features for each mental task is usually obtained from signals prerecorded whilst a subject performs a number of repetitions of each mental task. Subsequently a classifier is trained to learn which features belong to which class. This ultimately leads to the development of a BCI system that can determine which mental tasks are related to specific EEG signals [4] and associate those EEG signals with the user's intended communication.
This work demonstrates the use of the short-time Fourier transform (STFT) to extract reliable features from EEG signals altered by imagined right/left-hand movements. EEG data was recorded from two recording sites on the scalp positioned at C3 and C4 [5] over the motor cortex. The STFT is used to calculate frequency spectra from a window (i.e., the STFT window) which slides along the data contained within another window (i.e., the feature extraction (FE) window). All EEG data recorded from each recording site is passed through the FE window. The spectra are smoothed using an interpolation (IP) process. Features are obtained from each interpolated spectrum by calculating the norm of the power in predetermined subject-specific frequency bands. Linear discriminant analysis (LDA) is used for classification, and system performance is quantified based on three performance measures. The measurement of BCI performance is very important for comparing different systems and measuring improvements in systems. There are a number of techniques used to quantify the effectiveness and performance of a BCI system. These include measuring the classification accuracy (CA) and/or measuring the information transfer (IT) rate. The latter performance quantifier takes into consideration the CA and the time (CT) required to perform classification of each mental task. A third and relatively new quantifier of performance for a BCI system is the mutual information (MI), which is a measure of the average amount of information a classifier output contains about the input signal [6, 7]. A critical analysis of the performance measures, illustrating the advantages of utilising each one for evaluating a BCI system, is provided.
The performance of the system is dependent upon
choices of parameter combinations. It is shown that the
width of the main FE window, the number of STFT win-
dows, the width and length of the STFT windows, and the
amount of overlap between consecutive STFT windows all
have significant effects on the performance of the system. An
interpolation process for smoothing the frequency spectra
improves the features and helps increase CA rates. The im-
portance of each parameter is analysed. The results demon-
strate that, to obtain the best performance, the parameter
combinations have to be optimised individually for each sub-
ject. However, a number of parameters converge to similar
values, therefore there may exist a particular parameter com-
bination that would generalise well to all subjects and thus
potentially simplify the application of the system to each in-
dividual subject. Details on these aspects of the system, along
with a comparison to other BCI systems, are discussed.
The paper is organised in 11 sections. Section 2 describes
the data acquisition procedure. Section 3 introduces the
STFT and the FEP and Section 4 provides an analysis of the
EEG used in this work. Sections 5 and 6 describe the FEP and
the classification procedures, respectively. Section 7 briefly
describes three methods for quantifying the performance of a
BCI system. Section 8 outlines the system optimisation pro-
cedure. Sections 9 and 10 document and discuss the results.
Section 11 concludes the paper.
2. DATA ACQUISITION
The EEG data used to demonstrate this approach was
recorded by the Graz BCI research group (see acknowl-
edgement) [8, 9, 10, 11]. The Graz group has developed a
BCI which uses μ (8–12 Hz) and central β (18–25 Hz) EEG rhythms recorded over the motor cortex. Several factors have suggested that μ and/or β rhythms may be good signal features for EEG-based communication. These signals are associated with those cortical areas most directly connected to the brain's normal motor output channels [1]. The data was recorded from 3 subjects (S1, S2, and S3) over two sessions, in a timed experimental recording procedure. Each trial was 8 s in length. The first 2 s was quiet; at t = 2 s an acoustic stimulus signified the beginning of a trial, and a cross "+" was displayed for 1 s; then at t = 3 s, an arrow (left or right) was displayed as a cue. At the same time the subject was asked to move a bar in the direction of the cue by imagining moving the left or right hand. The feedback (bar movement) can help the user learn to control their EEG better for specific tasks. For subject S1 a total of 280 trials were recorded (140 trials of each type of movement imagery). For subject S2 there were 320 trials (160 trials of each type of movement imagery). The recording was made using a g.tec amplifier (http://www.gtec.at/) and Ag/AgCl electrodes. All signals were sampled at 128 Hz and filtered between 0.5 and 30 Hz. Two bipolar EEG channels were measured using two electrodes positioned 2.5 cm posterior (−) and anterior (+) to positions C3 and C4 according to the international standard (10/20 system) electrode positioning nomenclature. In bipolar recording the recorded voltage is the voltage difference between the anterior and posterior electrode
at each recording site. A detailed description of similar ex-
perimental setups for recording these EEG signals is available
[6, 8, 9, 10, 11, 12].
3. THE FE WINDOW AND THE STFT WINDOW
In this investigation there are two windows utilised: the FE
window and the STFT window. EEG signals (or data) are
fed through the FE window and within the FE window the
frequency components of the EEG signal are obtained us-
ing a fast Fourier transform (FFT). Within the FE window
a temporal resolution is attained by sliding the STFT win-
dow along the data sequence with a certain overlap. This
windowed signal processing technique is often referred to as
the Gabor transform after Gabor (1946). STFT analysis of a
nonstationary signal assumes stationarity over the selected
signal segment (the STFT window). The inherent assump-
tion of stationarity over the STFT window can lead to smear-
ing in the frequency domain and decreased frequency reso-
lution when analysing EEG signals with fast changing spec-
tral content [13]. The temporal resolution can be made as
high as possible by sliding the STFT window along the FE
window with a large overlap. This maximises the potential
for identifying short events that occur within the FE window
[14].
Figure 1: Illustration of the FE window and the STFT window in the FEP (two STFT windows of length N, overlapping by ovl, slide along the main FE window of length M for each incoming signal, C3 and C4).
To localise the Fourier transform of the signal at a time instant τ which falls within the main FE window, the STFT-window function is peaked around τ and falls off, thus emphasising the signal in the vicinity of time τ and suppressing it for distant times [15]. There are a number of windows which can be used for achieving these characteristics. Gabor proposed the use of a Gaussian window formulated as follows:

w(t) = e^{−(1/2)(α(t − N/2)/(N/2))^2}, (1)

where 0 ≤ t < N and α is the reciprocal of the standard deviation. The width of the window is inversely related to the value of α; a larger α produces a narrower window. The window has the length N. These constant parameters denote the length of the window and the degree of localisation in the time domain, respectively [15]. The tuning of these parameters is very important for the extraction of features used in this approach, and this is made apparent in the results section.
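Equation (1) can be rendered in a few lines. The following is a hypothetical NumPy sketch (not the authors' code); the values of N and α below are chosen purely for illustration:

```python
import numpy as np

def gabor_window(N, alpha):
    """Gaussian (Gabor) window of (1): w(t) = exp(-0.5*(alpha*(t - N/2)/(N/2))**2).

    alpha is the reciprocal of the standard deviation, so a larger
    alpha yields a narrower window; N is the window length.
    """
    t = np.arange(N)
    return np.exp(-0.5 * (alpha * (t - N / 2) / (N / 2)) ** 2)

# A wider (alpha = 1.5) and a narrower (alpha = 3.0) window of length 128.
w_wide = gabor_window(128, 1.5)
w_narrow = gabor_window(128, 3.0)
```

Both windows peak at the centre sample t = N/2 and decay towards the edges; the α = 3.0 window suppresses distant samples more strongly, trading frequency resolution for sharper time localisation.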
The ordinary Fourier transform (FT) is based on comparing the signal with complex sinusoids that extend through the whole time domain; its main disadvantage is the lack of information about the time evolution of the frequencies. In this case, if an alteration occurs at some time boundary, the whole Fourier spectrum will be affected [15]. The FT requires stationarity of the signal, which is a disadvantage in EEG analysis, the EEG signal being highly nonstationary. The STFT helps to overcome many of these disadvantages and is formulated as follows:

Y_k(f, τ) = Σ_{t = τ−N/2}^{τ+N/2} w_α(t − τ) y_k(t) e^{−j(2π/N) f t}, (2)

where f = 0, 1, ..., N_f − 1 and N_f is the number of frequency points or Fourier transforms. Y_k(f, τ) contains the frequency spectrum for each STFT window centred at τ. y_k is the input EEG signal (i.e., either C3 (k = 1) or C4 (k = 2)) contained within the main FE window. The number of STFT windows used to analyse the data contained in the FE window depends on the length of the FE window M, the STFT window length N, and the amount of overlap, ovl, between adjacent STFT windows. M must always be larger than N. Y_k is a matrix with N_f rows and E = (M − ovl)/(N − ovl) columns (i.e., the rows contain the power of the signal for each harmonic and E is the number of STFT windows that are produced within the FE window).
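A minimal sketch of how one channel's Y_k matrix might be computed is given below (hypothetical NumPy code, not the authors' implementation; the magnitude spectrum is used and all parameter values are illustrative):

```python
import numpy as np

def stft_matrix(y, N, ovl, alpha, Nf):
    """Spectra of the E STFT windows inside one FE window, as in (2).

    y   : samples currently held in the FE window (length M)
    N   : STFT window length; ovl : overlap between adjacent windows
    Returns an Nf x E matrix; E = (M - ovl) // (N - ovl) columns.
    """
    M = len(y)
    step = N - ovl
    E = (M - ovl) // step
    t = np.arange(N)
    w = np.exp(-0.5 * (alpha * (t - N / 2) / (N / 2)) ** 2)  # window (1)
    Y = np.empty((Nf, E))
    for l in range(E):
        seg = y[l * step : l * step + N]
        Y[:, l] = np.abs(np.fft.fft(w * seg))[:Nf]  # localised spectrum
    return Y

# Example: M = 300 samples of an oscillation sitting exactly on FFT bin 8.
y = np.cos(2 * np.pi * 8 * np.arange(300) / 100)
Y = stft_matrix(y, N=100, ovl=50, alpha=2.5, Nf=50)
```

With M = 300, N = 100, and ovl = 50, the formula gives E = (300 − 50)/(100 − 50) = 5 spectra; each column of Y peaks at bin 8, the frequency of the test oscillation.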
This analysis was carried out offline although, to approximate the online capabilities, all features are extracted within the FE window so that features can be extracted at the rate of the sampling interval as data passes through the window. As each new signal sample enters the FE window, the oldest sample is removed and the STFT window slides along the signal within the FE window (this process is repeated as each new sample enters the FE window). A frequency spectrum is calculated for each STFT window centred at τ. An illustration of the FE window and STFT window is shown in Figure 1. This illustration shows two STFT windows contained within the FE window for each signal (C3 or C4).
4. SPECTRAL ANALYSIS AND ERD/ERS
The spectra of signals recorded from recording sites C3 and C4 when subjects perform imagination of hand movements usually show an increase and decrease in the intensity of frequencies in the μ (8–12 Hz) and central β (18–25 Hz) ranges, depending on the recording location and the imagined hand movement (left or right). When certain cortical areas, such as the sensorimotor area, become activated during the course of information processing, amplitude attenuation occurs in the oscillations of the μ and central β rhythms. This is known as an event-related desynchronisation (ERD). An amplitude enhancement or event-related synchronisation (ERS) can be observed in cortical areas that are not specifically engaged in a given mode of activity at a certain moment of time [9, 11]. The location and frequency ranges of ERS/ERD are subject-specific.
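ERD/ERS are commonly quantified as the relative band-power change between a reference interval and the activity interval (following the approach popularised by Pfurtscheller [16]). A rough sketch, assuming a simple periodogram band-power estimate; the function names and the toy signals are illustrative only:

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Mean power of x in the band [f_lo, f_hi] Hz via the periodogram."""
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[mask].mean()

def erd_percent(ref, act, fs, band=(8.0, 12.0)):
    """Relative band-power change of the activity interval versus the
    reference interval; negative values indicate ERD (power decrease)."""
    r = band_power(ref, fs, *band)
    a = band_power(act, fs, *band)
    return 100.0 * (a - r) / r

fs = 128
t = np.arange(256) / fs
ref = np.sin(2 * np.pi * 10 * t)   # 'rest' interval: full mu-band amplitude
act = 0.5 * ref                    # 'imagery' interval: attenuated mu rhythm
erd = erd_percent(ref, act, fs)
```

Here the activity interval carries the 10 Hz component at half the amplitude, i.e. a quarter of the power, so the sketch reports an ERD of about −75%.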
Figure 2: C3 Left (windows 1 & 2). Raw spectra: amplitude versus frequency (0–70 Hz).
Figure 3: C4 Left (windows 1 & 2). Raw spectra: amplitude versus frequency (0–70 Hz).
Figures 2, 3, 4, and 5 show a typical set of frequency
spectra. Figures 2 and 3 are obtained from calculating the
STFT from EEG signals recorded during imagination of left-
hand movement. Figures 4 and 5 were obtained from signals
recorded during imagination of right-hand movement. For
this analysis only two windows were used for each signal.
The top graph in each figure is the spectrum of the first win-
dow and the bottom is the spectrum calculated for the sec-
ond window. The dominant frequency components can be
observed from each graph.
Figure 4: C3 Right (windows 1 & 2). Raw spectra: amplitude versus frequency (0–70 Hz).
Figure 5: C4 Right (windows 1 & 2). Raw spectra: amplitude versus frequency (0–70 Hz).
Both spectra in Figure 5 (C4 recording, right imagery) show strong evidence of the μ (8–12 Hz) rhythm. This is not observable from Figure 4 (C3 recording, right imagery), which suggests that there is an ERD of the μ rhythm on the contralateral side (opposite side to the imagined hand movement). ERD can be interpreted as an electrophysiological correlate of activated cortical areas involved in processing of sensory or cognitive information or production of motor behaviour (see [16]). A small peak can be observed within the central β rhythm (between 18 and 20 Hz) on the C4 spectral plots, which suggests that there is an ERS of the central β rhythm on the ipsilateral hemisphere. The large peak in the μ rhythm on the C4 electrode is an electrophysiological correlate of cortical areas at rest, or the cooperative or synchronised behaviour of a large number of neurons. Similar contralateral-ipsilateral differences occur during the imagination of left-hand movement, except the differences are symmetrically reversed. To determine that events are truly event related, an experiment described in [16], which involves averaging spectra, is the standard approach for distinguishing ERS/ERD in EEG signals.
The μ rhythm and the central β rhythm were selected as the most reactive frequency bands from which to extract features, for all subjects analysed. There are subtle differences in the main peaks of the upper and lower graphs in each figure, indicating that throughout the imagination there is a change in the amplitude and degree of ERS/ERD in the signals. The evolution of the frequency over time within the FE window can be observed more closely by using an increased number of STFT windows with smaller length. Also, motor imagery data becomes most separable at specific segments (subject specific) [10]; therefore, if the FE window length M is selected properly, the segments of data that produce maximum feature separability are captured as they pass through the window. The best FE window width M is subject specific and must be selected empirically for each subject. If the STFT-window parameters are selected properly, then feature separability can be maximised within the FE window.
5. INTERPOLATION-BASED FEATURE
EXTRACTION PROCEDURE
The extracted spectra contain quite a lot of detail on the frequencies that are not as prominent as those in and around the μ and central β ranges. Smoothing the spectra can reduce feature quality degradation caused by irregular frequency components introduced by noise and help compensate for missing information. The spectrum shape can be smoothed by decreasing the width of the STFT window (i.e., increasing α of (1)). If the window is too narrow, the frequency resolution will be poor, and if the window is too wide, the time localization will not be so precise. This means that sharp localizations in time and frequency are mutually exclusive because a frequency cannot be calculated instantaneously [15]. Depending on the application and the quantity of information required about the frequency components, the choice of window and window parameters must be adjusted to obtain the desired resolution. For this approach a good frequency resolution is important, especially in the μ and β ranges, but the objective is to obtain features which can provide maximum separability between both classes (left and right). In this respect the appearance of each spectrum was not of major importance. The reactive bands in the spectra are similar among most of the signals within each class, but there are usually discrepancies in the upper and lower frequencies of each band as well as in the peak amplitude of each band. To reduce the possibility of these frequency components having a negative effect on the identification of features within each class, an interpolation process is performed to extract the gross shape of the spectrum.
The interpolation process can smooth the spectra and thus the differences between spectra within each class can be minimised. In this way some of the larger peaks may be lost, but the interpolation plays a role in compensating for missing information which ought to contribute to the discrimination [17] and can help reduce the intraclass variance, a fundamental goal of most FEPs. The formula for the interpolation process is as follows:
process is shown as follows:
Yip
k
l
(u) =
IPE

i=IPS

Y
k
(i, l)
IPEIPS +1

, l = 1, . . . , E, (3)
IPS =

0 if u ip < 0,
u ip otherwise,
(4)
IPE =

N
f
1 if u + ip > N
f
+ 1,
u + ip otherwise.
(5)
Equation (2) is used to calculate Y_k, E is the number of spectra, and u indexes the interpolated spectrum at each frequency point (harmonic); therefore u = 0, 1, ..., N_f − 1, where N_f is the number of frequency points or Fourier transforms in the spectra. The value of ip determines the number of interpolation points, which in turn determines the degree of smoothing. A feature, f_l^k, is obtained by taking the l2-norm (i.e., the square root of the sum of the components squared) of the interpolated spectrum between the preselected reactive frequency bands. If E is the number of spectra (i.e., the number of windows for one spectrogram) and L is the number of signals, then
m = LE, (6)

f_l^k = ‖Yip_l^k‖,  l = 1, ..., E, (7)
where f_l^k is a feature obtained from the reactive frequency bands of the lth interpolated spectrum of the signal recorded at the kth recording site. According to (6), if there are 3 spectra (i.e., 3 STFT windows) within the FE window for each signal, then E = 3, L = 2 (2 signals), and m = 6; thus each feature vector would contain six features. To recapitulate, the number of features depends on the number of STFT windows, which depends on the FE-window length, the STFT-window length, and the amount of overlap between consecutive STFT windows. If a large number of spectra are produced, choosing a number of specific interpolated spectra for feature extraction will reduce the feature vector dimensionality and thus maintain/improve computational efficiency; however, this may cause performance degradation. The feature vector is
fv = [ f_1^1, f_2^1, ..., f_E^1, f_1^2, f_2^2, ..., f_E^2 ]. (8)
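The smoothing of (3)-(5) is in effect a moving average over ±ip frequency bins, clipped at the spectrum edges, and (6)-(8) then stack one l2-norm feature per interpolated spectrum. A hypothetical sketch (the band bins and the ip value below are arbitrary):

```python
import numpy as np

def interpolate_spectrum(Yl, ip):
    """Smooth one spectrum (a column of Y_k) per (3)-(5): each point u is
    replaced by the mean over [u - ip, u + ip], clipped to the valid range."""
    Nf = len(Yl)
    out = np.empty(Nf)
    for u in range(Nf):
        ips = max(u - ip, 0)
        ipe = min(u + ip, Nf - 1)
        out[u] = Yl[ips : ipe + 1].mean()
    return out

def feature_vector(Y_c3, Y_c4, band_bins, ip):
    """One feature per interpolated spectrum: the l2-norm of the smoothed
    spectrum inside the preselected reactive band, per (6)-(8); the E
    features from C3 are followed by the E features from C4."""
    feats = []
    for Y in (Y_c3, Y_c4):                  # L = 2 signals
        for l in range(Y.shape[1]):         # E spectra per signal
            sm = interpolate_spectrum(Y[:, l], ip)
            feats.append(np.linalg.norm(sm[band_bins]))
    return np.asarray(feats)                # m = L * E features

Y_demo = np.tile(np.arange(64.0)[:, None], (1, 3))  # E = 3 identical spectra
fv = feature_vector(Y_demo, Y_demo, band_bins=np.arange(8, 13), ip=2)
```

For E = 3 spectra per signal and L = 2 signals, the sketch returns m = LE = 6 features, matching the worked example in the text.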
6. CLASSIFICATION
After feature extraction, classification is performed using linear discriminant analysis (LDA), a classifier that works on the assumption that different classes of features can be separated linearly. Linear classifiers are generally more robust than their nonlinear counterparts, since they have only limited flexibility (fewer free parameters to tune) and are less prone to overfitting [18]. Experimentation involved extraction and classification of features at every time point in a trial. The classes were labelled −1 for left and +1 for right. This resulted in a classifier which provides a time-varying signed distance (TSD) as described in [6, 11]. The sign of the classifier output indicates the class and the magnitude (or TSD) indicates the confidence in the classification. The time evolution of the CA rates and the TSD can be used to determine when the signals are most separable. The TSD is described in the following section.
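A two-class LDA with a signed-distance output can be sketched as follows; this uses the classical within-class-scatter solution on synthetic feature vectors and is not the authors' implementation:

```python
import numpy as np

def lda_train(X, y):
    """Two-class LDA: w = Sw^{-1} (m_pos - m_neg), bias at the class midpoint.
    Labels follow the text: -1 for left, +1 for right."""
    m_neg, m_pos = X[y == -1].mean(0), X[y == +1].mean(0)
    Xc = np.vstack([X[y == -1] - m_neg, X[y == +1] - m_pos])
    Sw = Xc.T @ Xc                          # within-class scatter
    w = np.linalg.solve(Sw, m_pos - m_neg)
    b = -0.5 * w @ (m_pos + m_neg)
    return w, b

def tsd(X, w, b):
    """Signed distance to the hyperplane: sign gives the class,
    magnitude gives the confidence."""
    return (X @ w + b) / np.linalg.norm(w)

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 6)),   # 'left' feature vectors
               rng.normal(+1.0, 1.0, (100, 6))])  # 'right' feature vectors
y = np.r_[-np.ones(100), np.ones(100)]
w, b = lda_train(X, y)
d = tsd(X, w, b)
```

On these well-separated synthetic classes the sign of the TSD recovers the labels almost perfectly, while its magnitude grows with distance from the separating hyperplane.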
7. PERFORMANCE QUANTIFICATION
The performance of the proposed BCI system is quantified by the CA, the IT rate, and the MI. The CA is the percentage of trials that are classified correctly. The capacity of a communication system is given by its IT rate, normally measured in bits/min (bpm). Capacity is often measured by the accuracy and the speed of the system in a specified application [19]. For systems that rely on accuracy and speed, the main objective is to maximise the number of bits that can be communicated with high accuracy in a specific time window. In present BCI systems, increasing the speed and accuracy is one of the main objectives. For example, the BCI systems in [4, 8, 19, 20, 21, 22] must be able to accurately decipher the EEG signals and respond correctly to their interpretation of the user's command as quickly as possible. The IT rate was first used to quantify the performance of a BCI system by Wolpaw et al. [19] and the calculation was derived in [23, 24]. A relatively new quantifier of performance for a BCI system is the MI, which is a measure of the average amount of information a classifier output contains about the input signal. This performance measure was first used by Schlögl et al. [6, 7]. To estimate the MI the classifier should produce a distance value, D, where the sign of D indicates the class (in a two-class system) and the value expresses the distance to the separating hyperplane. A greater distance from the hyperplane indicates a higher signal-to-noise ratio (SNR). D is referred to as the time-varying signed distance (TSD) when estimated at the rate of the sampling interval. The D value at a specific time point t (i.e., D(t)) for all trials is used to estimate the MI. The MI between the TSD and the class relationship is the entropy difference of the TSD with and without the class information. The system described in this work facilitates features to be extracted with a time resolution as high as the sampling rate very easily; therefore the TSD is estimated at every time instant t, although there must be M samples within the FE window before feature extraction begins.
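The three quantifiers can be computed as sketched below. The IT rate uses the standard per-trial formula of Wolpaw et al. [19, 23, 24]; the MI uses a Gaussian-entropy estimate in the spirit of [6, 7], namely MI = 0.5 log2(var_total/var_within), equivalently 0.5 log2(1 + SNR). The demonstration TSD data are synthetic:

```python
import numpy as np

def bits_per_trial(p, n=2):
    """Wolpaw information per trial (bits) for accuracy p over n classes."""
    if p <= 1.0 / n:
        return 0.0
    if p >= 1.0:
        return np.log2(n)
    return np.log2(n) + p * np.log2(p) + (1 - p) * np.log2((1 - p) / (n - 1))

def it_rate_bpm(p, ct, n=2):
    """IT rate in bits/min: bits per trial times trials per minute
    (ct is the classification time in seconds)."""
    return bits_per_trial(p, n) * 60.0 / ct

def mutual_information(d, labels):
    """Entropy difference of the TSD with and without class information,
    assuming Gaussian TSD distributions."""
    var_total = d.var()
    var_within = np.mean([d[labels == c].var() for c in np.unique(labels)])
    return 0.5 * np.log2(var_total / var_within)

rng = np.random.default_rng(3)
d = np.r_[rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)]  # TSD values
labels = np.r_[-np.ones(500), np.ones(500)]
mi = mutual_information(d, labels)
```

For instance, a two-class system at 100% accuracy with a classification time of 3.9 s (M = 500 at 128 Hz) transfers 1 bit per trial, i.e. 60/3.9 ≈ 15.4 bpm.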
8. SYSTEM OPTIMISATION
Due to the nature of this FEP, there are a number of parameters that must be tuned, and the values of these parameters can have a significant effect on the performance of the system. These parameters are listed as follows:
(i) width of subject-specific frequency band(s),
(ii) FE window length, M,
(iii) STFT window length, N,
(iv) window width, α,
(v) overlap between STFT windows, ovl,
(vi) interpolation interval, ip.
Firstly, the most reactive frequency bands are selected. It is known from Pfurtscheller's work [16] and the Graz research group's [10] theoretical and meticulous work on EEG signals recorded during the imagination of left- and right-hand movement, as well as analysis done on the spectral graphs showing the ERD/ERS phenomenon for subject S1 (cf. Section 4), that the most reactive bands usually occur in the μ (8–12 Hz) and central β (18–25 Hz) ranges. Further adjustments of the selected bands were carried out during the performance evaluation, and it was observed that CA could be increased by adjusting the range of the selected bands. In this investigation an empirical selection of the most reactive frequency bands was performed by increasing or decreasing the μ and central β bands in steps of 0.25 Hz. The data set for each subject was partitioned into three subsets: a training set (Tr), a validation set (V), and a testing set (Ts). The training sets consisted of 100 trials for subject S1, and 120 for subjects S2 and S3. The validation set for each subject consisted of 40 trials. The test (Ts) set consisted of 100 trials for subject S1, and 120 for subjects S2 and S3. The best subject-specific frequency bands and all other parameters were chosen by testing the system on a validation data set and choosing the band widths that provided the highest CA rates.
To begin the parameter selection procedure rstly, the FE
window length, M, was chosen. The value of M had to be
large enough so that the window contained enough signal to
extract reliable features; however a window that is too large
may result in degraded performance. For example, if a win-
dow length M = 500 is chosen, the minimum classication
time is 500 s

128
1
s = 3.9 s and if M = 300 the minimum
classication time is 2.34 s therefore, the IT rate can be signif-
icantly inuenced by the choice of M. Six dierent window
sizes ranging between 100 and 450 were tested. The window
size which provided the best features was selected for further
tests. To tune the remaining STFT parameters, three values
of σ were first chosen, and subsequently tests were run with
N = 50 : 50 : 300 (i.e., N was set to all multiples of 50 up
to 300) whilst ip and ovl were set to 1. It was assumed that
by observing results at the 6 different STFT window lengths, for
each of the three different values of σ, a sufficient indication
of good combinations of these parameters for each subject
could be attained. The highest CA rates on the training data
were used to indicate the best combinations of all parameters.
Up to eight different values of ovl were then selected,
ranging from 1 to 100 in specific multiples of 5 for small N
A Time-Frequency Approach to Feature Extraction for a BCI 3147
and 10 for larger values of N. The value of ovl must be less
than N. At each value of ovl and the chosen best values of N
and σ, obtained from the first selection procedure, another
set of tests was run with ip = 3 : 3 : 18. Again, CA rates
were used to choose the best combination of all four parameters.
It was observed that the CA rates are sensitive to small
changes in ip, so another set of tests was carried out in which
the best ip values chosen in the previously described tests
were decremented and incremented by 1 and then 2. In certain
cases additional variations of the parameters were introduced
for exhaustive tests. In a minority of situations the CA
rates for two or more combinations were equal, and in this
case the IT rate was used to decide the best choice. This parameter
selection technique only covers a small percentage of
the possible combinations; therefore a more meticulous analysis
may produce better results. An automated method could
be used to search the parameter space for optimisation of the
system.
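The first pass of the staged search described above (sweep N and σ with ovl and ip fixed, keep the highest-scoring combination) can be sketched as follows. The σ values and the scoring function are illustrative stand-ins, not the paper's; a real scorer would train the FEP and classifier and return the validation CA.

```python
import itertools

def grid_search(score, Ns=range(50, 301, 50), sigmas=(0.68, 2.0, 4.0),
                ovls=(1,), ips=(1,)):
    """Evaluate every (N, sigma, ovl, ip) combination and return the
    one with the highest score. In the first pass ovl and ip are held
    at 1, as in the text; later passes would widen those tuples."""
    return max(itertools.product(Ns, sigmas, ovls, ips),
               key=lambda p: score(*p))

# Dummy scorer so the sketch runs end to end; it peaks at N=100, sigma=0.68.
best = grid_search(lambda N, s, o, i: -abs(N - 100) - abs(s - 0.68))
```

Subsequent passes over ovl and ip would reuse the same loop with the winning N and σ held fixed.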
9. RESULTS
All parameter selection was done by analysing how well the
system performed on the validation data (40 trials for subject
S1 and 60 trials for subjects S2 and S3). To test the generali-
sation abilities of the system, further tests were performed on
the unseen testing data which consisted of 100 trials for each
of the subjects. All performance quantifiers are estimated at
the rate of the sampling interval (i.e., the performance is averaged
over all trials at each time point; therefore, after each
new sample is enveloped in the main FE window, the oldest
sample is removed and a new set of features is extracted
and classified). The results at the best time points (determined
by the point at which CA is maximal) are presented.
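The continuous, sample-by-sample classification scheme just described can be sketched with a fixed-length buffer; the names are illustrative.

```python
from collections import deque

def sliding_windows(samples, M):
    """Continuous classification scheme: each new sample enters the FE
    window and the oldest sample is dropped, so once M samples have
    arrived a full window (from which features would be extracted and
    classified) is available at every sampling interval."""
    buf = deque(maxlen=M)
    for x in samples:
        buf.append(x)
        if len(buf) == M:
            yield tuple(buf)

wins = list(sliding_windows(range(6), 4))  # toy signal, M = 4
```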
Table 1 shows the results obtained based on the parameters
selected using the approach described in the previous sec-
tion. Columns 1 and 2 indicate the subject and the selected
subject-specific frequency bands (2 frequency bands for each
subject), respectively. There are three parameter combinations
(PCs), and the corresponding results, shown for each
subject. Column 3 specifies the PC for each subject for ease of
reference. Columns 4–8 specify the FE window length, M, the
STFT window length, N, the window width, σ, the overlap
between STFT windows, ovl, and the interpolation interval, ip,
respectively. Column 9 specifies the number of features, m,
which is calculated using (6). The CA rates for the validation
data are specified in column 10. The CA rates, times at which
CA is maximal (CT), the corresponding IT rates, and the
maximum MI for the test data are specified in columns 11–14,
respectively. All simulations were performed using MAT-
LAB (http://www.mathworks.com). Functions from various
toolboxes were utilised and all data manipulation and itera-
tive software routines were developed using MATLAB source
code.
9.1. Subject S1
From Table 1 it can be seen that the most reactive frequency
bands and feature extraction parameters differ among subjects.
For subject S1 the most reactive bands are within the
entire µ range and a small band (18–19.5 Hz) within the central
β range. When selecting the FE window size for subject S1,
the CA rates for two different windows were equal; therefore,
the STFT window parameters were selected for each
of these windows and the results were compared. The best
STFT window parameters differed for the two FE windows. The
CA rates on the test data were less than those achieved on
the validation data, indicating that overfitting occurred. PC2
achieved a higher CA rate on the validation data and also
generalised best to the test data. Also, the highest IT rates
are not correlated with the highest CA rates, although the MI for
PC2 is highest. As can be seen, the test CA rates for PC2 are
only 1% higher than those obtained using PC1, but the IT
rates are circa 3 bits/min lower, a substantial difference in
IT rate. This is due to CT being much lower for PC1. The
classification time is considered as the time interval (CT)
beginning at the moment the user initiates the communication
signal (i.e., second 3 of the timing scheme [8]) and ending
at the point where classification is performed. In an offline
analysis, the IT rate is calculated at the point where CA is maximal,
thus providing an estimate of the maximum IT that the
system is capable of achieving. The FE window size is significantly
smaller for PC1 than for PC2 and, as mentioned in
Section 4, this can affect the IT rate (i.e., the minimum CT
is always M × (1/128) s). This is possibly the reason for the significant
differences in IT rates and indicates the importance of
selecting the best FE window size.
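The minimum-CT relation quoted above can be written out directly; the 128 Hz sampling rate is taken from the text.

```python
FS = 128  # sampling rate (Hz) of the EEG recordings used in this study

def min_classification_time(M, fs=FS):
    """Shortest achievable classification time for an FE window of M
    samples: the window must first fill at fs samples/s, so the
    minimum CT is M * (1/fs) seconds."""
    return M / fs
```

For example, M = 500 gives 3.9 s and M = 300 gives 2.34 s, matching the figures quoted in Section 8; the best FE windows found here (M = 200 and M = 360) bound the minimum CT at 1.56 s and 2.81 s.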
9.2. Subject S2
The most reactive frequency bands for subject S2 were selected
to be at the upper half of the µ band (10.75–13 Hz), the
upper end of the lower β band, and the central β band. In this case
the CA rates on the test data are significantly higher than those
on the validation data; however, the PC for this subject was
chosen as the best, and the results indicate that this PC generalises
well to the test data. The difference in the CA rates
may be due to the fact that the validation set is much smaller
than the test set and may contain a larger percentage of trials
which are more difficult to classify. The IT rate is significant
at almost 9 bits/min. The MI for this subject is high, indicating
that the SNR is high and that this subject may be able to
perform modulated control of the cursor more comfortably than
subject S1.
9.3. Subject S3
The most reactive bands for subject S3 appeared to be between
the upper end of the µ band and the lower end of the
central β band, as well as in the upper β band. The upper β band is a
fairly uncommon reactive band, but the selection method described
in Section 8 resulted in this band being chosen. For
this subject the CA rates are, again, higher for the test data
than for the validation data, possibly for the same reasons
described for subject S2. The IT rate is significant at almost
12 bits/min. It can be seen that the CT is approximately
0.5 s less than that of subject S1 (PC2) but there is a large difference
in IT rates. This is due to the CT and the CA rates
for each subject being substantially different. The MI for this
subject is similar to that of subject S2.
Table 1: FEP parameter combinations for the three subjects and a comparative analysis of results.

Sub  2 freq. bands (Hz)    PC  M    N    σ     ovl  ip  m    Val CA(%)  Test CA(%)  CT(s)  IT(b/m)  MI(bits)
S1   8–13, 18–19.5         1   200  50   0.68  1    4   8    90         85          2.18   10.7     0.47
                           2   360  100  3.68  15   2   8    91.3       86          3.42   7.28     0.52
S2   10.75–13, 17–22.5     1   360  100  0.68  1    4   6    86.3       91.7        4.11   8.56     0.65
S3   11.5–19.5, 27.25–30   1   360  50   0.68  5    4   14   87.5       91          2.98   11.33    0.63
10. DISCUSSION
10.1. System comparison
Results from this work show that the proposed FEP compares
well to existing approaches. Performance results vary
depending on different parameter choices. CA rates of 92%
are achieved on unseen data without using cross-validation.
Results ranging from 70% to 95% are reported for experiments
carried out on similar EEG recordings [8, 9, 10]. Many
of these results are subject specific and in some cases are
based on a 10 × 10 cross-validation, the results of which provide
a more general view of the classification ability [8]. In [10] it
is shown that the features derived from the power of the frequencies
are most reliable for online feature extraction, where
results are obtained from 4 subjects over a number of sessions.
In the first few sessions the CA rates range between
73% and 85%, and for later sessions the results range from
83% to 90%. The results in this work are based on recordings
made in the first few sessions at early stages of training,
and results range between 85% and 92%. Results are reported on
tests across different sessions, indicating that the approach is
fairly stable and robust for all subjects. Robustness appears
to be an advantage of this approach; however, an analysis
for multiple subjects over multiple sessions is necessary to
clarify this. Current BCIs have maximum IT rates of up to
25 bits/min [25]. In [26] it is shown that IT rates ranging
between 12 and 18 bits/min are achieved using left/right motor
imagery data, although some of these results are based on a
10 × 10-fold cross-validation. In this investigation IT rates
between 8 and 12 bits/min are achieved.
10.2. FEP parameters
Due to the considerably large number of possible FEP parameter
combinations, not all possible combinations were
tested. A more efficient way to find the optimum parameter
settings would be to develop a fitness function which
contains details of the three performance measures and the
CT, and use an automated search algorithm to optimise the
PC. Criteria for limiting the optimisation to prevent overfitting
may also be necessary. This would require a substantial
amount of development and simulation time but would
probably result in improved performance. For this analysis
the results obtained were sufficient and compare well to results
reported in the BCI literature utilising similar data.
The selection of subject-specific frequency bands did significantly
influence the results. The most reactive frequency
bands were initially selected based on visual inspection
and then adjusted to obtain optimal performance (cf. Section 8).
In [10, 27, 28] the most reactive subject-specific
frequency bands were selected by a technique known as
distinction sensitive learning vector quantisation (DSLVQ)
and it is shown that optimal electrode positions and fre-
quency bands are strongly dependent on the subject and that
subject-specific frequency component selection is very important
in BCI systems. In [28] DSLVQ is applied to spectral
data in a 1 s time window starting after cue presentation,
whereas in this work the most reactive frequency bands were
selected by analysing the time course of the CA rate. It is
known that the frequency components may evolve during
the course of the motor imagery tasks so it is possible that
the most relevant bands vary during this period also. The
empirical approach to frequency band selection employed in
this work was used to find a general set of frequency bands
for each subject so that CA could be maximised during the
course of performing the mental task. Also, the bands were
adjusted in steps of 0.25 Hz whereas in [28] the analysis was
performed on 1 Hz bands ranging between 9 and 28 Hz. The
approach carried out in this work was not overly time con-
suming and converged to a good set of relevant frequency
bands for each subject. Although the approach described
in this work is a manual one, it may account for the
evolving relevance of the frequency bands better than the
DSLVQ approach, which is more automated but may have
been more time consuming to apply to an analysis such as
that described in this work. In [28] it is suggested that, due
to the relevance of frequency bands changing over the course
of the trial, the DSLVQ algorithm may need dynamic adap-
tation to maintain optimal band selection. Future work will
involve experimentation with DSLVQ to determine its po-
tential for dynamically selecting the relevant frequency bands
from EEG signals as they evolve during the course of the mo-
tor imagery tasks. This may enhance the accuracy and auton-
omy of the feature extraction procedure.
The FE window length can significantly influence the time
course of the CA rates and the CT. The best FE window for all
subjects appeared to be between 200 and 360 samples (i.e., between
circa 1.56 s and 2.81 s long). None of the CTs equalled the window
length, M, indicating that some data was removed
(i.e., forgotten) from the FE window before the data within the
window became most separable. Therefore, proper selection
of the FE window can substantially improve performance
by capturing only the signal sequences which are most separable
and forgetting data that may contribute to performance
degradation.
The STFT window parameters (N, σ, and ovl) are also
crucially important for this approach. Most CA rates were
maximised by using short but wide (small σ) windows with
small amounts of overlap. As detailed in Section 3, if the window
is too narrow, the frequency resolution will be poor, and
if the window is too wide, the time localisation will not be so
precise. The temporal resolution can be made as high as possible
by sliding the STFT window along the FE window with a
large overlap. A small and wide STFT window (N = 50) can
localise the frequency components in time whilst, at the same
time, obtaining a good frequency resolution. The window function
utilised in this work becomes more like a uniform window
with a parabolic top (i.e., less Gaussian) as σ is decreased
below 2. Therefore, most of the best PCs chosen cause the
frequency components within each STFT window to be emphasised
more than a Gaussian window (σ > 2) would allow.
The temporal resolution is achieved by sliding the STFT
along the data with a certain overlap. Results from additional
tests suggest that if the temporal resolution is too high (i.e.,
a large overlap), overfitting of features may occur. N was set to
100 in the best PC for subject S2, indicating that the time
localisation did not have to be as precise.
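One common Gaussian-window parameterization reproduces the behaviour described above (flattening toward a uniform window with a parabolic top as σ drops below 2, sharpening as σ grows). This is a sketch under that assumption; the exact window definition used in the paper is given in its Section 3 and may differ.

```python
import math

def gaussian_window(N, sigma):
    """Assumed Gaussian window: w[n] = exp(-0.5 * (sigma * x)^2) with
    x running over [-1, 1) across the N samples. For small sigma the
    exponent stays near zero, so w approximates 1 - 0.5*(sigma*x)^2,
    a uniform window with a parabolic top; large sigma gives a sharp
    Gaussian peak with better time localisation."""
    half = N / 2.0
    return [math.exp(-0.5 * (sigma * (n - half) / half) ** 2)
            for n in range(N)]

w_flat = gaussian_window(50, 0.68)   # wide: emphasises all samples
w_sharp = gaussian_window(50, 4.0)   # narrow: tight time localisation
```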
The interpolation process also plays an important role in
the improvement of CA. The degree of smoothing is proportional
to the value of ip. If ip is zero, then no interpolation
is performed. As can be seen from Table 1, for the best PCs
for all subjects, some degree of smoothing was found to improve
the CA rate. The improvement was, in some cases, only
slight (approximately 2%), but nevertheless this is significant.
The feature separability is very sensitive to the value of ip, and
increasing ip too much can cause performance degradation.
As outlined in [19], a small increase in CA can significantly
improve the IT rate; therefore the performance enhancement
that the interpolation process can provide is very important
in BCI systems. As mentioned, most of the PCs provided
good time-frequency resolution, but if the frequency resolution
is too precise, the intraclass variation will increase due
to irregular frequency components. The interpolation process
reduces the negative effects of irregular frequencies by
smoothing the spectra and thus reducing the intraclass variance.
Even increasing ip to 2 can reduce the intraclass variance
and produce better CA and MI rates; however, in some
cases, the interpolation process can reduce the interclass variance.
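One plausible reading of the interpolation step, consistent with "interpolation interval ip" and with ip = 0 meaning no smoothing, is to keep every ip-th spectral sample and linearly interpolate the points in between, so larger ip smooths more. This is an assumption for illustration, not the paper's definition.

```python
def smooth_spectrum(spec, ip):
    """Assumed interpolation-based smoothing: keep every ip-th sample
    (plus the final one) and linearly interpolate between the kept
    points. ip <= 0 leaves the spectrum unchanged; larger ip discards
    more irregular detail and so smooths more."""
    n = len(spec)
    if ip <= 0:
        return list(spec)
    keep = list(range(0, n, ip))
    if keep[-1] != n - 1:
        keep.append(n - 1)              # always keep the end point
    out = []
    for k in range(len(keep) - 1):
        a, b = keep[k], keep[k + 1]
        for x in range(a, b):
            t = (x - a) / (b - a)
            out.append((1 - t) * spec[a] + t * spec[b])
    out.append(spec[-1])
    return out

spec = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0]  # toy irregular spectrum
smoothed = smooth_spectrum(spec, 2)          # irregular peaks removed
```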
Overall, the parameters for each subject (apart from the
subject-specific frequency bands and FE window size) show
some coherence. Therefore it may be appropriate to select
a standard set for all subjects. This would allow fast application
of the system to each individual subject. It is also
possible that, by optimising the parameter combinations for
each subject using an automated search algorithm, improved
performance could be achieved, although the training times
may be costly. Parameters M and N do not have to be very
finely tuned to obtain the best performance. Parameters σ,
ip, and ovl are critical parameters and cannot be varied too
much from the selected best without significant degradation
in performance. In additional experimentation, parameters
were chosen arbitrarily with a small STFT window (N = 50),
and high CA rates were achieved on the validation data but
the results on the testing data were unsatisfactory. This occurred
when ovl was large. For example, when the overlap
was set equal to 45 (i.e., 90%), a large number of spectra
were produced for each signal. Assuming the FE window size
M = 360, the number of spectra (i.e., STFT windows)
is E = (M − ovl)/(N − ovl) = 63 and, from (6), m = 126 (cf.
Section 3). This large number of features is almost half the
number of data samples in the window, and this can result in
overfitted features. Thus the linear classifier begins to overfit.
Parameter combinations that produced lower numbers of
features (i.e., < 30) produced classifiers which generalised
best to the unseen test data.
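The spectra count and the resulting feature count from the example above can be computed directly; the factor of 2 in the feature count is taken from the example (two selected frequency bands per subject), since equation (6) itself is not reproduced here.

```python
def num_spectra(M, N, ovl):
    """Number of STFT windows of length N, overlapping by ovl samples,
    that fit in an FE window of M samples: E = (M - ovl) / (N - ovl)."""
    return (M - ovl) // (N - ovl)

def num_features(E, n_bands=2):
    """Feature count, assuming (as the worked example implies) one
    feature per STFT window per selected frequency band: m = n_bands * E."""
    return n_bands * E

E = num_spectra(360, 50, 45)  # the overfitting example from the text
m = num_features(E)           # almost half the 360 samples in the window
```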
10.3. The performance quantifying methods
The three performance measures have advantages and disadvantages,
and based on each, different conclusions can be
drawn about the system. All three provide different information;
the classification accuracy rate simply provides the accuracy,
and other information such as sensitivity and specificity can
be obtained. Even though these measures provide information
about how well the system can distinguish between different
sets of features extracted from the input space, they do
not provide any information about the time required to do
so. Timing is critical in any communication system, and in
most cases communicating in real time, or as close as possible
to real time, is desirable. So, if a two-class system achieves
100% accuracy but requires 20 seconds to perform the classification,
then the advantage gained by the high accuracy
is diminished by the fact that the classification required so
much time.
As can be seen from Table 1, differences in CA and CT
have significant effects on the IT rate, a performance measure
which can quantify the performance of the system based
on the CT and CA. The challenge is to find the optimal balance
between accuracy and speed. In some cases the optimum
can be obtained by accepting an FEP or classifier that
has a reduced accuracy but a fairly rapid response. This will
produce significantly faster IT rates but will result in a system
where the probability of misclassifications occurring is
much higher. This can be observed in the results for subject
S1, where there is a slight difference in CA (1%) but a large
difference in IT. The PC with the highest CA did not obtain
the highest IT; therefore care must be taken when choosing
the best PC.
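The accuracy/speed trade-off can be made concrete with the standard Wolpaw information-transfer formula, which is assumed here to match the definition used in the paper (cf. [7]); plugging in Table 1's CA and CT for subject S1, PC1 reproduces the tabulated 10.7 bits/min.

```python
import math

def wolpaw_bits_per_trial(P, n_classes=2):
    """Standard Wolpaw formula for N equiprobable classes and
    accuracy P (0 < P <= 1):
    B = log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1))."""
    N = n_classes
    if P >= 1.0:
        return math.log2(N)  # perfect accuracy: log2 N bits per trial
    return (math.log2(N) + P * math.log2(P)
            + (1.0 - P) * math.log2((1.0 - P) / (N - 1)))

def it_rate(P, CT, n_classes=2):
    """IT rate in bits/min for accuracy P and classification time CT (s)."""
    return wolpaw_bits_per_trial(P, n_classes) * 60.0 / CT
```

For example, subject S1's PC1 (CA = 85%, CT = 2.18 s) beats PC2 (86%, 3.42 s) on IT rate despite the lower accuracy, because the bits per trial change only slightly while the trials per minute change substantially.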
MI calculation does not consider the accuracy or the time
of classification but does quantify the average amount of information
that can be obtained from the classifier output
about the signal. This may be very important if it is intended
to use the classifier output to control an application which requires
proportional control. For example, the control of a cursor
may be performed by adjusting the cursor proportionally
to the magnitude (TSD) of the classifier output and/or using
the cursor to select from more than two choices on a one-dimensional
scale. A person's ability to vary the MI would
provide potential for the system to increase the possible IT
rate to more than one bit for a two-class problem [6]. The
MI can quantify how well a system may perform these types
of tasks but does not provide much information about accuracy
and time; therefore it would not be a better quantifier
than the IT rate, although the MI does provide information about
the system that the IT rate does not. Overall, maximising the
CA rate is the most important, although there is more useful
information about the system performance contained in the
IT rate.
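The MI of the classifier output, as defined in [6], is computed from the SNR of the time-varying signed distance (TSD); the sketch below uses that published formula, and inverting it for Table 1's MI values shows what SNR they imply.

```python
import math

def mutual_information(snr):
    """MI of the continuous classifier output as defined by Schlogl
    et al. [6]: MI = 0.5 * log2(1 + SNR), with the SNR estimated
    from the classifier's time-varying signed distance (TSD)."""
    return 0.5 * math.log2(1.0 + snr)

# Inverting the formula for subject S2's tabulated MI of 0.65 bits
# gives the SNR implied by that classifier output.
implied_snr = 2.0 ** (2.0 * 0.65) - 1.0
```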
11. CONCLUSION
To the best of the authors' knowledge, this type of TF-based
FEP has not been used for feature extraction in EEG-based
communication before. Although TF-based FEPs have been
reported for application in BCIs, a process which involves a
main FE window and an interpolation process is a novel procedure
and, as the results demonstrate, significantly enhances
the FEP and overall system performance. Analysing the time
evolution of the frequencies and the values of the performance
quantifiers can determine the best FE window size and also
provides information about the signal segments which are
most separable. The FE-window-based approach can be used
for continuous feature extraction and thus has the potential
to be used in an online system.
As the calculation of the IT rate utilises knowledge of CA
and the duration of classification, the IT rate provides significantly
more knowledge about the system than simply the CA rate
and the MI. However, classification accuracy is the most important
in BCI applications, and IT rates could be deceiving if
CA and CT are not also reported. Therefore, it is concluded
that, although the IT rate is the best performance quantifier, all
three quantifiers can provide information on different and
important aspects of a BCI system. It is suggested that the results
of each performance quantifier should be analysed and
reported.
Further work will involve developing automated procedures
for selecting the most reactive subject-specific frequency
bands and an automated parameter optimisation
procedure which can search the parameter space to find the
optimum subject-specific parameters. Although an empirical
selection procedure can be used to select good subject-specific
parameter combinations, it is anticipated that the full
potential of the proposed approach will be realised only by
developing a more intuitive parameter selection procedure.
ACKNOWLEDGMENTS
The authors would like to acknowledge the Institute of
Human-Computer Interfaces, University of Technology, and
Guger Technologies (G.Tec), Graz, Austria, for providing the
EEG. The first author of this work is funded by a William
Flynn scholarship.
REFERENCES
[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller,
and T. M. Vaughan, "Brain-computer interfaces for communication
and control," Journal of Clinical Neurophysiology,
vol. 113, no. 6, pp. 767–791, 2002, invited review.
[2] A. E. H. Emery, "Population frequencies of inherited neuromuscular
diseases – a world survey," Neuromuscular Disorders,
vol. 1, no. 1, pp. 19–29, 1991.
[3] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, et al., "Brain-computer
interface technology: a review of the first international
meeting," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 164–173, 2000.
[4] D. Coyle, G. Prasad, and T. M. McGinnity, "EEG-based communication:
a time-series prediction approach," in Proc. IEEE
Cybernetics Intelligence – Challenges and Advances (CICA '03),
pp. 142–147, Reading, UK, September 2003.
[5] B. J. Fisch, Fisch and Spehlmann's EEG Primer: Basic Principles
of Digital and Analog EEG, Elsevier, New York, NY, USA, 1999.
[6] A. Schlögl, C. Neuper, and G. Pfurtscheller, "Estimating the
mutual information of an EEG-based brain-computer interface,"
Biomedizinische Technik, vol. 47, no. 1-2, pp. 3–8, 2002.
[7] A. Schlögl, C. Keinrath, R. Scherer, and G. Pfurtscheller, "Information
transfer of an EEG-based brain computer interface,"
in Proc. 1st International IEEE EMBS Conference on Neural
Engineering, pp. 641–644, Capri Island, Italy, March 2003.
[8] C. Guger, A. Schlögl, C. Neuper, D. Walterspacher, T. Strein,
and G. Pfurtscheller, "Rapid prototyping of an EEG-based
brain-computer interface (BCI)," IEEE Transactions on Neural
Systems and Rehabilitation Engineering, vol. 9, no. 1, pp. 49–58, 2001.
[9] E. Haselsteiner and G. Pfurtscheller, "Using time-dependent
neural networks for EEG classification," IEEE Trans. Rehab.
Eng., vol. 8, no. 4, pp. 457–463, 2000.
[10] G. Pfurtscheller, C. Neuper, A. Schlögl, and K. Lugger, "Separability
of EEG signals recorded during right and left motor
imagery using adaptive autoregressive parameters," IEEE
Trans. Rehab. Eng., vol. 6, no. 3, pp. 316–325, 1998.
[11] G. Pfurtscheller, C. Guger, G. Müller, G. Krausz, and C. Neuper,
"Brain oscillations control hand orthosis in a tetraplegic,"
Neuroscience Letters, vol. 292, no. 3, pp. 211–214, 2000.
[12] A. Schlögl, D. Flotzinger, and G. Pfurtscheller, "Adaptive autoregressive
modeling used for single-trial EEG classification,"
Biomedizinische Technik, vol. 42, no. 6, pp. 162–167, 1997.
[13] M. Roessgen, M. Deriche, and B. Boashash, "A comparative
study of spectral estimation techniques for noisy nonstationary
signals with application to EEG data," in Proc. Conference
Record of the 27th Asilomar Conference on Signals, Systems,
and Computers, vol. 2, pp. 1157–1161, Pacific Grove,
Calif, USA, November 1993.
[14] S. V. Notley and S. J. Elliott, "Efficient estimation of a time-varying
dimension parameter and its application to EEG analysis,"
IEEE Trans. Biomed. Eng., vol. 50, no. 5, pp. 594–602, 2003.
[15] R. Q. Rodrigo, Quantitative analysis of EEG signals: time-frequency
methods and chaos theory, Ph.D. thesis, Medical University
of Lübeck, Lübeck, Germany, 1998.
[16] G. Pfurtscheller, Electroencephalography, Basic Principles,
Clinical Application and Related Fields, E. Niedermeyer and F.
L. Da Silva, Eds., Williams and Wilkins, Baltimore, Md, USA,
4th edition, 1998.
[17] D. Nishikawa, W. Yu, H. Yokoi, and Y. Kakazu, "On-line learning
method for EMG prosthetic hand control," Electronics and
Communications in Japan (Part III: Fundamental Electronic
Science), vol. 84, no. 10, pp. 35–46, 2001, Scripta Technica.
[18] K.-R. Müller, C. W. Anderson, and G. E. Birch, "Linear
and nonlinear methods for brain-computer interfaces," IEEE
Transactions on Neural Systems and Rehabilitation Engineering,
vol. 11, no. 2, pp. 165–169, 2003.
[19] J. R. Wolpaw, H. Ramoser, D. J. McFarland, and G.
Pfurtscheller, "EEG-based communication: improved accuracy
by response verification," IEEE Trans. Rehab. Eng., vol. 6,
no. 3, pp. 326–333, 1998.
[20] E. Donchin, K. M. Spencer, and R. Wijesinghe, "The mental
prosthesis: assessing the speed of a P300-based brain-computer
interface," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 174–179, 2000.
[21] K. A. Moxon, "Brain-control interfaces for sensory and motor
prosthetic devices," in Proc. IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 6,
pp. 3445–3448, Salt Lake City, Utah, USA, May 2001.
[22] A. Kostov and M. Polak, "Parallel man-machine training in
development of EEG-based cursor control," IEEE Trans. Rehab.
Eng., vol. 8, no. 2, pp. 203–205, 2000.
[23] C. E. Shannon and W. Weaver, The Mathematical Theory of
Communication, University of Illinois Press, Urbana, Ill, USA, 1963.
[24] J. R. Pierce, An Introduction to Information Theory, Dover,
New York, NY, USA, 1980.
[25] T. M. Vaughan, "Guest editorial: brain-computer interface
technology: a review of the second international meeting,"
IEEE Transactions on Neural Systems and Rehabilitation Engineering,
vol. 11, no. 2, pp. 94–109, 2003.
[26] P. Sykacek, S. Roberts, M. Stokes, E. Curran, M. Gibbs, and
L. Pickup, "Probabilistic methods in BCI research," IEEE
Transactions on Neural Systems and Rehabilitation Engineering,
vol. 11, no. 2, pp. 192–194, 2003.
[27] M. Pregenzer and G. Pfurtscheller, "Distinction sensitive
learning vector quantization (DSLVQ) application as a classifier-based
feature selection method for a brain-computer
interface," in Proc. 4th International Conference on Artificial
Neural Networks (ICANN '95), no. 409, pp. 433–436, Cambridge,
UK, June 1995.
[28] M. Pregenzer and G. Pfurtscheller, "Frequency component selection
for an EEG-based brain to computer interface," IEEE
Trans. Rehab. Eng., vol. 7, no. 4, pp. 413–419, 1999.
Damien Coyle was born in 1980 in the Re-
public of Ireland. He graduated from the
University of Ulster in 2002 with a First-
Class Honours degree in electronics and
computing engineering. He is currently un-
dertaking research as a Ph.D. student in
the Intelligent Systems Engineering Labo-
ratory (ISEL) at the University of Ulster.
His research interests include nonlinear sig-
nal processing, biomedical signal process-
ing, chaos theory, information theory, and neural and adaptive sys-
tems. Coyle is a Member of the IEE and IEEE.
Girijesh Prasad was born in 1964 in In-
dia. He has a First-Class Honours degree
in electrical engineering and a First-Class
Master's degree in computer science and
technology, and received a Doctorate from
Queen's University of Belfast in 1997.
Currently he holds the post of Lecturer
and is a member of Intelligent Systems
Engineering Laboratory (ISEL) research
group of the School of Computing and Intelligent Systems at the
University of Ulster. His research interests include computational
intelligence, predictive modelling and control of complex nonlin-
ear systems, performance monitoring and optimisation, thermal
power plants, brain-computer interface, and medical packaging
processes. He is a Member of the IEE and the IEEE and is a Char-
tered Engineer.
T. M. McGinnity has been a member of
the University of Ulster academic staff since
1992, and holds the post of Professor of
intelligent systems engineering within the
Faculty of Engineering. He has a First-Class
Honours degree in physics, and a Doctor-
ate from the University of Durham, is a Fel-
low of the IEE, Member of the IEEE, and a
Chartered Engineer. He has 25 years of ex-
perience in teaching and research in elec-
tronic engineering, leads the research activities of the Intelligent
Systems Engineering Laboratory at the Magee campus of the uni-
versity, and is Head of the School of Computing and Intelligent Sys-
tems. His current research interests relate to the creation of intel-
ligent computational systems in general, particularly in relation to
hardware and software implementations of neural networks, fuzzy
systems, genetic algorithms, embedded intelligent systems utilizing
reconfigurable logic devices, and bio-inspired intelligent systems.
EURASIP Journal on Applied Signal Processing 2005:19, 3152–3155
© 2005 Hindawi Publishing Corporation
EEG-Based Asynchronous BCI Controls Functional
Electrical Stimulation in a Tetraplegic Patient
Gert Pfurtscheller
Laboratory of Brain-Computer Interfaces, Institute of Computer Graphics and Vision, and Ludwig Boltzmann-Institute for
Medical Informatics and Neuroinformatics, Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
Email: pfurtscheller@tugraz.at
Gernot R. Müller-Putz
Laboratory of Brain-Computer Interfaces, Institute of Computer Graphics and Vision, Graz University of Technology,
Inffeldgasse 16a, 8010 Graz, Austria
Email: gernot.mueller@tugraz.at
Jörg Pfurtscheller
Department of Traumatology, Hospital Villach, Nikolaigasse 43, 9400 Villach, Austria
Email: j.pfurtscheller@tugraz.at
Rüdiger Rupp
Department II, Orthopedic Hospital of Heidelberg University, Schlierbacher Landstraße 200a, 69118 Heidelberg, Germany
Email: ruediger.rupp@ok.uni-heidelberg.de
Received 29 January 2004
The present study reports on the use of an EEG-based asynchronous (uncued, user-driven) brain-computer interface (BCI) for
the control of functional electrical stimulation (FES). By the application of FES, noninvasive restoration of hand grasp function
in a tetraplegic patient was achieved. The patient was able to induce bursts of beta oscillations by imagination of foot movement.
These beta oscillations were recorded in a one-EEG-channel configuration, bandpass filtered, and squared. When this beta activity
exceeded a predefined threshold, a trigger for the FES was generated. Whenever the trigger was detected, a subsequent switching
of a grasp sequence composed of 4 phases occurred. The patient was able to grasp a glass with the paralyzed hand completely on
his own without additional help or other technical aids.
Keywords and phrases: beta oscillations, motor imagery, functional electrical stimulation, brain-computer interface, spinal cord
injury, neuroprosthesis.
1. INTRODUCTION
The idea of direct brain control of functional electrical stim-
ulation (FES) seems to be a realistic concept for restoration of
the hand grasp function in patients with a high spinal cord
injury. Today, electrical brain activity either recorded from
the intact scalp (EEG) or with subdural electrodes (ECoG)
can be classified and transferred into signals for the control of
an FES system (neuroprosthesis). Nowadays both implantable
systems [1, 2] and devices using surface electrodes [3] are
available for clinical use. For the transformation of mental
commands reected in changes of the brain signal into con-
trol signals for FES devices, an asynchronous, user-driven
brain-computer interface (BCI) is necessary [4]. Such an
asynchronous BCI analyses the EEG (ECoG) continuously
and uses no cue stimuli.
For the realization of a reliable and easy-to-apply BCI, only one signal channel (one recording with two electrodes) should be used. Further, a mental strategy must be established that produces short increases, or bursts, in the EEG (ECoG) amplitude, allowing the increase to be detected with a simple threshold comparator.
We report for the first time on restoration of a hand grasp function composed of 4 phases by electrical stimulation of hand muscles with surface electrodes, with the stimulation controlled by a one-channel EEG recording.
2. MATERIALS AND METHODS
2.1. Subject
The tetraplegic patient we report on is a 29-year-old man suffering from a traumatic spinal cord injury sustained in April 1998.
Asynchronous BCI Controls FES in a Tetraplegic Patient 3153
He is affected by a complete motor and sensory lesion below C5 and an incomplete lesion below C4. As a preparation for the experiment, the patient performed an individual stimulation program until he achieved a strong and fatigue-resistant contraction of the paralyzed muscles of the hand and forearm. The residual volitional muscle activation of his left upper extremity is as follows.
Shoulder: active abduction and flexion up to 90°; grade 3/5 before and grade 4/5 after training; full rotational range of motion (ROM); full passive ROM.
Elbow: active flexion grade 3/5 before and grade 4/5 after training; no active extension (triceps grade 0/5); pro- and supination possible (partly as a trick movement); full passive ROM.
Forearm, hand, and fingers: M. extensor carpi radialis (ECR) showed a palpable active contraction (grade 1/5) without change over training; all other muscles grade 0/5; almost full passive ROM in the finger joints; full wrist, thumb, and forearm ROM.
2.2. Functional electrical stimulation
Our aim was to find a functional grasp pattern that would bring the most benefit for our patient, and to find a practical way to generate it by use of surface stimulation electrodes. A kind of fine manipulating grasp, providing the ability to pick up objects from a table, for example, food or a glass, seemed to be most suitable. This grasp is generated by flexion in the metacarpophalangeal (MCP) joints of the extended fingers against the thumb, so that small objects are held between the ball of the end phalanx of the fingers and the thumb, while larger objects are held between the palmar side of the whole fingers and the thumb.
As a precondition for a functional hand grasp pattern, the wrist needs to be dorsally flexed and held stable in this position during flexion of the fingers. Due to the lack of adequate active wrist extension and a partial denervation (lesion of peripheral nerve fibers) of the wrist extensor muscle (M. extensor carpi radialis, grade 1/5) in our patient, it was not possible to obtain a stable dorsal flexion of the wrist by stimulation, forcing us to use a mechanical orthosis fixing the wrist in a dorsally flexed position.
An opening of the hand (phases 1 and 4, Figure 1) by extension of all finger joints and the thumb could be achieved by stimulation of the finger extensors (M. extensor digitorum communis) and the thumb extensor muscle (M. extensor pollicis longus) with electrodes on the radial side of the proximal forearm.
For the actual grasping (phase 2, Figure 1), we simultaneously stimulated the finger flexors (M. flexor digitorum superficialis, less the M. flexor digitorum profundus) by one pair of electrodes on the ulnar side of the proximal forearm and the intrinsic hand muscles with two further electrodes on the dorsal side of the hand. The application of the orthosis for dorsal flexion of the wrist leads to a lightly flexed position of the thumb, sufficient for serving as a stable counterpart to the flexing fingers. Therefore, no additional stabilization of the thumb via surface stimulation was necessary.
Figure 1: Example of a bipolar EEG recording from the vertex (upper trace), the bandpass-filtered (15–19 Hz) EEG signal (middle trace), and the band power time course (lower trace, arbitrary units) over a time interval of 50 seconds. Threshold, trigger pulse generation for FES operation, and grasp phases are indicated. Snapshots of the grasping are shown in the lower part.
For the external stimulation, we used a stimulator (Microstim8, Krauth & Timmermann, Germany) with biphasic, rectangular constant-current pulses. The stimulation frequency was set to 18 Hz; the current was set for each pair of electrodes at an individual level. Thanks to the integrated microcontroller, we were able to implement different stimulation patterns for a grasp sequence directly in the device. The output of the BCI was then used as a trigger signal for switching between the different grasp phases (phase 0: no stimulation, phase 1: opening hand, phase 2: grasping, phase 3: releasing, phase 4 = phase 0; see Figure 1).
2.3. EEG recording and processing
The EEG was recorded bipolarly from 2 gold electrodes fixed at a distance of 5 cm in an anterior-posterior orientation over the vertex (Cz according to the international 10–20 system). The EEG signal was amplified (sensitivity 50 µV) between 0.5 and 30 Hz with a bipolar EEG amplifier (Raich, Graz) and sampled at 128 Hz. The signal was processed online by bandpass filtering (15–19 Hz), squaring, averaging over 128 samples, and logarithmizing. After passing a threshold detector, a trigger pulse was generated, followed by a refractory period of 3 seconds. The threshold was empirically selected by comparing the band power values obtained from resting and imagery periods.
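The online processing chain just described (bandpass 15–19 Hz, squaring, averaging over 128 samples, logarithmizing, and threshold detection with a 3-second refractory period) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the bandpass step is approximated here by FFT masking rather than the time-domain filter actually used, the threshold value is arbitrary, and all names are our own.

```python
import numpy as np

FS = 128             # EEG sampling rate (Hz), as in the paper
REFRACTORY = 3 * FS  # 3-second refractory period, in samples

def beta_power(eeg, fs=FS, band=(15, 19), win=128):
    """Bandpass to the beta band (approximated here by FFT masking),
    square, average over `win` samples, and take the logarithm."""
    spec = np.fft.rfft(eeg)
    freqs = np.fft.rfftfreq(len(eeg), 1 / fs)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0
    beta = np.fft.irfft(spec, n=len(eeg))
    smoothed = np.convolve(beta ** 2, np.ones(win) / win, mode="same")
    return np.log(smoothed + 1e-12)

def detect_triggers(power, threshold, refractory=REFRACTORY):
    """Threshold comparator with a refractory period; returns the sample
    indices at which an FES trigger pulse would be generated."""
    triggers, last = [], -refractory
    for i, p in enumerate(power):
        if p > threshold and i - last >= refractory:
            triggers.append(i)
            last = i
    return triggers
```

With a 3-second refractory period, a detector of this kind produces at most 20 trigger pulses per minute.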
2.4. Mental strategy
Our patient participated in a number of BCI training sessions with the goal of developing a mental strategy to induce movement-specific EEG patterns and to transform these patterns into a binary control signal. During the training, several types of imagination were used in order to increase the classification accuracy. Imagination of left versus right hand movements was tried first. Then single-foot motor imagery versus relaxing or hand movement imagination increased the accuracy. Finally, after 55 training sessions, the best results were achieved with imagination of both feet versus right hand imagery. These two patterns were discriminable online with 100% accuracy.
3. RESULTS
As a rst result of the BCI training, the patient was able
to control the opening and closing of an electromechani-
cal hand orthosis by 2-channel EEG recording [5]. Inspec-
tion of EEG patterns induced by motor imagery has shown
that hand motor imagery was accompanied by a weak EEG
desynchronization [6] whereas foot motor imagery induced
large bursts of beta oscillations with frequencies of 17 Hz. It is
therefore quite logical to use only one mental state, namely,
the state inducing beta oscillations for control purposes. At
the end of the training, the patient has learned to voluntar-
ily induce beta bursts. An example of an EEG signal recorded
bipolarly on the vertex close to the foot representation area
is shown in Figure 1. The EEG signal is disturbed by large ar-
tifacts from eye movements, because the patient watched his
hand. Bandpass ltering of the EEG in the beta band (15
19 Hz) reveals 4 bursts of beta activity with a duration of
about ve seconds within the 50 seconds of recording period.
The beta power increase was used for generation of a trig-
ger pulse, whenever power exceeded the predened thresh-
old. Applying a refractory period of 3 seconds a maximum of
20 switches can theoretically be achieved per minute.
The use of only 2 electrodes placed close to the vertex and the recording of one bipolar EEG channel minimize the effects brought about by using muscle activity for control. Calculating the power spectra and computing the power in the 20–60 Hz band (part of the EMG activity band) showed a band power close to zero.
The patient was able to trigger the FES grasp phases by inducing beta bursts on his own. Using this setting, our patient was able, for the first time after the accident, to drink from a glass without any help and without the use of a straw.
4. DISCUSSION
It is interesting to note that the motor imagery induced beta
burst is a relative stable phenomenon in our patient with a
constant frequency around 17 Hz. Since this time about 3
years ago, foot motor imagery was always able to induce beta
bursts with constant frequency components. The generation
network of these beta bursts is very likely in the foot repre-
sentation area and/or the supplementary motor area (SMA).
In a foot motor imagery task, both primary sensorimotor
area and SMA play an important role, whereby the SMA is
located in the medial portion of Brodmans area 6 in front of
the foot representation area. From scalp recordings, we can-
not expect, however, to dierentiate between both sources,
because of the proximity of SMA and foot representation
area [7].
There is strong evidence from EEG, MEG, and ECoG recordings that different motor tasks, including imagery, can generate beta oscillations between 20 and 35 Hz in the SMA and/or the foot representation area of able-bodied subjects [8, 9, 10, 11, 12]. Common to all these reports on induced beta oscillations close to the vertex are their strict localization to the midcentral area and their dominant frequency between 20 and 35 Hz.
There is, however, one important difference between the observed beta oscillations associated with a motor task in able-bodied subjects and the induced beta oscillations in our tetraplegic patient: the former are generated after termination of the motor task, the latter during execution of the motor task. Whether in both cases the same or similar networks in the SMA and/or foot representation area are involved needs further research. What is important is that the beta oscillations on the vertex induced by the reported patient are a robust and reliable phenomenon that can be generated at will.
For EEG-based control of a neuroprosthesis in everyday life under real-world conditions, the performance of the BCI has to be maximized while using a minimum number of electrodes. Using more than a single bipolar derivation is likely to help identify more than a binary switch, leading to the realization of a more complex EEG-based control in the future.
ACKNOWLEDGEMENT
This project was supported by the Austrian Federal Ministry of Transport, Innovation and Technology, project GZ140.587/2, the Lorenz-Boehler Gesellschaft, and the Allgemeine Unfallversicherungsanstalt (AUVA).
REFERENCES
[1] B. Fromm, R. Rupp, and H. J. Gerner, "The Freehand system: an implantable neuroprosthesis for functional electrostimulation of the upper extremity," Handchirurgie, Mikrochirurgie, Plastische Chirurgie, vol. 33, no. 3, pp. 149–152, 2001.
[2] P. H. Peckham, M. W. Keith, K. L. Kilgore, et al., "Efficacy of an implanted neuroprosthesis for restoring hand grasp in tetraplegia: a multicenter study," Archives of Physical Medicine and Rehabilitation, vol. 82, no. 10, pp. 1380–1388, 2001.
[3] M. R. Popovic, D. B. Popovic, and T. Keller, "Neuroprostheses for grasping," Neurological Research, vol. 24, no. 5, pp. 443–452, 2002.
[4] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[5] G. Pfurtscheller, C. Guger, G. R. Müller, G. Krausz, and C. Neuper, "Brain oscillations control hand orthosis in a tetraplegic," Neuroscience Letters, vol. 292, no. 3, pp. 211–214, 2000.
[6] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.
[7] A. Ikeda, H. O. Lüders, R. C. Burgess, and H. Shibasaki, "Movement-related potentials recorded from supplementary motor area and primary motor area. Role of supplementary motor area in voluntary movements," Brain, vol. 115, no. 4, pp. 1017–1043, 1992.
[8] C. Neuper and G. Pfurtscheller, "Motor imagery and ERD," in Event-Related Desynchronization. Handbook of Electroencephalography and Clinical Neurophysiology, Revised Edition, G. Pfurtscheller and F. H. Lopes da Silva, Eds., vol. 6, pp. 303–325, Elsevier, Amsterdam, the Netherlands, 1999.
[9] G. Pfurtscheller, M. Woertz, G. Supp, and F. H. Lopes da Silva, "Early onset of post-movement beta electroencephalogram synchronization in the supplementary motor area during self-paced finger movement in man," Neuroscience Letters, vol. 339, no. 2, pp. 111–114, 2003.
[10] S. Ohara, A. Ikeda, T. Kunieda, et al., "Movement-related change of electrocorticographic activity in human supplementary motor area proper," Brain, vol. 123, no. 6, pp. 1203–1215, 2000.
[11] J. Kaiser, W. Lutzenberger, H. Preissl, D. Mosshammer, and N. Birbaumer, "Statistical probability mapping reveals high-frequency magnetoencephalographic activity in supplementary motor area during self-paced finger movements," Neuroscience Letters, vol. 283, no. 1, pp. 81–84, 2000.
[12] R. Salmelin, M. Hämäläinen, M. Kajola, and R. Hari, "Functional segregation of movement-related rhythmic activity in the human brain," NeuroImage, vol. 2, no. 4, pp. 237–243, 1995.
Gert Pfurtscheller received the M.S. and Ph.D. degrees in electrical engineering from the Graz University of Technology, Graz, Austria. He is a Professor of medical informatics, Director of the Laboratory of Brain-Computer Interfaces, Graz University of Technology, and Director of the Ludwig Boltzmann Institute for Medical Informatics and Neuroinformatics. His research interests include functional brain topography, the design of brain-computer communication systems, and navigation in virtual environments by a brain-computer interface.
Gernot R. Müller-Putz received the M.S. degree in biomedical engineering from the Graz University of Technology, Graz, Austria, in May 2000. He received the Ph.D. degree in electrical engineering in August 2004 from the same university. His research interests include brain-computer communication systems, the human somatosensory system, rehabilitation engineering, and assistive technology.
Jörg Pfurtscheller received the M.D. degree from the University of Graz in 1995. Currently he is with the Department of Traumatology, Hospital Villach, in the final stage of becoming a trauma surgeon. His research interests include rehabilitation after spinal cord injury and applications of functional electrical stimulation.
Rüdiger Rupp received the M.S. degree in electrical engineering, with a focus on biomedical engineering, from the Technical University of Karlsruhe, Germany, in 1994. After working at the Institute for Biomedical Engineering and Biocybernetics (Professor G. Vossius), he has been, since 1996, with the Orthopaedic University Hospital II in Heidelberg (Professor H. J. Gerner), Germany, where he holds the position of Research Group Manager. He is now finishing his Ph.D. work. His main research interests are in the field of rehabilitation engineering, especially for spinal cord injured patients. This includes neuroprosthetics, mainly of the upper extremity; application of functional electrical stimulation for therapeutic purposes; development and clinical validation of novel methods and devices for locomotion therapy; gait analysis in incomplete spinal cord injury; and realization of software projects for standardized documentation of rehabilitation outcome. He is a Member of IEEE, IFESS, and VDE.
EURASIP Journal on Applied Signal Processing 2005:19, 3156–3164
© 2005 Hindawi Publishing Corporation
Steady-State VEP-Based Brain-Computer Interface
Control in an Immersive 3DGaming Environment
E. C. Lalor,¹ S. P. Kelly,¹,² C. Finucane,³ R. Burke,⁴ R. Smith,¹ R. B. Reilly,¹ and G. McDarby¹
¹ School of Electrical, Electronic and Mechanical Engineering, University College Dublin, Belfield, Dublin 4, Ireland
Emails: ed.lalor@ee.ucd.ie, ray.smith@ee.ucd.ie, richard.reilly@ucd.ie, gary.mcdarby@ee.ucd.ie
² The Cognitive Neurophysiology Laboratory, Nathan S. Kline Institute for Psychiatric Research, Orangeburg, NY 10962, USA
Email: skelly@nki.rfmh.org
³ Medical Physics and Bioengineering, St. James's Hospital, P.O. Box 580, Dublin 8, Ireland
Email: cfinucane@stjames.ie
⁴ EOC Operations Center, Microsoft Corporation, Sandyford Industrial Estate, Dublin 18, Ireland
Email: robert.burke@gmail.com
Received 2 February 2004; Revised 19 October 2004
This paper presents the application of an effective EEG-based brain-computer interface design for binary control in a visually elaborate immersive 3D game. The BCI uses the steady-state visual evoked potential (SSVEP) generated in response to phase-reversing checkerboard patterns. Two power-spectrum estimation methods were employed for feature extraction in a series of offline classification tests. Both methods were also implemented during real-time game play. The performance of the BCI was found to be robust to distracting visual stimulation in the game and relatively consistent across six subjects, with 41 of 48 games successfully completed. For the best-performing feature extraction method, the average real-time control accuracy across subjects was 89%. The feasibility of obtaining reliable control in such a visually rich environment using SSVEPs is thus demonstrated, and the impact of this result is discussed.
Keywords and phrases: EEG, BCI, SSVEP, online classification, overt attention.
1. INTRODUCTION
The concept of a brain-computer interface (BCI) stems from a need for alternative, augmentative communication and control options for individuals with severe disabilities (e.g., amyotrophic lateral sclerosis), though its potential uses extend to rehabilitation of neurological disorders, brain-state monitoring, and gaming [1]. The most practical and widely applicable BCI solutions are those based on noninvasive electroencephalogram (EEG) measurements recorded from the scalp. These generally utilize either event-related potentials (ERPs), such as the P300 [2] and visual evoked potential (VEP) measures [3], or self-regulatory activity, such as slow cortical potentials [4] and changes in cortical rhythms [5, 6, 7]. The former design, being reliant on natural involuntary responses, has the advantage of requiring no training, whereas the latter design normally demonstrates effectiveness only after periods of biofeedback training, wherein the subject learns to regulate the relevant activity in a controlled way.
Performance of a BCI is normally assessed in terms of information transfer rate, which incorporates both speed and accuracy. One BCI solution that has seen considerable success in optimizing this performance measure relies on steady-state visual evoked potentials (SSVEPs), a periodic response elicited by the repetitive presentation of a visual stimulus at a rate of 6–8 Hz or more [8]. SSVEPs have been successfully utilized in both above-mentioned BCI designs: gaze direction within a matrix of flickering stimuli is uniquely manifest in the evoked SSVEP through its matched periodicity [3, 9], and the self-regulation of SSVEP amplitude has also been reported as feasible with appropriate feedback [10].
The effectiveness of SSVEP-based BCI designs is due to several factors. The signal itself is measurable in as large a population as the transient VEP; very few people fail to exhibit this type of response [8, 11]. The task of feature extraction is reduced to simple frequency component extraction, as there are only a certain number of separate target frequencies, usually one for each choice offered in the BCI. High signal-to-noise ratios are obtainable when analyzing the SSVEP at sufficiently high frequency resolution [8]. Finally, SSVEPs are resilient to artifacts, as blink, movement, and electrocardiographic artifacts are confined mostly to lower EEG frequencies [11]. Moreover, the source of ocular artifacts (blinks, eye
movements) is located on the opposite side of the head from the visual cortex, over which the SSVEP is measured. Though these characteristics are well affirmed by the success of current SSVEP-based BCIs [3, 9], it is not known to what degree performance may be compromised by concurrent unrelated visual stimulation, where an individual's visual resources are divided, as in a video gaming environment.
In this paper, the authors address a novel application of the SSVEP-based BCI design within a real-time gaming framework. The video game involves the movement of an animated character within a virtual environment. Both the character and environment have been modelled as 3D volumes. The lighting and virtual camera position change in response to the character's movements within the environment. Overall, the result is a very visually engaging video game.
The SSVEP response constitutes only a portion of the overall set of visual processes manifest in the ongoing EEG during game play. In this study, we address the challenge of extracting and processing SSVEP measures from a signal of such indeterminate complexity in real time for BCI control.
The design of the SSVEP-based BCI was split into two parts. First, a preliminary offline analysis was conducted to determine the most favourable signal processing methodology and to choose suitable frequencies. Once satisfactory offline analysis results were obtained, the full real-time game was implemented. Performance of the real-time BCI game when played by six normal subjects is presented.
2. PRELIMINARY ANALYSIS
2.1. Methods
(A) Subjects
Five male subjects, aged between 23 and 27, participated in
the preliminary study. All subjects had normal or corrected-
to-normal vision.
(B) Experimental setup
Subjects were seated 70 cm from a 43 cm (17-inch) computer monitor. EEG was acquired in a shielded room from two Ag-AgCl scalp electrodes placed at sites O1 and O2, according to the 10–20 international electrode-positioning standard [12], situated over the left and right hemispheres of the primary visual cortex, respectively. Skin-electrode junction impedances were maintained below 5 kΩ. Each channel, referenced to the right ear lobe on bipolar leads, was amplified (20 K), 50 Hz line filtered, and bandpass filtered over the range 0.01–100 Hz by Grass Telefactor P511 rack amplifiers. Assuming that eye movement and blink artifacts did not threaten signal integrity at the frequencies of interest, neither horizontal nor vertical EOG signals were recorded. Subjects were monitored visually throughout for continued compliance. Signals were digitized at a sampling frequency of 256 Hz.
Initial testing of the experimental setup involved acquiring data from two subjects while they gazed at either a circular yellow flicker stimulus on a black background or a similarly sized rectangular black-and-white checkerboard pattern, modulated at several test frequencies between 6 Hz and 25 Hz. On visual inspection of the power spectra, it was found that the checkerboard pattern produced a more pronounced SSVEP than a flicker stimulus modulated at the same frequency. Furthermore, it has been found that to elicit an SSVEP signal at a certain frequency, a flicker stimulus must be modulated at that frequency, while a checkerboard pattern need only be modulated at half that frequency, as the SSVEP is produced at the pattern's phase-reversal, or alternation, rate [13]. This is an important consideration when using a standard monitor with a refresh rate of 100 Hz. Hence, checkerboard patterns were chosen as stimuli in the following preliminary tests and the BCI game. From this point on, checkerboard frequencies will be given in terms of alternation rate, equivalent to the frequency of the SSVEP produced.
Twenty-five seconds of eyes-closed data were first acquired for each subject to accurately locate the alpha frequency. Testing then proceeded with several 25-second trials during which the subject viewed a full-screen checkerboard pattern at frequencies between 6 Hz and 25 Hz, excluding the individual's alpha band [9]. The power spectra for these data were examined and the two frequencies eliciting the largest SSVEPs were identified. The subject then underwent 25-second trials in which he viewed each one of two bilateral checkerboard patterns phase-reversing at the two selected frequencies, and this was repeated with positions reversed, giving a total of 4 trials. Each 4 × 4 checkerboard pattern's medial edge was situated 4.9° bilateral to a central cross, centered on the horizontal meridian, and subtended a visual angle of 6.5° vertically and 7.2° horizontally. These dimensions were determined empirically.
(C) Feature extraction
Two feature extraction methods were employed for comparison on the preliminary data. Each was aimed at utilizing the separable aspects of the SSVEP signals. For both methods, each 25-second trial was divided into approximately 50 overlapping segments, each of which counts as a single case for which the feature(s) is derived. Both 1-second and 2-second segments were used for comparison, with a view to assessing the speed achievable by using each method in real time.
Method 1: squared 4-second FFT
In this method, each one- or two-second segment was extracted using a Hamming window, zero-padded to 1024 samples (4 s), and the fast Fourier transform (FFT) was calculated and squared. A single feature was extracted for each segment:
\[ F1(n) = \log\frac{X_n(f_1)}{X_n(f_2)}, \tag{1} \]
where
\[ X_n = \operatorname{mean}\Bigl( \mathrm{FFT}^2\bigl[x_n(t)\bigr]\Big|_{O1},\ \mathrm{FFT}^2\bigl[x_n(t)\bigr]\Big|_{O2} \Bigr), \tag{2} \]
that is, the square of the FFT of the nth segment x_n(t), averaged over electrode sites O1 and O2, and f_1 and f_2 are the chosen checkerboard frequencies.
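As a concrete reading of (1) and (2), the feature computation might look like the following sketch. This is our own illustration, not the authors' code; the segment length, frequencies, and 1024-point zero-padding follow the text, while the function and variable names are assumptions.

```python
import numpy as np

FS = 256                # sampling rate (Hz)
FREQ1, FREQ2 = 17, 20   # checkerboard alternation frequencies (Hz)

def squared_fft_feature(seg_o1, seg_o2, f1=FREQ1, f2=FREQ2, fs=FS, nfft=1024):
    """Method 1: Hamming-window the segment, zero-pad to 1024 samples (4 s),
    square the FFT magnitude, average over O1 and O2, and return the log
    ratio of power at the two stimulus frequencies."""
    win = np.hamming(len(seg_o1))
    X = (np.abs(np.fft.rfft(seg_o1 * win, nfft)) ** 2 +
         np.abs(np.fft.rfft(seg_o2 * win, nfft)) ** 2) / 2
    bin1 = round(f1 * nfft / fs)  # 0.25 Hz bins; 17 Hz -> bin 68
    bin2 = round(f2 * nfft / fs)  # 20 Hz -> bin 80
    return float(np.log((X[bin1] + 1e-20) / (X[bin2] + 1e-20)))
```

A positive feature value indicates more power at f1 than at f2, so the binary decision reduces to thresholding this single number.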
Figure 1: Power spectra for full-screen checkerboard trials at 17 Hz and 20 Hz for subject 1 (normalized PSD, 6–40 Hz). Spectra were calculated using the squared FFT method averaged across the entire 25-second trial.
Method 2: FFT of autocorrelation
This method is similar in that it also corresponds to calculating a PSD estimate. In this case, the autocorrelation function is calculated for each segment, followed by the FFT:
\[ F2(n) = \log\frac{Y_n(f_1)}{Y_n(f_2)}, \tag{3} \]
where
\[ Y_n = \operatorname{mean}\Bigl( \mathrm{FFT}\bigl[R_{xx_n}\bigr]\Big|_{O1},\ \mathrm{FFT}\bigl[R_{xx_n}\bigr]\Big|_{O2} \Bigr), \qquad R_{xx_n}(t) = E\bigl[ x_n(t_0)\, x_n(t_0 - t) \bigr], \tag{4} \]
where the second formula in (4) is the autocorrelation function of the nth segment x_n(t).
This method of PSD estimation is more resilient to noise, due to the fact that the autocorrelation of white noise is zero at all nonzero latencies.
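A corresponding sketch of Method 2, estimating the PSD as the FFT of the biased autocorrelation (the Wiener–Khinchin route), might read as follows. Again, this is our own illustration under the paper's stated parameters, not the authors' code.

```python
import numpy as np

def autocorr_fft_feature(seg_o1, seg_o2, f1=17, f2=20, fs=256, nfft=1024):
    """Method 2: PSD estimate as the FFT of the (biased) autocorrelation of
    each segment, averaged over O1 and O2; returns the log power ratio at
    the two stimulus frequencies."""
    def psd(x):
        x = x - x.mean()
        r = np.correlate(x, x, mode="full") / len(x)  # biased autocorrelation
        return np.abs(np.fft.rfft(r, nfft))
    Y = (psd(seg_o1) + psd(seg_o2)) / 2
    bin1, bin2 = round(f1 * nfft / fs), round(f2 * nfft / fs)
    return float(np.log((Y[bin1] + 1e-20) / (Y[bin2] + 1e-20)))
```

Because the autocorrelation of white noise concentrates at lag zero, the estimate has a flatter noise floor, which is the resilience the text refers to.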
(D) Classification
Linear discriminants were used as the classifier model for this study, providing a parametric approximation to Bayes' rule [14]. In the case of both feature extraction methods, this corresponds to calculating a threshold in one dimension. Optimization of the linear discriminant model is achieved through direct calculation and is very efficient, thus lending itself well to real-time applications.
Performance of the LDA classifier was assessed on the preliminary data using 10-fold cross-validation [14]. This scheme randomly divides the available data into 10 approximately equal-sized, mutually exclusive folds. For a 10-fold cross-validation run, 10 classifiers are trained, with a different fold used each time as the testing set, while the other 9 folds are used as training data. Cross-validation estimates are generally pessimistically biased, as training is performed using a subsample of the available data.
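Since the discriminant here is one-dimensional, the whole scheme can be sketched compactly. This is a simplified illustration, not the authors' implementation: the threshold is placed midway between the class means, which assumes equal priors and equal class variances, and all names are our own.

```python
import numpy as np

def lda_threshold(features, labels):
    """1D linear discriminant: with equal class variances and priors, the
    decision rule reduces to a threshold midway between the class means."""
    m0 = features[labels == 0].mean()
    m1 = features[labels == 1].mean()
    thr = (m0 + m1) / 2
    sign = 1 if m1 > m0 else -1   # which side of the threshold is class 1
    return thr, sign

def predict(features, thr, sign):
    return ((features - thr) * sign > 0).astype(int)

def cross_validate(features, labels, k=10, seed=0):
    """k-fold cross-validation accuracy for the threshold classifier."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(features)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        thr, sign = lda_threshold(features[train], labels[train])
        accs.append((predict(features[test], thr, sign) == labels[test]).mean())
    return float(np.mean(accs))
```

Training is a pair of means, so refitting the classifier at the start of each game (as done in Section 3) costs essentially nothing.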
Results
All subjects during preliminary testing were reported to be fully compliant in following the given directions. Analysis of power spectra during full-screen checkerboard trials resulted in the selection of 17 Hz and 20 Hz as the bilateral checkerboard frequencies. These frequencies were employed in each of the four test trials for all subjects. Power spectra for full-screen checkerboard trials for a representative subject are shown in Figure 1.
Note that peaks exist at both the frequency of modulation of each constituent square of the checkerboard (henceforth referred to as the first harmonic) and the alternation rate (second harmonic). Both the flicker-stimulus and checkerboard SSVEP frequency effects described above are exhibited in the spectrum, due to the large size of the constituent squares of the full-screen checkerboard pattern. As expected, the second harmonic became more dominant once the checkerboards were made smaller, such that the pattern as a whole could be viewed in the subject's foveal vision.
The power spectra for left and right gaze directions for a representative subject are shown in Figure 2. It can be seen that for this subject, the magnitude of the SSVEP response to a 17 Hz stimulus is greater than that to a 20 Hz stimulus, which demonstrates the need for classifier training to determine a decision threshold. Each subject's alpha rhythm caused little contamination of the spectra, being of low amplitude during testing: rapid stimulus presentation results in very little cortical idling in the visual cortex, and the short trial length prevents arousal effects also known to affect alpha [15].
The classification accuracies for all five subjects using the two feature extraction methods are listed in Tables 1 and 2. Performance was assessed using both 1- and 2-second segments, and the question of whether inclusion (by averaging) of the first harmonic in the feature had any effect was
Figure 2: Power spectra for left and right gaze directions for subject 4 (normalized PSD, 6–40 Hz). Spectra were calculated using the squared FFT method averaged across the entire 25-second trial.
Table 1: Offline performance for Method 1 averaged over two checkerboard configurations.

                          2nd harmonic only          1st + 2nd harmonic
Subject                   1 s window   2 s window    1 s window   2 s window
Subject 1                 88.4%        92.2%         79.3%        88.2%
Subject 2                 72.2%        79.0%         70.4%        74.3%
Subject 3                 58.7%        62.0%         62.4%        69.3%
Subject 4                 75.7%        81.4%         67.4%        72.9%
Subject 5                 57.0%        54.2%         52.1%        50.8%
Average across subjects   70.4%        74.4%         66.3%        71.1%
addressed. This results in the augmented feature
\[ F1'(n) = \log\frac{\operatorname{mean}\bigl( X_n(f_1),\ X_n(f_1/2) \bigr)}{\operatorname{mean}\bigl( X_n(f_2),\ X_n(f_2/2) \bigr)} \tag{5} \]
for Method 1, and similarly for Method 2.
For both methods, analysis using 2-second segments is shown to perform better than using 1-second segments. It can also be seen that inclusion of the first harmonic in the augmented feature in fact degraded performance slightly. Performance of the two methods was comparable, with the more noise-resilient autocorrelation method performing marginally better, as expected.
3. REAL-TIME BCI GAME
3.1. Methods
(A) MindBalance: the game
The object of the MindBalance game is to gain 1D control of the balance of an animated character on a tightrope using only the player's EEG. As mentioned in Section 1, the game involves the movement of the animated character within a virtual environment, with both the character and environment modelled as 3D volumes. The lighting and virtual camera position change in response to the character's movements within the environment. During the game, a musical soundtrack, as well as spoken comments by the character, is played over the aforementioned speakers to make the game more engaging.
A checkerboard is positioned on either side of the character. These checkerboards are phase-reversed at 17 and 20 Hz. A game begins with a brief classifier training period. This requires the subject to attend to the left and right checkerboards, as indicated by arrows, for a period of 15 seconds each. This process is repeated three times (Figure 3). During this training period, audio feedback is continually presented using speakers located behind the subject. The audio feedback takes the form of a looped double-click sound, the play speed of which is linearly related to the feature (F1 in the case of Method 1 or F2 in the case of Method 2). Feedback is presented in order to ensure compliance during the critical training period.
In the game, the tightrope walking character walks to-
wards the player and stumbles every 1.55.0 seconds to
one side chosen randomly. The player must intervene to
shift the characters balance so that it remains stable on the
tightrope. To do this, the player must direct his gaze and fo-
cus on the checkerboard on the opposite side of the screen to
3160 EURASIP Journal on Applied Signal Processing
Table 2: Offline performance for Method 2, averaged over two checkerboard configurations.

                             2nd harmonic only         1st + 2nd harmonic
    Subject                  1 s window   2 s window   1 s window   2 s window
    Subject 1                89.8%        96.1%        82.9%        89.1%
    Subject 2                71.0%        80.4%        73.2%        80.8%
    Subject 3                61.7%        65.8%        62.7%        73.5%
    Subject 4                80.1%        82.3%        71.9%        78.9%
    Subject 5                59.5%        62.0%        59.4%        55.0%
    Average across subjects  72.4%        77.3%        70.0%        75.5%
Figure 3: The training sequence.
which the character is losing balance (Figure 4). The character's off-balance animation lasts for 3 seconds. This duration was chosen to give the player time to realize which checkerboard required fixation to elicit the required SSVEPs and help the character regain his balance. At the end of the 3-second animation, a decision based on the most recent 1 or 2 seconds of EEG is obtained. To allow for better game play, a second, more pronounced off-balance 3-second animation was used in order to give a player a second chance in the case where an incorrect decision was obtained from the EEG. There was also an optional play mode in which an EEG feature value within a certain range of the decision threshold, when detected at the end of the off-balance animation, resulted in no decision being taken and the original 3-second off-balance animation simply being replayed. This dead zone was removed during our online tests.
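The decision rule just described can be sketched as a simple threshold with an optional dead zone. This is a hypothetical illustration: the default threshold value and which sign maps to which side are our assumptions, not taken from the paper.

```python
def decide(feature, threshold=0.0, dead_zone=0.0):
    """Map the log-ratio EEG feature to a balance decision.
    Values within +/- dead_zone of the threshold give no decision,
    so the off-balance animation is simply replayed."""
    if dead_zone > 0 and abs(feature - threshold) <= dead_zone:
        return None                      # optional play mode only
    return "left" if feature > threshold else "right"
```

In the online tests reported here the dead zone was set to zero, so a decision is always returned at the end of the animation.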
(B) Signal processing and the C# engine
Figure 4: The character loses balance during the game.

The overall processing system is shown in Figure 5. In order to carry out this study, a programming engine and platform were required, capable of rendering detailed 3D graphics while at the same time processing continuous EEG data to control a sprite within the game. This was accomplished using a combined graphics, signal processing, and network communications engine implemented in C# by the MindGames Group at Media Lab Europe. One machine
was dedicated to the rendering of the 3D graphics, while a second machine was dedicated to the real-time data acquisition and signal processing of the EEG data. The signal processing engine allows signal processing functions and parameters to be selected as objects and included in a chain of processing blocks that performs the required processing. Whenever a decision on the fate of the animated character is required, a request in the form of a UDP packet is sent over the local area network to the signal processing machine, which sends back a decision based on the most recent feature extracted from the EEG.
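The game-to-signal-processing handshake might look like the following sketch. The address, port, and payload strings are our invention; the paper does not specify the packet format.

```python
import socket

SP_ADDR = ("127.0.0.1", 9000)   # hypothetical signal-processing machine

def request_decision(timeout=1.0):
    """Game side: send a UDP request and wait for the classifier's reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(b"DECISION?", SP_ADDR)
        reply, _ = sock.recvfrom(64)
        return reply.decode()   # e.g. "left" or "right"
```

UDP suits this design: a single small request/response pair per trial, with the timeout guarding against a dropped packet stalling the game loop.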
(C) Interface equipment for game control
The setup for the real-time BCI game was similar to that used in the preliminary offline analysis. One difference was the amplification stage, in which the Grass Telefactor P511 rack amplifiers were replaced by Biopac biopotential amplifiers. The subject was seated in front of a large screen on which a 140 × 110 cm image was projected. Within the game pictured in Figures 3 and 4, each 4 × 4 checkerboard pattern's medial edge was situated 8.5° bilateral to the tightrope, centered on the horizontal meridian, and subtended a visual angle of 11.4° vertically and 11.8° horizontally.
(D) Subjects and test protocol
Six male subjects aged between 24 and 34 participated in the
following test procedure to assess performance of the real-
time BCI game. All subjects had normal or corrected-to-
normal vision.
SSVEP-Based BCI Control in 3D Gaming Environment 3161
Figure 5: Flowchart of signal processing stages employed in the real-time BCI game: O1 and O2 channels → amplifiers and filters → data buffer (1 or 2 s at 256 Hz) → autocorrelation → Hamming window and zero padding → 1024-point FFT → PSD estimate → feature selection (frequency power ratio) → feature translation (log) → 3D graphics game engine, with audio/visual feedback returned to the subject.
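The PSD stages of Figure 5 can be sketched end-to-end. This is a simplified sketch of the published pipeline; the function name and the DC-offset removal step are ours.

```python
import numpy as np

def psd_estimate(eeg, fs=256, n_fft=1024, use_autocorr=True):
    """PSD stages from Figure 5: optional autocorrelation (Method 2),
    Hamming window, zero padding to n_fft, FFT, squared magnitude."""
    x = np.asarray(eeg, dtype=float)
    x = x - x.mean()                    # remove DC offset
    if use_autocorr:                    # Method 2's noise-resilient step
        x = np.correlate(x, x, mode="full")[len(x) - 1:]
    x = x * np.hamming(len(x))
    spectrum = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, spectrum
```

Feeding a 2-second buffer (512 samples at 256 Hz) containing a 34 Hz SSVEP component yields a spectral peak at 34 Hz, the second harmonic of the 17 Hz checkerboard.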
Table 3: Percentage of correct decisions in real-time game play, using Method 1 with the second SSVEP harmonic only.

    Subject                  1 s window   2 s window
    Subject 1                75.0%        100%
    Subject 2                72.7%        100%
    Subject 3                75.0%        70.6%
    Subject 4                69.2%        100%
    Subject 5                87.5%        78.2%
    Subject 6                100%         88.2%
    Average across subjects  79.9%        89.5%
Each subject was asked to play the game eight times. Four of the games were played with the EEG analyzed by the FFT method described above as Method 1 for the offline data. In two of these games, the decision on the fate of the tightrope-walking character was based on a 1-second window of EEG data, and in the other two games, the decision was based on a 2-second window. The other four games were played using EEG analyzed by Method 2, the autocorrelation-followed-by-FFT method. Again, two games used 1-second segments of EEG data and two games used 2-second segments.

On average, there were eight trials per game. This varied from game to game as a result of the random number of steps taken by the character between losses of balance and the fact that in seven of the 48 games played, two consecutive errors occurred, resulting in the character falling from the tightrope and the end of the game.
3.2. Results
Tables 3 and 4 list the percentage of correct decisions resulting in the desired regain of balance on the tightrope. In seven of the 48 games played, two consecutive errors occurred, resulting in the character falling from the tightrope and causing the game to end. Three of the six subjects did not allow the character to fall off the tightrope in any of the eight games.
Table 4: Percentage of correct decisions in real-time game play, using Method 2 with the second SSVEP harmonic only.

    Subject                  1 s window   2 s window
    Subject 1                87.5%        91.7%
    Subject 2                50.0%        58.3%
    Subject 3                85.7%        46.2%
    Subject 4                85.7%        75.0%
    Subject 5                63.6%        100%
    Subject 6                87.5%        92.3%
    Average across subjects  76.7%        77.3%

One objective measure of BCI performance is the bit rate, as defined by Wolpaw [16]. If a trial has N possible symbols, each symbol is equally probable, the probability P that the desired symbol will be selected is the same for each symbol, and each error has the same probability, then the bit rate can be calculated as follows:

    Bits per symbol = log2(N) + P log2(P) + (1 - P) log2[(1 - P)/(N - 1)],

    Bit rate = Bits per symbol × symbols per minute.    (6)
In the case of the present study, one symbol is sent per trial. Using this definition of bit rate and given that each trial lasts for 3 seconds and the peak accuracy for the real-time system is 89.5%, the bit rate is 10.3 bits/min.
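Equation (6) is easy to check numerically. The following sketch (function name ours) reproduces the bit rates quoted in this paper from the tabulated accuracies, with N = 2 checkerboards.

```python
import math

def wolpaw_bit_rate(n_symbols, accuracy, trial_seconds):
    """Bits/min for an N-choice selection at the given accuracy, per (6)."""
    n, p = n_symbols, accuracy
    bits_per_symbol = (math.log2(n) + p * math.log2(p)
                       + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits_per_symbol * (60.0 / trial_seconds)

print(round(wolpaw_bit_rate(2, 0.895, 3.0), 1))  # full 3 s trial: 10.3
print(round(wolpaw_bit_rate(2, 0.895, 2.0), 1))  # 2 s window alone: 15.5
print(round(wolpaw_bit_rate(2, 0.799, 1.0), 1))  # 1 s window alone: 16.6
```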
4. DISCUSSION

The results from this study indicate that the distinct SSVEP responses elicited by phase-reversing checkerboard patterns can be successfully used to make binary decisions in real time in a BCI-controlled game involving a visually elaborate environment.

The two feature extraction methods can be directly compared for the offline data, given that the methods were used to classify the same data set. The results for both methods are comparable, with Method 2 performing marginally better than Method 1. This may be due to the resilience of Method 2 to uncorrelated noise.
In the real-time gaming situation, Method 1 and Method 2 were employed during separate games. Therefore, classification for the two methods was performed on different data sets. For this reason, and because each subject undertook a relatively small number of trials, a direct comparison between the methods in the real-time tests is not as meaningful. The fact that Method 1 performs better than Method 2 may be attributable more to the anomalous performance of subjects 2 and 3 during the games played using Method 2 than to the feature extraction method itself.
In both online and offline testing, classification based on 2-second windows exceeded that based on 1-second windows for all features. This is to be expected, as a 2-second window gives higher frequency resolution and allows more accurate extraction of the SSVEP peak amplitudes. As mentioned earlier, a bit rate of 10.3 bits/min is achievable using the full trial length of 3 seconds, allowing for the time taken for the subject to respond to the loss of balance of the character in the game and for the elicitation of the SSVEP. It is also useful to calculate theoretical bit rate maxima based purely on the 1- and 2-second EEG windows. This gives a peak bit rate of 15.5 bits/min for the 2-second window and 16.6 bits/min for the 1-second window. It is worth noting that the bit rate defined in (6) is designed to encourage accuracy over speed, and as a result, the penalty incurred by the drop in accuracy almost negates the doubling of the number of symbols per minute achieved using the 1-second window.

The decrease in performance obtained by the inclusion of the first harmonic in the offline testing may be attributed to noise added to the first harmonic due to activity in the alpha band. It was for this reason that the frequencies of the stimuli were originally chosen outside the alpha range and only the second harmonic was used in the real-time testing.
Two additional interesting observations were made during both the offline and online testing. Firstly, the two investigators who themselves participated as subjects in the study achieved better performance, both in terms of accuracy in the offline analysis and in terms of success in completing the game. This implies that either practice or a more motivated approach to stimulus fixation results in a more pronounced visual response. This may be thought of in terms of visual attention. Endogenous modulation of the SSVEP response has been reported as possible in relation to both foveal fixated stimuli [10] and covertly attended stimuli in peripheral vision [17]. The improved discriminability of the SSVEP with increased conscious effort may be related to the ability of the subject to focus selective attention on the fixated stimulus, as well as the ability to inhibit processing of distractors in the peripheral visual field.

Secondly, in post-experiment debriefing, subjects reported that audio feedback during training aided in the successful sustained fixation on a particular stimulus and the inhibition of responses to distractions. Also, in the case of an error causing the character to drop to the second level of imbalance, subjects found it possible to adjust their fixation strategy, most notably by observing the checkerboard as a whole rather than specifically fixating on any individual elements or allowing perception of the phase reversal as a moving pattern. These adjustments in fixation strategy, prompted by the discrete presentation of biofeedback during the game in conjunction with the motivation to succeed in the task evoked by the immersive environment, may be the reason for the better average performance during the real-time sessions (peak 89.5%) when compared with the offline results (peak 75.5%).
A possible explanation for the high performance of this BCI design, in spite of continuous distracting stimulation, may be offered by considering the underlying physiology. The topographic organization of the primary visual cortex is such that a disproportionately wide cortical area is devoted to the processing of information from the central or foveal region of the visual field, and thus directing one's gaze at a desired repetitive stimulus produces an SSVEP response to which all other responses to competing stimuli are small in comparison.

The SSVEP BCI design has not been actively employed in alternative or augmentative communication (AAC) for the disabled. This is partly due to the fact that, for successful operation, the subject's ocular motor control must be fully intact to make selections by shifting gaze direction. Given the range of accessibility options available for the disabled, it is only in very extreme cases, such as those where reliable eye movement is not possible, that a communication medium driven by EEG generated by the brain itself is applicable.
While the need for reliable ocular motor control is a prerequisite for using the BCI described in this paper, we speculate that the use of the BCI to control a character in an engaging game such as that described may prove a useful tool in assisting with motivational issues pertaining to ALS patients. As BCI systems take considerable training to master, typically several months, this system may serve to encourage patients to train for a greater length of time. It may also be possible that, through continued and regular playing of the game, an ALS patient may be able to retain an acceptable level of control even after ocular motor control has deteriorated to the point where eye-tracking systems are no longer feasible. This would involve detection of changes in the amplitudes of the SSVEP as modulated by attention to stimuli in one's peripheral vision. In order to explore this idea, the authors are currently extending this study to covert visual attention, in which subjects direct attention to one of two bilateral stimuli without eye movement.

Also worthy of investigation is the presentation of more stimuli in order to give multidimensional control in the 3D environment.
5. CONCLUSION

This paper presented the application of an effective EEG-based brain-computer interface design for binary control in a visually elaborate immersive 3D game. Results of the study indicate that successful binary control using steady-state visual evoked potentials is possible in an uncontrolled environment and is resilient to any ill effects potentially incurred by a rich, detailed visual environment. All six subjects demonstrated reliable control, achieving an average of 89.5% correct selections for one of the methods investigated, corresponding to a bit rate of 10.3 bits/min.
ACKNOWLEDGMENTS
We wish to acknowledge Phil McDarby for his assistance in
designing the gaming environment. We would also like to
thank the subjects who participated in the experimental ses-
sions.
REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[2] L. A. Farwell and E. Donchin, "Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials," Electroencephalography and Clinical Neurophysiology, vol. 70, no. 6, pp. 510–523, 1988.
[3] E. E. Sutter, "The visual evoked response as a communication channel," in Proc. IEEE/NSF Symposium on Biosensors, pp. 95–100, Los Angeles, Calif, USA, September 1984.
[4] N. Birbaumer, A. Kübler, N. Ghanayim, et al., "The thought translation device (TTD) for completely paralyzed patients," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 190–193, 2000.
[5] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles," Clinical Neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.
[6] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[7] J. R. Wolpaw, D. J. McFarland, G. W. Neat, and C. A. Forneris, "An EEG-based brain-computer interface for cursor control," Electroencephalography and Clinical Neurophysiology, vol. 78, no. 3, pp. 252–259, 1991.
[8] D. Regan, Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine, Elsevier, New York, NY, USA, 1989.
[9] M. Cheng, X. Gao, S. Gao, and D. Xu, "Design and implementation of a brain-computer interface with high transfer rates," IEEE Trans. Biomed. Eng., vol. 49, no. 10, pp. 1181–1186, 2002.
[10] M. Middendorf, G. R. McMillan, G. L. Calhoun, and K. S. Jones, "Brain-computer interfaces based on the steady-state visual-evoked response," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 211–214, 2000.
[11] K. E. Misulis, Spehlmann's Evoked Potential Primer, Butterworth-Heinemann, Boston, Mass, USA, 1994.
[12] F. Sharbrough, G.-E. Chatrian, R. P. Lesser, H. Lüders, M. Nuwer, and T. W. Picton, "American Electroencephalographic Society guidelines for standard electrode position nomenclature," Clinical Neurophysiology, vol. 8, no. 2, pp. 200–202, 1991.
[13] G. R. Burkitt, R. B. Silberstein, P. J. Cadusch, and A. W. Wood, "Steady-state visual evoked potentials and travelling waves," Clinical Neurophysiology, vol. 111, no. 2, pp. 246–258, 2000.
[14] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[15] W. Klimesch, "EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis," Brain Research Reviews, vol. 29, no. 2-3, pp. 169–195, 1999.
[16] J. R. Wolpaw, H. Ramoser, D. J. McFarland, and G. Pfurtscheller, "EEG-based communication: improved accuracy by response verification," IEEE Trans. Rehab. Eng., vol. 6, no. 3, pp. 326–333, 1998.
[17] S. T. Morgan, J. C. Hansen, and S. A. Hillyard, "Selective attention to stimulus location modulates the steady-state visual evoked potential," Proceedings of the National Academy of Sciences of the United States of America, vol. 93, no. 10, pp. 4770–4774, 1996.
E. C. Lalor received the B.E. degree in
electronic engineering from University Col-
lege Dublin, Ireland, in 1998 and the
M.S. degree in electrical engineering from
the University of Southern California in
1999. He is currently working towards the
Ph.D. degree in the Department of Elec-
tronic and Electrical Engineering in Uni-
versity College Dublin, Ireland. From 2002
to 2005, he worked as a Research Asso-
ciate with Media Lab Europe, the European research partner
of the MIT Media Lab. His current interests include brain-
computer interfaces and signal processing applications in neuro-
science.
S. P. Kelly received the B.E. degree in electronic engineering and the Ph.D. degree in biochemical engineering from University College Dublin, Ireland, in 2001 and 2005,
respectively. He is currently a Postdoc-
toral Research Fellow in the Cognitive Neu-
rophysiology Laboratory, Nathan S. Kline
Institute for Psychiatric Research in New
York. His current research interests include
the neurophysiology of selective attention
and multisensory integration in humans, and EEG-based brain-
computer interfacing for alternative communication and con-
trol.
C. Finucane was born in Dublin in 1979. He graduated from University College Dublin (UCD) in 2001 with a B.S. degree in electronic engineering. He subsequently completed an M. Eng. Sc. degree at UCD and the National Rehabilitation Hospital for work entitled "EEG-based brain-computer interfaces for the disabled" in 2003, before joining the Department of Medical Physics, St. James's Hospital, Dublin, where he currently works as a Medical Physicist. Finucane's research interests include the development of novel brain-computer interfaces, neurophysiological signal analysis, biomedical applications of multimedia, wireless and Internet technologies, and biological systems modelling.
R. Burke received the B.S. Eng. degree in mathematics and engineering from Queen's University, Kingston, Canada, in 1999, and an S.M. degree in media arts and science from the Massachusetts Institute of Technology in 2001. From 2002 to 2004, he worked as a Research Associate with the MindGames Group at Media Lab Europe, Dublin. He is currently a Member of the Developer and Platform Group at Microsoft.
R. Smith obtained the B.E. degree in electronic engineering from University College Dublin (UCD), Ireland, in 2002. He subsequently completed an M. Eng. Sc. at UCD for research that focused on neurophysiological signal processing and the development of brain-computer interfaces. His research primarily focuses on EEG-based BCIs and their possibilities in the field of neurological rehabilitation.
R. B. Reilly received his B.E., M. Eng. Sc., and Ph.D. degrees in 1987, 1989, and 1992, all in electronic engineering, from the National University of Ireland. Since 1996, he has been on the academic staff in the Department of Electronic and Electrical Engineering at University College Dublin. He is currently a Senior Lecturer and researches into neurological signal processing and multimodal signal processing. He was the 1999/2001 Silvanus P. Thompson International Lecturer for the IEE. In 2004, he was awarded a US Fulbright Award for research collaboration into multisensory integration with the Nathan S. Kline Institute for Psychiatric Research, New York. He is a reviewer for the Journal of Applied Signal Processing and was Guest Editor for the mini issue on multimedia human-computer interface, September 2004. He is the Republic of Ireland Representative on the Executive Committee of the IEEE United Kingdom and Republic of Ireland Section. He is an Associate Editor for IEEE Transactions on Multimedia and also a reviewer for IEEE Transactions on Biomedical Engineering, IEEE Transactions on Neural Systems and Rehabilitation Engineering, IEEE Transactions on Industrial Electronics, Signal Processing, and IEE Proceedings Vision, Image & Signal Processing.
G. McDarby obtained the B.E. and M.S. degrees in electronic engineering from University College Dublin, Ireland, in 1988 and 1995, respectively. He received the Ph.D. degree in biomedical signal processing in 2000 from the University of New South Wales, Sydney. Since 2000, he has worked as a Principal Research Scientist in Media Lab Europe, leading a multidisciplinary group called MindGames. His research is focused on combining sensory immersion (augmented reality), game play, novel biometric interfaces, and intelligent biofeedback to constructively affect the state of the human mind. He is strongly committed to finding ways whereby technology can be a transformational tool for people marginalized in society and is heavily involved with the Intel Computer Clubhouse Programme. He is a much sought-after speaker on technology and philosophy and has recently been nominated to the European Academy of Sciences for contributions to human progress.
EURASIP Journal on Applied Signal Processing 2005:19, 3165–3174
© 2005 Hindawi Publishing Corporation
Estimating Driving Performance Based on EEG Spectrum Analysis
Chin-Teng Lin
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: ctlin@mail.nctu.edu.tw
Ruei-Cheng Wu
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: allen.ece86g@nctu.edu.tw
Tzyy-Ping Jung
Institute for Neural Computation, University of California, San Diego, La Jolla, CA 92093-0523, USA
Email: jung@sccn.ucsd.edu
Sheng-Fu Liang
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Biological Science and Technology, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: siang@mail.nctu.edu.tw
Teng-Yi Huang
Brain Research Center, University System of Taiwan, Taipei 112, Taiwan
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
Email: tyhuang.ece91g@nctu.edu.tw
Received 12 February 2004; Revised 14 March 2005
The growing number of traffic accidents in recent years has become a serious concern to society. Accidents caused by drivers' drowsiness behind the steering wheel have a high fatality rate because of the marked decline in the driver's abilities of perception, recognition, and vehicle control while sleepy. Preventing such accidents caused by drowsiness is highly desirable but requires techniques for continuously detecting, estimating, and predicting the level of alertness of drivers and delivering effective feedback to maintain their maximum performance. This paper proposes an EEG-based drowsiness estimation system that combines electroencephalogram (EEG) log subband power spectrum, correlation analysis, principal component analysis, and linear regression models to indirectly estimate drivers' drowsiness level in a virtual-reality-based driving simulator. Our results demonstrated that it is feasible to accurately and quantitatively estimate driving performance, expressed as deviation between the center of the vehicle and the center of the cruising lane, in a realistic driving simulator.
Keywords and phrases: drowsiness, EEG, power spectrum, correlation analysis, linear regression model.
1. INTRODUCTION

Driving safety has received increasing attention due to the growing number of traffic accidents in recent years. Driver fatigue has been implicated as a causal factor in many accidents. The National Transportation Safety Board found that 58 percent of 107 single-vehicle roadway departure crashes in 1995, in which the truck driver survived and no other vehicle was involved, were fatigue-related. Accidents caused by drowsiness at the wheel have a high fatality rate because of the marked decline in the driver's abilities of perception, recognition, and vehicle control while sleepy. Preventing such accidents is thus a major focus of efforts in the field of active safety research [1, 2, 3, 4, 5, 6]. A well-designed active safety system might effectively avoid accidents caused by drowsiness at the wheel. Many factors could contribute
to drowsiness or fatigue, such as long working hours, lack of sleep, or the use of medication. Another important factor in drowsiness is the nature of the task, such as monotonous driving on highways. The continued construction of highways and improvement of vehicle equipment have made it effortless for drivers to maneuver and operate their vehicles on the road for hours. An examination of the situations in which drowsiness occurred shows that most of the accidents were on highways [4].

A number of methods have been proposed in the past to detect vigilance changes. These methods can be categorized into two main approaches. The first approach focuses on physical changes during fatigue, such as the inclination of the driver's head, sagging posture, and decline in gripping force on the steering wheel [7, 8, 9, 10, 11, 12]. These methods can be further classified as being either contact or noncontact types in terms of the way physical changes are measured. The contact type involves the detection of the driver's movement by direct sensor contact, such as using a cap or eyeglasses or attaching sensors to the driver's body. The noncontact type makes use of optical sensors or video cameras to detect vigilance changes. These methods monitor driving behavior or vehicle operation to detect driver fatigue. Driving behavior includes the steering wheel, accelerator, brake pedal, and transmission shift lever, and the operation of the vehicle includes the vehicle speed, lateral acceleration, yaw rate, and lateral displacement. Since these parameters vary with vehicle type and driving conditions, it would be necessary to devise different detection logic for different types of vehicles.
The second approach focuses on measuring physiological changes in drivers, such as eye activity measures, heart beat rate, skin electric potential, and, particularly, electroencephalographic (EEG) activities as a means of detecting their cognitive states [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Stern et al. [22, 23] reported that eye blink duration and blink rate typically increase while blink amplitude decreases as a function of cumulative time on task. Other electrooculographic (EOG) studies have found that saccade frequencies and velocities decline as time on task increases [24, 25]. Recently, Van Orden et al. [14] further compared the eye-activity-based methods to EEG-based methods for alertness estimates in a compensatory visual tracking task. However, although these eye-activity variables are well correlated with subject performance, the eye-activity-based methods require a relatively long moving-average window to track slow changes in vigilance, whereas the EEG-based method can use a shorter moving-average window to track second-to-second fluctuations in subject performance in a visual compensatory task [14, 15, 16, 17, 18].
While approaches based on EEG signals have advantages for making accurate and quantitative judgments of alertness levels, most recent psychophysiological studies have focused on using the same estimator for all subjects [21, 26, 27]. These methods did not account for the large individual variability in EEG dynamics accompanying loss of alertness, and thus could not accurately estimate or predict individual changes in alertness and performance. In contrast, Makeig and Inlow used individualized multiple linear regression models to estimate operators' changing levels of alertness [18]. Jung et al. further used a neural network model, applied to the EEG power spectrum, in an auditory monitoring task and showed that a continuous, accurate, noninvasive, and near real-time estimation of an operator's global level of alertness is feasible [15, 16].

The scope of the current study is to examine neural activity correlates of fatigue/drowsiness in a realistic working environment. Our research investigates the feasibility of using multichannel EEG data to noninvasively estimate and predict the continuous fluctuations in human global-level alertness, measured indirectly through the driver's driving performance, expressed as deviation between the center of the vehicle and the center of the cruising lane, in a very realistic driving task. To investigate the relationship of minute-scale fluctuations in performance to concurrent changes in the EEG spectrum, we first computed the correlations between changes in the EEG power spectrum and the fluctuations in driving performance. We then built an individualized linear regression model for each subject, applied to principal components of the EEG spectra, to assess the EEG dynamics accompanying loss of alertness for each operator. This approach can be used to construct and test a portable embedded system for real-time alertness monitoring.
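The analysis pipeline just outlined, correlation screening, PCA of the log power spectra, and a per-subject linear regression, can be sketched as follows. The shapes and the component count are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def fit_alertness_estimator(log_power, deviation, n_components=5):
    """log_power: (epochs, bands) log subband power; deviation: (epochs,)
    driving error. Returns band-performance correlations, regression
    weights, and the fitted deviation estimate."""
    # 1. Correlate each spectral band with driving performance
    corr = np.array([np.corrcoef(log_power[:, i], deviation)[0, 1]
                     for i in range(log_power.shape[1])])
    # 2. PCA of mean-centered spectra via SVD
    centered = log_power - log_power.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # 3. Individualized least-squares linear regression (with intercept)
    design = np.hstack([scores, np.ones((len(deviation), 1))])
    weights, *_ = np.linalg.lstsq(design, deviation, rcond=None)
    return corr, weights, design @ weights
```

Fitting one such model per subject is what accommodates the individual variability in EEG dynamics that a single shared estimator would miss.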
This paper is organized as follows. Section 2 gives detailed descriptions of the EEG-based drowsiness experimental setup, including the virtual-reality-based highway scene, subject instructions, physiological data collection, and alertness measurement. Detailed signal analysis of the collected data is given in Section 3. In Section 4, we explore the relationship between the alertness level, expressed as the driving performance, and the EEG power spectrum. Behavioral data are used to evaluate the estimation performance of our alertness-monitoring model. Finally, we conclude our findings in Section 5.
2. EXPERIMENTAL SETUP

2.1. Virtual-reality-based highway driving simulator

In this study, we developed a VR-based 3D interactive highway scene using the high-fidelity emulation software Coryphaeus, running on a high-performance SGI workstation. First, we created models of various objects for the scene (such as cars, roads, and trees) and set up the corresponding positions, attitudes, and other relative parameters between objects. Then, we developed the dynamic models among these virtual objects and built a complete simulated highway scene of full functionality with the aid of the high-level C-based API program. Figure 1 shows the VR-based highway scene displayed on a color XVGA 15-inch monitor (304.1 mm wide and 228.1 mm high), including four lanes from left to right, separated by a median stripe, to simulate the view of the driver. The distance from the left-hand side to the right-hand side of the road is evenly divided into 256 parts (digitized into values 0-255). The highway scene
Estimating Driving Performance Based on EEG Spectrum Analysis 3167
Figure 1: VR-based highway scene used in our experiments. The distance from the left side to the right side of the road is evenly divided into 256 parts (digitized into values 0–255). The width of each lane is 60 units. The width of the car is 32 units. The refresh rate of the highway scene was set properly to emulate a car driving at a fixed speed of 100 km/h on the highway.
changes interactively as the driver/subject drives the car at a fixed velocity of 100 km/h on the highway. The car is constantly and randomly drifted away from the center of the cruising lane, mimicking the consequences of a nonideal road surface. The highway scene was connected to a 36-channel physiological measuring system, where the EEG, EOG, ECG, and the subject's performance, i.e., the deviation between the center of the vehicle and the center of the cruising (third) lane, were continuously and simultaneously measured and recorded.
2.2. Subjects
Statistical reports [4] showed that the drowsiest time occurs
from late night to early morning, and during the early af-
ternoon hours. During these periods, drowsiness often oc-
curs within one hour of continuous driving, indicating that
drowsiness is not necessarily caused by long driving hours.
Thus, the best time for doing the highway-drowsiness simu-
lation is the early afternoon hours after lunch because drivers
usually get drowsy within an hour of continuous driving. A
total of ten subjects (aged 20 to 40 years) participated in the VR-based highway driving experiments. Each subject completed simulated driving sessions on two separate days. On the first day, the participants were told of the general features of the driving task, completed the necessary informed consent material, and then started with 15–45 minutes of practice keeping the car at the center of the cruising lane by maneuvering it with the steering wheel. Subjects reported this amount of practice to be sufficient to train participants to asymptote on the task. After practicing, participants were prepared with 33 EEG (including 2 EOG) electrodes referenced to the right earlobe based on a modified international 10–20 system, and 2 ECG electrodes placed on the chest. After a brief calibration procedure, subjects began a 45-minute lane-keeping driving task, and their EEG signals and driving performance, defined as the deviation of the center of the car from the center of the third lane of the road, were measured and recorded simultaneously. Participants returned on a different day to complete the other 45-minute driving session. Participants who demonstrated waves of drowsiness involving two or more microsleeps in both sessions were selected for further analysis. Based on these criteria, five participants (10 sessions) were selected for further modeling and cross-session testing.
2.3. Data collection
During each driving session, 33 EEG/EOG channels (using sintered Ag/AgCl electrodes), 2 ECG channels (bipolar connection), and the deviation between the center of the vehicle and the center of the cruising lane were simultaneously recorded by the Scan NuAmps Express system (Compumedics Ltd., VIC, Australia). Before data acquisition, the contact impedance between the EEG electrodes and the cortex was calibrated to be less than 5 kΩ. The EEG data were recorded with a 16-bit quantization level at a sampling rate of 500 Hz and then resampled down to 250 Hz for simplicity of data processing.
2.4. Alertness measurement
To find the relationship between the measured EEG signals and the subject's cognitive state, and to quantify the level of the subject's alertness, we defined a subject's driving performance index as the deviation between the center of the vehicle and the center of the cruising lane. When the subject was drowsy (checked from video recordings), the value of the driving performance index increased, and vice versa. The recorded driving performance time series were then smoothed using a causal 90-second square moving-average filter advancing at 2-second steps to eliminate variance at cycle lengths shorter than 1–2 minutes, since the fluctuations in drowsiness level in general had cycle lengths longer than 4 minutes [15, 16].
3. DATA ANALYSIS
The flowchart of the data analysis for estimating the level of alertness based on the EEG power spectrum is shown in Figure 2. For each subject, after collecting 33-channel EEG signals and driving deviations in a 45-minute simulated driving session, the EEG data were first preprocessed using a simple lowpass filter with a cut-off frequency of 50 Hz to remove line noise and other high-frequency noise. Then, we calculated the moving-averaged log power spectra of all 33 EEG channels. The correlation coefficients between the smoothed subject's driving performance and the log power spectra of all EEG channels at each frequency band were further evaluated to form a correlation spectrum. The log power spectra of the 2 EEG channels with the highest correlation coefficients were further decomposed using the principal component analysis (PCA) algorithm to reduce feature dimensions. Then the first 50 representative PCA components with the largest eigenvalues were selected as the input vectors of the linear regression model to estimate the individual subject's driving performance. Detailed analyses are described in the following subsections.
3168 EURASIP Journal on Applied Signal Processing
[Flowchart: EEG → Noise removal → Moving-averaged spectral analysis → EEG log power spectra → Correlation analysis → Selected EEG channels → PCA → Selected PCA components → Linear regression model → Subject's driving performance]
Figure 2: Flowchart for processing the EEG signals. (1) A lowpass filter was used to remove the line noise and higher-frequency (> 50 Hz) noise. (2) Moving-averaged spectral analysis was used to calculate the EEG log power spectrum of each channel advancing at 2-second steps. (3) The two EEG channels with the highest correlation coefficients between the subject's driving performance and the EEG log power spectrum were selected. (4) Principal component analysis was trained and used to decompose the selected features and extract the representative PCA components as the input vectors for the linear regression models. (5) The linear regression models were trained on one training session and used to continuously estimate and predict the individual subject's driving performance in the testing session.
3.1. Moving-averaged power spectral analysis
Moving-averaged spectral analysis of the EEG data, as shown in Figure 3, was first accomplished using a 750-point Hanning window with 250-point overlap. Windowed 750-point epochs were further subdivided into several 125-point subwindows using the Hanning window again with 25-point steps, each extended to 256 points by zero-padding for a 256-point FFT. A moving median filter was then used to average the spectra of all subwindows and minimize the presence of artifacts in the EEG records. The moving-averaged EEG power spectra were further converted into a logarithmic scale for spectral correlation and driving performance estimation [28, 29]. Thus, the time series of the EEG log power spectrum for each session consisted of the 33-channel EEG power spectrum estimated across 40 frequencies (from 1 to 40 Hz), stepping at 2-second (500-point, one epoch) time intervals.
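The subwindowing scheme above can be sketched as follows. This is an illustrative reconstruction under the stated parameters (750-point epochs at 250 Hz, 125-point Hanning subwindows advancing by 25 points, 256-point zero-padded FFTs, median averaging); the function name is ours and the input is a synthetic tone rather than recorded EEG.

```python
import numpy as np

def epoch_log_spectrum(epoch, n_fft=256, sub_len=125, step=25):
    """Median-averaged log power spectrum of one 750-point epoch.

    The epoch is cut into 125-point Hanning-windowed subwindows
    advancing by 25 points; each is zero-padded to 256 points for
    the FFT, and the subwindow spectra are combined with a median
    to suppress artifacts before conversion to a log scale.
    """
    win = np.hanning(sub_len)
    spectra = []
    for start in range(0, len(epoch) - sub_len + 1, step):
        seg = epoch[start:start + sub_len] * win
        spec = np.abs(np.fft.rfft(seg, n=n_fft)) ** 2   # zero-padded FFT
        spectra.append(spec)
    median_spec = np.median(spectra, axis=0)            # artifact-resistant
    return 10 * np.log10(median_spec + 1e-12)           # log power scale

fs = 250                                  # resampled rate (Hz)
t = np.arange(750) / fs
epoch = np.sin(2 * np.pi * 10 * t)        # synthetic 10 Hz "alpha" tone
log_spec = epoch_log_spectrum(epoch)
freqs = np.fft.rfftfreq(256, d=1 / fs)    # bin centers in Hz
peak = freqs[np.argmax(log_spec)]         # near 10 Hz for this input
```

The median across subwindows, rather than a mean, is what gives the estimate its robustness to brief artifacts within an epoch.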
3.2. Correlation analysis
Since the alertness level fluctuates with cycle lengths longer than 4 minutes [15, 16], we smoothed the EEG power and driving performance time series using a causal 90-second square moving-average filter to eliminate variance at cycle lengths shorter than 1–2 minutes. To investigate the relationship of minute-scale fluctuations in continuous driving performance with concurrent changes in the 33-channel EEG power spectrum over time and subjects, we measured correlations between changes in the EEG log power spectrum and driving performance, forming a correlation spectrum, by computing the Pearson correlation coefficient between the two time series at each EEG frequency, expressed as Corr_xy = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). The channels with the highest correlation coefficients between the EEG log power spectrum and the subject's driving performance were further selected (see Section 4.1), and the dimensions
Figure 3: Block diagram for moving-averaged spectral analysis. The EEG data were first divided using a 750-point Hanning window with 250-point overlap. The 750-point epochs were further divided into several 125-point frames using Hanning windows again with a 25-point step size, and each frame was extended by zero-padding for a 256-point FFT. The subwindow power spectra were then averaged and converted to a logarithmic scale to form a log power spectrum.
of the selected EEG power spectra of these channels were reduced using the principal component analysis (PCA) algorithm.
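The per-frequency Pearson correlation described above might be computed as in the following sketch. The array shapes, variable names, and synthetic data are our assumptions, not the study's; the injected correlated bin merely demonstrates that the computation recovers it.

```python
import numpy as np

def correlation_spectrum(log_power, performance):
    """Pearson correlation between driving performance and the EEG
    log power time series at each frequency bin.

    log_power   : (n_epochs, n_freqs) smoothed log power, one channel
    performance : (n_epochs,) smoothed driving-performance index
    """
    x = log_power - log_power.mean(axis=0)      # de-mean per frequency
    y = performance - performance.mean()
    num = (x * y[:, None]).sum(axis=0)
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den                            # one coefficient per bin

# Synthetic example: power at one frequency bin tracks performance
rng = np.random.default_rng(1)
perf = rng.normal(size=200)
power = rng.normal(size=(200, 40))
power[:, 7] += 2 * perf                         # hypothetical correlated bin
corr = correlation_spectrum(power, perf)
best = int(np.argmax(corr))                     # recovers the injected bin
```

Repeating this per channel yields the full 33-channel correlation spectrum from which the best channels are selected.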
3.3. Feature extraction
In this study, we use a multivariate linear regression model [30] to estimate/predict the subject's driving performance based on the information available in the EEG log power spectrum at sites Cz and Pz (as suggested in Section 4.1). The EEG power spectrum time series for each session consisted of 1350 (750-point, one epoch) EEG power estimates at 40 frequencies (from 1 to 40 Hz) at 2-second time intervals. We then applied Karhunen-Loève principal component analysis (PCA) to the full EEG log spectrum to decompose the EEG log power spectrum time series and extract the directions of the largest variance for each session. PCA is a linear transformation that finds the principal coordinate axes of samples such that, along the new axes, the sample variances are extreme (maximal and minimal) and uncorrelated. Using a cutoff on the spread along each axis, a sample may thus be reduced in its dimensionality [31]. The principal axes and the variance along each of them are given by the eigenvectors and associated eigenvalues of the dispersion matrix. In our study, the projections of the PCA components accounting for the largest 50 eigenvalues were then used as inputs to train the individual linear regression models for each subject, which used a 50-order linear polynomial with a least-square-error cost function to estimate the time course of the driving performance. Each model was trained using only the features extracted in the training session and tested on a separate testing session of the same subject for each of the five
selected subjects. The PCA parameters (eigenvectors) from the training sessions were used to project the features in the testing sessions, so that all data were processed in the same way for the same subject before being fed to the estimation models.
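A hedged sketch of this train/test pipeline, with PCA eigenvectors fitted on the training session and reused to project the test session before least-squares regression, might look like the following. The data here are synthetic, the 80-dimensional feature size is illustrative, and the function names are ours.

```python
import numpy as np

def fit_pca(features, n_comp=50):
    """Eigenvectors of the dispersion (covariance) matrix,
    sorted by descending eigenvalue."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)                 # ascending order
    order = np.argsort(vals)[::-1][:n_comp]          # keep the largest
    return mean, vecs[:, order]

def project(features, mean, vecs):
    return (features - mean) @ vecs

# Synthetic training/testing "sessions": 80-dim log-power features
rng = np.random.default_rng(2)
train_x = rng.normal(size=(600, 80))
train_y = train_x[:, :3].sum(axis=1) + 0.1 * rng.normal(size=600)
test_x = rng.normal(size=(600, 80))
test_y = test_x[:, :3].sum(axis=1) + 0.1 * rng.normal(size=600)

mean, vecs = fit_pca(train_x, n_comp=50)
pc_train = project(train_x, mean, vecs)
pc_test = project(test_x, mean, vecs)        # training eigenvectors reused

# Least-squares linear regression with a constant term
A = np.hstack([pc_train, np.ones((len(pc_train), 1))])
coef, *_ = np.linalg.lstsq(A, train_y, rcond=None)
pred = np.hstack([pc_test, np.ones((len(pc_test), 1))]) @ coef
r = np.corrcoef(pred, test_y)[0, 1]          # cross-session correlation
```

Reusing the training-session mean and eigenvectors for the test session, rather than refitting PCA, is what makes the cross-session evaluation honest: the test data never influence the feature transform.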
4. RESULTS AND DISCUSSION
4.1. Relationship between the EEG spectrum and subject alertness
To investigate the relationship of minute-scale fluctuations in driving performance to concurrent changes in the EEG spectrum, we measured correlations between changes in the EEG power spectrum and driving performance by computing the correlation coefficients between the two time series at each EEG frequency. We refer to the results as forming a correlation spectrum. For each EEG site and frequency, we then computed spectral correlations for each session separately and averaged the results across all 10 sessions from the five subjects. Figure 4a shows the results for 40 frequencies between 1 and 40 Hz. Note that the mean correlation between performance and EEG power is predominantly positive at all EEG channels below 20 Hz. We also investigated the spatial distributions of these positive correlations by plotting the correlations between the EEG power spectrum and driving performance, computed separately at the dominant frequency bins 7, 12, 16, and 20 Hz (cf. Figure 4a), on the scalp (Figure 4b). As the results in Figure 4a show, the correlation coefficients plotted on the scalp maps are predominantly positive. The correlations are particularly strong at central and posterior channels, which is similar to the results of previous driving studies [21, 26, 27]. The relatively high correlation of the EEG log power spectrum with driving performance suggests that the EEG log power spectrum may be suitable for drowsiness (microsleep) estimation, where the subject's cognitive state might fall into stage one of non-rapid eye movement (NREM) sleep. To be practical for routine use during driving or in other occupations, EEG-based cognitive assessment systems should use as few EEG sensors as possible to reduce the preparation time for wiring drivers and the computational load for continuously estimating the level of alertness in near real time. According to the correlations shown in Figure 4b, we believe it is adequate to use the EEG signals at sites Cz and Pz to assess the alertness level of drivers continuously.
Next, we compared correlation spectra for individual sessions to examine the stability of this relationship over time and subjects. Figures 5 and 6 plot correlation spectra at sites Fz, Cz, Pz, and Oz of two separate driving sessions for extreme cases from Subjects A (best) and B (worst), respectively. The relationship between the EEG power spectrum and driving performance is stable within subjects, especially below 20 Hz. However, the relationship is variable from subject to subject (compare Figures 5 and 6). The time intervals between the training and testing sessions of the lane-keeping experiments ranged from one day to one week for the five selected subjects.
Figure 4: Correlation spectra. Correlations between EEG power
and driving performance, computed separately for 40 EEG frequen-
cies between 1 and 40 Hz. (a) Grand mean correlation spectra for 10
sessions on 5 subjects. (b) Scalp topographies of the correlations at
dominant frequencies at 7, 12, 16, and 20 Hz.
The above analyses provide strong and converging evidence that changes in subjects' alertness level, indexed by driving performance during a driving task, are strongly correlated with changes in the EEG power spectrum at several frequencies at central and posterior sites. This relationship is relatively variable between subjects, but stable within subjects, consistent with the findings from a simple auditory target detection task reported in [15, 16]. These findings suggest that information available in the EEG can be used for real-time estimation of changes in the alertness of human operators performing monitoring tasks. However, for maximal accuracy the estimation algorithm should be capable of adapting to individual differences in the mapping between EEG and alertness.
Figure 5: Correlation spectra between the EEG power spectrum and the driving performance at (a) Fz, (b) Cz, (c) Pz, and (d) Oz channels
in two separate driving sessions for Subject A (best case). Note that the relationship between the EEG power spectrum and the driving
performance is stable within this subject.
4.2. EEG-based driving performance
estimation/prediction
In order to estimate/predict the subject's driving performance based on the information available in the EEG power spectrum at sites Cz and Pz, a 50-order linear regression model y = Σ_{i=1}^{N} a_i x_i + a_0 with a least-square-error cost function is used, where y is the desired output, x is the input feature, N is the order (N = 50 in this case), the a_i's are the parameters, and a_0 = 1 is the constant. We used only two EEG
channels (Cz and Pz), which showed the highest correlation between the EEG power spectrum and the driving performance, because using all 33 channels may introduce more unexpected noise. Figure 7 plots the estimated and actual driving performance of a session of Subject A. The linear regression model in this figure is trained with and tested against the same session, that is, within-session testing. As can be seen, the estimated driving performance matched the actual driving performance extremely well (r = 0.88). When the model was tested against a separate test session of the same subject, as shown in Figure 8, the correlation between the actual and estimated driving performance, though decreased, remained high (r = 0.7). Across the ten sessions, the mean correlation coefficient between the actual driving performance time series and the within-session estimation is 0.90 ± 0.034, whereas the mean correlation coefficient between the actual driving performance and the cross-session estimation is 0.53 ± 0.116. These results suggest that continuous EEG-based driving
Figure 6: Correlation spectra between the EEG power spectrum and the driving performance at (a) Fz, (b) Cz, (c) Pz, and (d) Oz channels
in two separate driving sessions for Subject B (worst case). Note that the relationship between the EEG power spectrum and the driving
performance is stable within this subject, especially below 20 Hz. However, the relationship is variable from subject to subject (compare
Figures 5 and 6).
performance estimation using a small number of data
channels is feasible, and can give accurate information about
minute-to-minute changes in operator alertness.
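The session-level summary statistics quoted above (mean ± standard deviation of per-session correlation coefficients) follow directly from the ten per-session r values. In the sketch below the r values are illustrative placeholders, not the study's measurements; only the computation is of interest.

```python
import numpy as np

# Hypothetical per-session correlations between actual and
# estimated driving performance (NOT the paper's data)
within_session = np.array([0.92, 0.88, 0.91, 0.86, 0.94,
                           0.89, 0.90, 0.93, 0.87, 0.90])
cross_session = np.array([0.65, 0.48, 0.70, 0.39, 0.55,
                          0.61, 0.44, 0.52, 0.58, 0.41])

def summarize(r_values):
    """Mean ± standard deviation, as reported in the text."""
    return r_values.mean(), r_values.std(ddof=0)

w_mean, w_std = summarize(within_session)   # within-session summary
c_mean, c_std = summarize(cross_session)    # cross-session summary
```

The gap between the two means is the expected cost of generalizing a subject's model across recording days, consistent with the within- versus cross-session figures reported above.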
5. CONCLUSIONS
In this study, we demonstrated a close relationship between minute-scale changes in driving performance and the EEG power spectrum. This relationship appears stable within individuals across sessions, but is somewhat variable between subjects. We also combined EEG power spectrum estimation, correlation analysis, PCA, and linear regression to continuously estimate/predict fluctuations in the human alertness level indexed by the driving performance measurement, i.e., the deviation between the center of the vehicle and the center of the cruising lane. Our results demonstrated that it is feasible to accurately estimate driving errors based on multichannel EEG power spectrum estimation and the principal component analysis algorithm. The computational methods we employed in this study are well within the capabilities of modern embedded digital signal processing hardware to perform in real time using one or more
Figure 7: Driving performance estimates for a session of Subject A,
based on a linear regression (dashed line) of PCA-reduced EEG log
spectra at two scalp sites, overplotted against actual driving perfor-
mance time series for the session (solid line). The correlation coefficient between the two time series is r = 0.88.
Figure 8: Driving performance estimates for a test session, based on
a linear regression (dashed line) of PCA-reduced EEG log spectra
from a separate training session of the same subject, overplotted
against actual driving performance time series of the test session
(solid line). The correlation coefficient between the two time series is r = 0.7. Note that the training and testing data in this study were completely disjoint.
channels of EEG data. Once an estimator has been developed
for each driver, based on limited pilot testing, the method
uses only spontaneous EEG signals from the individual, and
does not require further collection or analysis of operator
performance. The proposed methods thus might be used to construct and test a portable embedded real-time alertness-monitoring system.
ACKNOWLEDGMENTS
The authors would like to thank Messrs. Jeng-Ren Duann,
Chun-Fei Hsu, Wen-Hung Chao, Yu-Chieh Chen, Kuan-
Chih Huang, Shih-Cheng Guo, and Yu-Jie Chen for their
great help in developing and operating the experiments. This
work was supported in part by the Ministry of Education, Taiwan, under Grant EX-91-E-FAOE-4-4, and the Ministry of Economic Affairs, Taiwan, under Grant 93-17-A-02-S1-032, to C. T. Lin and associates, and by a grant from the Swartz Foundation to T. P. Jung.
REFERENCES
[1] J. French, A model to predict fatigue degraded performance, in Proc. IEEE 7th Conference on Human Factors and Power Plants, vol. 4, pp. 6–9, Scottsdale, Ariz, USA, September 2002.
[2] W. W. Wierwille, S. S. Wreggit, and R. R. Knipling, Development of improved algorithms for on-line detection of driver drowsiness, in Proc. Convergence '94, International Congress on Transportation Electronics, SAE (Society of Automotive Engineers), pp. 331–340, Detroit, Mich, USA, October 1994.
[3] A. Amditis, A. Polychronopoulos, E. Bekiaris, and P. C. Antonello, System architecture of a driver's monitoring and hypovigilance warning system, in Proc. IEEE Intelligent Vehicle Symposium (IV '02), vol. 2, pp. 527–532, Versailles, France, June 2002.
[4] H. Ueno, M. Kaneda, and M. Tsukino, Development of drowsiness detection system, in Proc. Vehicle Navigation and Information Systems Conference (VNIS '94), pp. 15–20, Yokohama, Japan, August–September 1994.
[5] R. Grace, V. E. Byrne, D. M. Bierman, et al., A drowsy driver detection system for heavy vehicles, in Proc. AIAA/IEEE/SAE 17th Conference on Digital Avionics Systems (DASC '98), vol. 2, pp. I36/1–I36/8, Bellevue, Wash, USA, October–November 1998.
[6] T. Pilutti and A. G. Ulsoy, Identification of driver state for lane-keeping tasks, IEEE Trans. Syst., Man, Cybern. A, vol. 29, no. 5, pp. 486–502, 1999.
[7] P. Smith, M. Shah, and N. da Vitoria Lobo, Monitoring head/eye motion for driver alertness with one camera, in Proc. 15th International Conference on Pattern Recognition (ICPR '00), vol. 4, pp. 636–642, Barcelona, Spain, September 2000.
[8] C. M. Frederick-Recascino and M. Hilscher, Monitoring automated displays: effects of and solutions for boredom, in Proc. 20th Conference of Digital Avionics Systems (DASC '01), vol. 1, pp. 5D3/1–5D3/5, Daytona Beach, Fla, USA, October 2001.
[9] G. Kaefer, G. Prochart, and R. Weiss, Wearable alertness monitoring for industrial applications, in Proc. 7th IEEE International Symposium on Wearable Computers (ISWC '03), pp. 254–255, White Plains, NY, USA, October 2003.
[10] K. B. Khalifa, M. H. Bedoui, R. Raytchev, and M. Dogui, A portable device for alertness detection, in Proc. 1st Annual International IEEE-EMBS Special Topic Conference on Microtechnologies in Medicine & Biology, pp. 584–586, Lyon, France, October 2000.
[11] C. A. Perez, A. Palma, C. A. Holzmann, and C. Pena, Face and eye tracking algorithm based on digital image processing, in Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC '01), vol. 2, pp. 1178–1183, Tucson, Ariz, USA, October 2001.
[12] J. C. Popieul, P. Simon, and P. Loslever, Using driver's head movements evolution as a drowsiness indicator, in Proc. IEEE International Intelligent Vehicles Symposium (IV '03), pp. 616–621, Columbus, Ohio, USA, June 2003.
[13] T. L. Morris and J. C. Miller, Electrooculographic and performance indices of fatigue during simulated flight, Biological Psychology, vol. 42, no. 3, pp. 343–360, 1996.
[14] K. Van Orden, W. Limbert, S. Makeig, and T.-P. Jung, Eye activity correlates of workload during a visual-spatial memory task, Human Factors, vol. 43, no. 1, pp. 111–121, 2001.
[15] T.-P. Jung, S. Makeig, M. Stensmo, and T. J. Sejnowski, Estimating alertness from the EEG power spectrum, IEEE Trans. Biomed. Eng., vol. 44, no. 1, pp. 60–69, 1997.
[16] S. Makeig and T.-P. Jung, Changes in alertness are a principal component of variance in the EEG spectrum, Neuroreport, vol. 7, no. 1, pp. 213–216, 1995.
[17] M. Matousek and I. Petersén, A method for assessing alertness fluctuations from EEG spectra, Electroencephalography and Clinical Neurophysiology, vol. 55, no. 1, pp. 108–113, 1983.
[18] S. Makeig and M. Inlow, Lapses in alertness: coherence of fluctuations in performance and EEG spectrum, Electroencephalography and Clinical Neurophysiology, vol. 86, no. 1, pp. 23–35, 1993.
[19] J. Qiang, Z. Zhiwei, and P. Lan, Real-time nonintrusive monitoring and prediction of driver fatigue, IEEE Trans. Veh. Technol., vol. 53, no. 4, pp. 1052–1068, 2004.
[20] S. Makeig and T.-P. Jung, Tonic, phasic, and transient EEG correlates of auditory awareness in drowsiness, Cognitive Brain Research, vol. 4, no. 1, pp. 15–25, 1996.
[21] S. Roberts, I. Rezek, R. Everson, H. Stone, S. Wilson, and C. Alford, Automated assessment of vigilance using committees of radial basis function analysers, IEE Proceedings - Science, Measurement & Technology, vol. 147, no. 6, pp. 333–338, 2000.
[22] J. A. Stern, D. Boyer, and D. Schroeder, Blink rate: a possible measure of fatigue, Human Factors, vol. 36, no. 2, pp. 285–297, 1994.
[23] J. A. Stern, L. C. Walrath, and R. Goldstein, The endogenous eyeblink, Psychophysiology, vol. 21, no. 1, pp. 22–33, 1984.
[24] D. Schmidt, L. A. Abel, L. F. Dell'Osso, and R. B. Daro, Saccadic velocity characteristics: intrinsic variability and fatigue, Aviation, Space and Environmental Medicine, vol. 50, no. 4, pp. 393–395, 1979.
[25] D. K. McGregor and J. A. Stern, Time on task and blink effects on saccade duration, Ergonomics, vol. 39, no. 4, pp. 649–660, 1996.
[26] B. J. Wilson and T. D. Bracewell, Alertness monitor using neural networks for EEG analysis, in Proc. IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing X, vol. 2, pp. 814–820, Sydney, NSW, Australia, December 2000.
[27] P. Parikh and E. Micheli-Tzanakou, Detecting drowsiness while driving using wavelet transform, in Proc. IEEE 30th Annual Northeast Bioengineering Conference, pp. 79–80, Boston, Mass, USA, April 2004.
[28] M. Steriade, Central core modulation of spontaneous oscillations and sensory transmission in thalamocortical systems, Current Opinion in Neurobiology, vol. 3, no. 4, pp. 619–625, 1993.
[29] M. Treisman, Temporal rhythms and cerebral rhythms, in Timing and Time Perception, J. Gibbon and L. Allen, Eds., vol. 423, pp. 542–565, New York Academy of Sciences, New York, NY, USA, 1984.
[30] S. Chatterjee and A. S. Hadi, Influential observations, high leverage points, and outliers in linear regression, Statistical Science, vol. 1, no. 3, pp. 379–416, 1986.
[31] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
Chin-Teng Lin received the B.S. degree
from the National Chiao-Tung University
(NCTU), Taiwan, in 1986, and the Ph.D.
degree in electrical engineering from Pur-
due University, USA, in 1992. He is cur-
rently the Chair Professor and Associate
Dean of the College of Electrical Engineer-
ing and Computer Science, and Director of
the Brain Research Center at NCTU. He is
the author of Neural Fuzzy Systems (Pren-
tice Hall). He has published about 90 journal papers includ-
ing over 65 IEEE journal papers. He is an IEEE Fellow for his
contributions to biologically inspired information systems. He currently serves on the Board of Governors of the IEEE CAS and SMC Societies. He has been the President of the Asia Pacific Neural Network Assembly since 2004. He has received the Outstanding Research Award granted by the National Science Council, Taiwan, continuously since 1997, received the Outstanding Engineering Professor Award granted by the Chinese Institute of Engineering (CIE) in 2000, and received the 2002 Taiwan Outstanding Information-Technology Expert Award. He was also elected one of the 38th Ten Outstanding Rising Stars in Taiwan (2000). He currently serves as an Associate Editor of the IEEE Transactions on Circuits and Systems, Part I & Part II, the IEEE Transactions on Systems, Man, and Cybernetics, the IEEE Transactions on Fuzzy Systems, and so forth.
Ruei-Cheng Wu received the B.S. degree
in nuclear engineering from the National
Tsing-Hua University, Taiwan, in 1995, and
the M.S. degree in control engineering from
the National Chiao-Tung University, Tai-
wan, in 1997. He is currently pursuing the
Ph.D. degree in electrical and control en-
gineering at the National Chiao-Tung Uni-
versity, Taiwan. His current research inter-
ests are biomedical signal processing, mul-
timedia signal processing, fuzzy neural networks, and linear con-
trol.
Tzyy-Ping Jung received the B.S. degree in
electronics engineering from the National
Chiao Tung University, Taiwan, in 1984, and
the M.S. and Ph.D. degrees in electrical en-
gineering from The Ohio State University in
1989 and 1993, respectively. He was a Re-
search Associate at the National Research
Council of the National Academy of Sci-
ences and at the Computational Neurobi-
ology Laboratory, The Salk Institute, San
Diego, Calif. He is currently an Associate Research Professor at the
Institute for Neural Computation of the University of California,
San Diego. He is also the Associate Director of the Swartz Center for
Computational Neuroscience at UCSD. His research interests are in
the areas of biomedical signal processing, cognitive neuroscience,
artificial neural networks, time-frequency analysis of human EEG,
functional neuroimaging, and the development of neural human-
system interfaces.
Sheng-Fu Liang was born in Tainan, Tai-
wan, in 1971. He received the B.S. and M.S.
degrees in control engineering from the Na-
tional Chiao-Tung University (NCTU), Tai-
wan, in 1994 and 1996, respectively. He re-
ceived the Ph.D. degree in electrical and
control engineering from NCTU in 2000.
From 2001 to 2005, he was a Research Assis-
tant Professor in electrical and control engi-
neering, NCTU. In 2005, he joined the De-
partment of Biological Science and Technology, NCTU, where he
serves as an Assistant Professor. He has also served as the Chief
Executive of the Brain Research Center, NCTU Branch, University
System of Taiwan, since September 2003. His current research in-
terests are biomedical engineering, biomedical signal/image pro-
cessing, machine learning, fuzzy neural networks (FNN), the devel-
opment of brain-computer interface (BCI), and multimedia signal
processing.
Teng-Yi Huang received the B.S. degree
in electrical engineering from the National
Central University, Taiwan, in 2002, and
the M.S. degree in electrical and control
engineering from the National Chiao-Tung
University, Taiwan, in 2004. He is cur-
rently pursuing the Ph.D. degree at the Na-
tional Chiao-Tung University, Taiwan. His
research interests are in the areas of biomed-
ical signal processing, biofeedback control,
and virtual reality technology.