
A New Biometric: Human Identification from Circulatory Function

John M. Irvine (1), Steven A. Israel (2), Mark D. Wiederhold (3), Brenda K. Wiederhold (4)

(1) SAIC, 20 Burlington Mall Road, Burlington, MA 01803, john.m.irvine@saic.com
(2) SAIC, 4001 North Fairfax Drive, Suite 450, Fairfax, VA 22203
(3) SAIC, 10260 Campus Point Drive, San Diego, CA 92121
(4) Virtual Reality Medical Center, 6160 Cornerstone Court East, San Diego, CA 92121

Abstract

Numerous biometric techniques exist for verifying the identity of individuals.
Traditional biometrics, such as fingerprint, face, and iris recognition, rely on a
“snapshot” of data that are rendered as images. This paper presents a new biometric
technique based on observation of physiological functions related to circulation. In
particular, we present a biometric technique based on the subject’s electrocardiogram.
The development and validation of this new biometric technique pose some interesting
challenges for design of the experiments and data analysis. Since a person’s heart
rate can vary with mental and emotional state, we developed a data collection protocol
in which subjects perform a variety of tasks designed to elicit varying levels of
stress or excitement. In developing and validating the new biometrics, it is necessary
to identify features in the physiometric signals that are unique to individuals, but
invariant to mental and emotional state. An estimate of the subject’s mental state,
based on coincident physiological data, can be used to refine the data processing and
classification techniques. In this paper we present the experiment procedures,
summarize the data analysis and processing, and present initial performance results.

Introduction

Biometric techniques, such as face recognition, fingerprint analysis, iris
recognition, and voice recognition, have emerged as methods for automatically
identifying individuals. These techniques can be implemented to provide automated
security for facilities, restrict access to computer networks, or verify
identification for on-line transactions. This paper explores a new method for human
identification based on features of cardiovascular function derived from standard
physiological measurements. Several methods exist for monitoring cardiovascular
function, including the electrocardiogram (ECG), pulse oximetry, dynamic blood
pressure, and acoustic monitoring of the heart or pulse. Initial investigations
suggest that these signals contain information unique to an individual, i.e., a
biometric [Irvine, et al (2001), Jang, et al (2001), Biel, et al (2001)]. A major
challenge to developing biometrics based on circulatory function is the dynamic nature
of the physiological process. Heart rate varies with the subject’s physical, mental,
and emotional state, yet a robust biometric must be invariant across these changing
states.

The ECG trace contains a wealth of information. Researchers have been using ECG data
as a diagnostic tool since the early 20th century. More recently, however, researchers
have been able to apply digital analysis to the data [Golden (1973)]. In this paper,
we present an extensive set of ECG descriptors that characterize the trace of a
heartbeat. These ECG descriptors contain information that appears to be stable across
an individual’s mental and emotional state, while providing a unique identifier for
the individual. Analysis of several data sets quantifies the performance of ECG as a
biometric for human identification.

The Electrocardiogram (ECG)

The ECG signal measures the change in electrical potential over time. The trace of
each heartbeat consists of three complexes: P, R, and T. The fiducial points
corresponding to the peaks and inflection points define each complex (Figure 1). The
labels in Figure 1 document the ECG fiducial points commonly used in medical science.

The heartbeat begins with the firing of the sinoatrial (SA) node. The SA node is the
heart’s dominant pacemaker. The electrical signal radiates outward, causing the
myocytes to depolarize and contract rapidly through a movement of sodium (Na+) ions.
This is expressed as the P wave of the ECG trace. The depolarization rate slows
dramatically when the signal hits the atrio-ventricular (AV) node, where the chemical
signal changes to relatively slow moving calcium (Ca2+) ions. The change in
contraction is expressed as the gap between the P and the R complexes. Once past the
AV node, the signal passes through to the cells lining the ventricles. The ventricles
contract rapidly, which produces the R complex. Repolarization does not exactly mirror
polarization due to the chemical agents and the lag
between the end of the electrical impulse and physical displacement [Dubin (2000)].

[Figure 1 diagram labels: P, Q, R, S, T, and the R-R Interval.]
Figure 1. Ideal ECG Signal: This figure depicts two idealized heartbeats. The R-R
interval indicates the length of a heartbeat. The major ECG complexes comprising one
beat are indicated by P, QRS, and T.

Data Collection

Two data collection campaigns provided the ECG data for this study. For the first
experiment, data were collected from males and females between the ages of 22 and 48.
Twenty-nine individuals participated, and twelve of them returned for a repeat
session, for a total of forty-one sessions in the dataset. Each individual session
contained a set of 7 two-minute tasks. The tasks were designed to elicit different
levels of mental and emotional stress. The low stress tasks included a baseline, a
meditation task, and two recovery periods following high stress tasks. The high stress
tasks were a reading task, an arithmetic task, and a virtual reality driving
simulation. Unlike conventional ECG data, the hardware for this series of experiments
collected ECG data at 1000 Hz, a much higher temporal resolution than is typical for
clinical instruments.

The second experiment used a greatly simplified protocol and a standard, FDA approved
ECG device. Data were acquired from 36 subjects during 51 sessions. Thus, data for two
sessions are available for 15 individuals. The two tasks performed during this
protocol were a (low stress) baseline and the same arithmetic stressor that was used
in the first experiment. Because a clinical instrument was used, the ECG signal was
recorded at 256 Hz and quantized to 7 bits.

ECG Processing

To realize the ideal data structure (Figure 1), the raw ECG data must be processed to
remove the non-signal artifacts. The first step is to eliminate the obvious noise in
the signal. Based upon the structure of these noise sources, a filter was designed and
applied to the raw data. Figures 2a and 2b show a sample of the high resolution ECG
data. The figures show that the raw data contain both high and low frequency noise
components. These noise components alter the expression of the ECG trace from its
ideal structure (Figure 1). The low frequency noise is expressed as the slope of the
overall signal across multiple heartbeat traces in Figure 2b. The low frequency noise
is generally associated with changes in baseline electrical potential of the device
and is slowly varying. Over the 20 second segment, the potential change of the ECG
baseline inscribes approximately 1½ wave periods. The high frequency noise is
associated with the electric/magnetic field of building power (electrical noise) and
the digitization of the analog potential signal (A/D noise). The goal of filtering is
to remove the 0.06 Hz and 60 Hz noise while retaining the individual heartbeat
information between 1.10 and 40 Hz.
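The filter design itself is not given in the text. As a minimal sketch of the stated
goal (suppress the slow baseline drift and the 60 Hz power-line component while
keeping roughly 1.1 to 40 Hz), the following Python example zeroes the out-of-band
bins of a direct DFT. The function name `bandpass`, the brick-wall design, the 250 Hz
sampling rate, and the synthetic test signal are all assumptions made for
illustration; the drift is placed at 0.5 Hz rather than 0.06 Hz only so that it fits
a two-second window.

```python
import cmath
import math


def bandpass(signal, fs, low_hz=1.1, high_hz=40.0):
    """Zero every DFT bin outside [low_hz, high_hz] and transform back.

    Illustrative brick-wall filter using an O(n^2) direct DFT.
    This is an assumed stand-in, not the filter used in the paper.
    """
    n = len(signal)
    # Precompute the n complex roots of unity shared by both transforms.
    roots = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    spectrum = [
        sum(signal[t] * roots[(k * t) % n] for t in range(n)) for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k both represent the frequency min(k, n - k) * fs / n.
        freq = min(k, n - k) * fs / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0.0
    # The inverse DFT uses the conjugate roots; real input gives a real result.
    return [
        (sum(spectrum[k] * roots[(k * t) % n].conjugate() for k in range(n)) / n).real
        for t in range(n)
    ]


# Synthetic two-second recording (hypothetical values).
fs = 250
n = 500
t = [i / fs for i in range(n)]
beat = [math.sin(2 * math.pi * 5 * x) for x in t]    # in-band component
raw = [
    b
    + 2.0 * math.sin(2 * math.pi * 0.5 * x)          # baseline drift
    + 0.3 * math.sin(2 * math.pi * 60 * x)           # power-line noise
    for b, x in zip(beat, t)
]
clean = bandpass(raw, fs)
max_err = max(abs(c - b) for c, b in zip(clean, beat))
print(f"largest deviation from the in-band component: {max_err:.2e}")
```

A brick-wall response rings on real recordings, so a practical design would more
likely use a tapered FIR or IIR band-pass; the sketch only shows which frequency bands
are kept and which are discarded.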

Figure 2. Raw ECG Data 1000 Hz (a) 20 seconds (b) 2 seconds. The Y axis is electrical
potential and the X axis is time in seconds.
Once the non-signal components were removed from the ECG datastream, analysis of the
ECG trace located the fiducial positions. For human identification, attributes were
extracted from the P, R, and T complexes (Figure 3). Four additional fiducial points
were identified. The four new fiducial positions, marked with an apostrophe (’), are
located at the basal positions of the P and T complexes (Figure 3). Collectively, the
fiducials exploit the unique physiology of an individual. Physically, the L’ and P’
fiducials indicate the start and end of the atrial depolarization. The corresponding
S’ and T’ positions indicate the start and end of repolarization.

[Figure 3 diagram labels: P (atrial depolarization), R (ventricular depolarization),
T (ventricular repolarization); fiducials L’, P’, Q, S, S’, T’; P-Q interval, S-T
segment, Q-T interval.]
Figure 3. ECG Trace based upon Cardiac Physiology. L’ and P’ indicate the start and
end of atrial depolarization, the R complex indicates ventricular depolarization, and
the T complex indicates the repolarization.

The fiducial points were extracted in the time domain in two stages. The peaks were
established by finding the local maximum in a region surrounding each of the P, R, and
T complexes. The base positions were determined by tracking downhill and finding the
location of minimum radius of curvature. The potential response of a heartbeat is a
function of sensor placement for magnitude only. The sensor position does not affect
the observed timing of the individual P, R, and T complexes. Therefore, the temporal
distances among the fiducial points are independent of the sensor placement. Since the
R position of the heartbeats was used for aligning the waterfall diagram, the
distances were computed from the other fiducial points to the R position.

The distances between the fiducial points and the R position vary with heart rate.
Assuming a linear relationship between heart rate and those distances, normalization
is computed as the extracted distance divided by the L’T’ distance. This approach
effectively scales the heartbeat to unit length. The normalized features then
represent the relative positions of the fiducials within the heartbeats. The linear
normalization has a heuristic rather than a physiological basis. The distance that an
electrical impulse travels along the atrial axis is fixed, so that changes in heart
rate are not evenly distributed across the P, R, and T complexes.

Problems did arise in processing some of the data. These problems fall into two
classes: excessive noise due to poor data collection and atypical ECG traces where
identification of the fiducial points is difficult. To address the first type of
problem, we are investigating improved sensor placement and data acquisition, with the
aim of developing a device suitable for operational use. The second type of problem
generally corresponds to individuals with unusual features in their ECG traces. For
example, one individual had a double peak in the P complex. This anomaly was stable
across tasks and sessions and, therefore, would serve as a unique identifier. Such
anomalies, however, make it difficult or impossible to compute the distance features
we have chosen. Methods for robust exception handling could, of course, be developed
for these types of cases.

Classification of Heartbeat and Subjects

Using the features extracted from each heartbeat, classification was performed to
assign each heartbeat to the corresponding individual. From the original 15
attributes, 10-12 attributes were commonly selected based on stepwise discriminant
analysis. The attribute selection process was performed to ensure stable
discrimination. To link the performance of the heartbeat classification to human
identification, a voting procedure assigned the classification to the individual
corresponding to the largest number of heartbeats.

The classification results correspond to different partitionings of the data into
training and testing sets. To use ECG as a biometric, individuals will enroll their
information into the security system. After enrollment, the user’s ECG will be
interrogated at the system. Operationally, the enrollment process corresponds to
training the classifier, and the use of the biometric to identify an individual
corresponds to testing the classifier. Because of the limited number of subjects,
performance shown here may overstate expected performance in an operational setting.
Further experimentation is needed to address large-scale performance. The results
presented here show good performance on data from the initial experiment (figure 5)
and some loss in performance with the clinical instrument (figure 6).
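The voting procedure that turns heartbeat-level classifications into a single
identification can be sketched as a simple majority vote. This is a minimal
illustration, not the authors' code: the function name `identify_by_vote`, the subject
labels, and the tie-breaking rule (the label seen first wins) are assumptions.

```python
from collections import Counter


def identify_by_vote(beat_labels):
    """Return (subject, vote_share) for the most frequently predicted subject."""
    if not beat_labels:
        raise ValueError("need at least one classified heartbeat")
    counts = Counter(beat_labels)
    # most_common(1) returns the label with the highest count; among equal
    # counts, the label encountered first is returned.
    subject, votes = counts.most_common(1)[0]
    return subject, votes / len(beat_labels)


# Hypothetical per-heartbeat classifier outputs for one test recording.
predicted = ["subj05", "subj13", "subj05", "subj05", "subj17", "subj05", "subj13"]
subject, share = identify_by_vote(predicted)
print(subject, round(share, 2))
```

In this made-up recording only four of seven heartbeats carry the winning label, yet
the recording as a whole is assigned to one subject. Aggregating over heartbeats in
this way allows a subject to be identified correctly even when a sizeable fraction of
the individual heartbeats is misclassified.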

Figure 4. Examples of “Problem” and “Good” Data. The upper graphs show the ECG traces,
while the lower ones depict the waterfall diagram in which the heartbeats are aligned
according to the R peaks.

Figure 5. Classification Results from the First Data Collection


Classification performance depends on the variability across subjects compared to the
within-subject variation. Within-subject variation can arise from several sources.
Changes in the subject’s mental and emotional state are critical, since heart rate
responds to a person’s level of excitement. In addition, variation across sessions,
including both changes related to sensor placement and slowly varying changes in
physiological characteristics, must be understood. To provide a useful biometric for
human identification, the signature should be unique to the individual, while the
variation attributable to either mental state or session-to-session changes must be
small.

The data from the first collection were analyzed in several ways to explore these
issues. In the first set of experiments, training data consisted of one 20-second
segment within a single session and task. The testing data were the remaining 100
seconds of data from the same session and task. These seven experiments (one for each
task) are labeled “intra task” in figure 5. The first and second bars indicate the
percent of heartbeats correctly classified in the training and testing data,
respectively. The third bar shows the percent of subjects classified correctly based
on the voting analysis of the heartbeat data. The second set of bars, labeled task 1,
task 2, etc., corresponds to training on a 20-second segment for one task and testing
on all other tasks. The label identifies the task used for training. The final set of
bars indicates performance when the classifier is trained on data from one session and
tested against data for a different session.

Because the second data collection employed a simplified protocol, the classification
analysis spans fewer training and testing conditions. Task 1 represents baseline
conditions, while task 2 was the arithmetic task designed to induce stress. As with
the first data collection, the results include training and testing within a task,
training on one task and testing on another, and training on one session and testing
on a separate session (figure 6). In general, performance was slightly lower on this
data set than on the first one. Two factors account for the difference. The reduced
temporal sampling (256 Hz vs. 1,000 Hz) introduces a loss in precision of the location
of the fiducial points that define the features used in classification. In addition,
less rigorous lab procedures resulted in higher noise arising from sensor placement.
Subsequent assessment of the procedures has verified that this noise source can be
eliminated through improved procedures.

Figure 6. Classification Performance for the Second Data Collection

The critical issue for an operational biometric is performance across sessions. Given
data from one session (the enrollment data), can new data be acquired in a subsequent
session days or weeks later that provide an accurate identification of the individual?
To address this question, data from the two collections were augmented by clinical ECG
data from another study. The classification performance on this pooled data set shows
about 90% correct classification of individuals using the methods described above
(table 1).

The classification analysis exhibits good performance when training the classifier on
data from one task and testing on data from another task. In general, the shape of the
heartbeat is stable across tasks, but differs across subjects. There can be some
stretching or contracting of the overall heartbeat as the subject’s heart rate varies,
but the normalization of the extracted features compensates for this source of
variation. The average heartbeat within a single task shows relatively little
task-to-task variation (figure 7). The differences across subjects, however, are
clearly evident. The stability of the features across tasks suggests that
identification based on cardiovascular function should be robust. Analysis of the
specific features reinforces this idea.
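How dividing by the L’T’ distance compensates for stretching or contracting of the
heartbeat can be shown with a small worked example. It is a sketch under a strong
assumption: the beat is compressed uniformly, which is exactly the case in which a
linear normalization is exact. The function name `normalize_features`, the feature
keys, and all distance values are hypothetical.

```python
def normalize_features(distances, lt_distance):
    """Divide each fiducial-to-R distance by the L'T' distance."""
    if lt_distance <= 0:
        raise ValueError("L'T' distance must be positive")
    return {name: d / lt_distance for name, d in distances.items()}


# Hypothetical distances (seconds) from the R peak to three other fiducials.
resting = {"RL'": 0.20, "RP": 0.14, "RT'": 0.30}
# L' precedes R and T' follows it, so L'T' is the sum of the two distances.
resting_lt = resting["RL'"] + resting["RT'"]

# The same beat uniformly compressed by 20% (assumed faster heart rate).
scale = 0.8
stressed = {name: d * scale for name, d in resting.items()}
stressed_lt = resting_lt * scale

norm_rest = normalize_features(resting, resting_lt)
norm_stress = normalize_features(stressed, stressed_lt)

for name in resting:
    print(name, round(norm_rest[name], 3), round(norm_stress[name], 3))
```

Under uniform compression the raw distances change but the normalized values agree for
both beats. When a change in heart rate is not spread evenly across the complexes, the
normalized values would differ, and the normalization is only an approximation.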
Table 1. Combined Performance Results

Experiment                   Training   Test       % Training   % Test       % Identified
                             Subjects   Subjects   Heartbeats   Heartbeats
Train on first half of data  104        104        56           58           91
Train Session 1              95         55         64           67           88
Train Session 2              59         56         73           62           88

[Figure 7 annotation: Each block of curves represents the average heartbeat for one
individual for each of the 7 tasks, i.e., 7 curves per subject.]

Figure 7. Mean Heartbeat Within a Subject and Task, for Several Subjects and Seven Tasks from the First

Analysis of Features

To ensure good performance as a biometric, the underlying features extracted from the
ECG signal should be stable across mental and emotional state, stable across sessions,
but show good variability across subjects. A multivariate ANOVA was performed to
assess these sources of variance. The contribution of each factor (task, session, and
subject) was estimated for each of the features used in the classification analysis.
Figure 8 shows the relative contribution for each source of variation, with the bars
summing to 100% for each feature. It is clear that features are stable across tasks,
although certain features show higher variation across sessions than would be ideal.
Further investigation is underway to minimize the effects of session-to-session
variation on classification performance.

A more challenging problem, however, arises from the detailed analysis of the feature
space. When viewed marginally (figure 9) and jointly (figure 10), the distribution of
the feature values within an individual is sometimes multi-modal. This suggests that
the classes (subjects) may not be linearly separable and the linear discriminant
analysis presented here is not the best approach. Classifiers that can handle disjoint
sets are expected to provide better classification performance. Initial investigations
of a neural network approach show substantial promise and we hope to report complete
results in the near future.

Figure 8. Relative Contributions to Overall Variance for Each ECG Feature

Discussion

The analysis presented here indicates that human identification based on cardiovascular
function is feasible. By measuring a subject’s ECG and extracting features from the
signal, it is possible to classify individuals with high accuracy. The features used
for classification are stable across a range of mental and emotional tasks, indicating
that this biometric should be robust to an individual’s current mood. The limited data
from multiple sessions also show good behavior. Analysis of the feature space suggests
that alternative classifiers merit consideration, and these investigations are
currently underway.

References

D. Dubin, (2000) Rapid interpretation of ECGs, Cover, Inc., Tampa, Florida, 2000.

D. P. Golden Jr, R. A. Wolthuis and G. W. Hoffler, (1973) A spectral analysis of the
normal resting electrocardiogram, IEEE Transactions on Biomedical Engineering, BME 20
(September) (1973) 366-373.

L. Biel, O. Pettersson, L. Philipson and P. Wide, (2001) ECG analysis: A new approach
in human identification, IEEE Transactions on Instrumentation and Measurement, 50 (3)
(2001) 808-812.

R. Hoekema, G. J. H. Uijen and A. van Oosterom, (2001) Geometrical aspect of the
interindividual variability of multilead ECG recordings, IEEE Transactions on
Biomedical Engineering, 48 (2001) 551-559.

J. M. Irvine, B. K. Wiederhold, L. W. Gavshon, S. A. Israel, S. B. McGehee, R. Meyer
and M. D. Wiederhold, (2001) Heart rate variability: A new biometric for human
identification, International Conference on Artificial Intelligence (IC-AI'2001), Las
Vegas, Nevada, 2001, pp. 1106-1111.

D. P. Jang, S. A. Israel, B. K. Wiederhold, M. D. Wiederhold, S. B. McGehee, L. W.
Gavshon, R. Meyer and J. M. Irvine, (2001) Protocols for protecting patient
information within a biometric analysis, Biometrics Section of the International
Conference on Information Security, Seoul, Korea, 2001, pp.

[Figure 9 plots: two histograms, one per subject; X axis LPTP, Y axis Frequency.]
Figure 9. Marginal Distribution for One Feature (L’T’) for Two Subjects

[Figure 10 plot: RL (Y axis) against RP (X axis); legend sub5, subj13, subj17, subj20,
subj38.]
Figure 10. Joint Distribution of Two Features (RL and RP) for Several Subjects.

Acknowledgements

This research was supported by the DARPA Human Identification program under contract
DABT63-00-C-1039. Additional assistance was provided by Dr. Rodney Meyer, Dr. Lauren
Gavshon, Ms. Shannon McGee, and Ms. Elizabeth Rosenfeld. The authors also wish to
thank Dr. P. Jonathon Phillips, DARPA, for valuable comments concerning the
development of this work. The views expressed here are those of the authors and do not
necessarily reflect the positions of DARPA, SAIC, or the Virtual Reality Medical
Center.
