
ICIIS'2017 1570365014

Combination of Rule Based Classification and Decision Trees to Identify Low Quality ECG

Mohamed Athif* and Chathuri Daluwatte†
*Department of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka
†Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, USA

Abstract— ECGs obtained from personal devices by untrained users need to be assessed for quality before they are sent to physicians through telemedicine services. This paper discusses a machine learning algorithm, to be used in such devices, that identifies low quality ECG recordings. The proposed algorithm uses a combination of rule based classification and decision trees. A set of 7 features describing physiological relationships between the 12 ECG leads was used for machine learning. The algorithm was trained and tested on the Physionet Computing in Cardiology Challenge 2011 database using 5 fold cross validation. Oversampling was used to reduce the effect of class imbalance in the database. The algorithm achieved a sensitivity of 91.2% and a specificity of 91.5% in differentiating low and high quality ECG recordings.

Keywords—Electrocardiography, Telemedicine, Machine learning

I. INTRODUCTION

In applications of telemedicine, when electrocardiograms (ECGs) are obtained by untrained users using personal ECG devices, it is critical to assess the quality of the record before making diagnostic decisions. If the ECG is found to be of low quality, the user can be prompted to obtain a new recording to be forwarded to the physician. This can improve the efficiency of communication between the user and the physician and increase the accuracy of diagnosis. Therefore, domestic ECG systems need to be equipped with reliable algorithms that can identify low quality ECGs, so that the device can automatically discard them and request a new recording from the user.

ECG recordings may become contaminated with different types of artefacts such as power line noise, electromyographic (EMG) noise, switching of electrodes, poor conduction at the skin–electrode interface, disconnection of an electrode, and/or patient movement. Artefacts such as power line noise and EMG noise can be successfully detected and eliminated using frequency based filtering methods. However, some artefacts mimic cardiac arrhythmia and therefore cannot be clearly identified using frequency based filtering, due to the overlap of ECG signal bandwidth and artefact bandwidth [1]. As an alternative, heuristic rule based methods as well as machine learning methods have been proposed to identify electrode misplacement [2, 3] and motion artefacts [4, 5].

The performance of machine learning algorithms used for classification problems depends heavily on the relative sizes of the classes of data used for training. Because of the abundance of normal cases compared to abnormal cases, medical data sets are generally class imbalanced [6]. In ECG data sets, there are usually more recordings of acceptable quality than low quality recordings. For this reason, machine learning algorithms developed to classify ECG have a higher tendency to classify a recording as acceptable when such data sets are used for training. This is especially undesirable in applications of telemedicine, because low quality records that are inaccurately classified as acceptable can decrease the efficiency of the service.

In this study we used oversampling to address the class imbalance in the training dataset, and evaluated its effect in reducing the bias towards the majority class using a combination of a rule based classifier and a decision tree algorithm.

II. METHODS

A. Data Set

A collection of 1000 ECG records from the Physionet Computing in Cardiology Challenge 2011 training set was used in this study [7]. The data consisted of standard 12-lead ECG recordings of leads I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5 and V6 with a bandwidth of 0.05–100 Hz [8]. These 10 second records had been recorded simultaneously and sampled at 500 Hz, giving 5000 sample points per lead at a resolution of 16 bits. The records had been obtained by physicians, nurses, and voluntary participants with varying degrees of training and practice. Out of the 1000 records, 775 were labelled 'Acceptable', 223 were labelled 'Unacceptable' and 2 were labelled 'Intermediate'. The 2 'Intermediate' records were considered 'Unacceptable' for this study. The ECGs had been manually annotated by a group of 23 volunteer annotators consisting of 2 cardiologists, 1 physician, 5 ECG analysts, 5 others with some experience reading ECGs and 10 volunteers who had never read ECGs previously. Silva et al. describe the grading and averaging scheme used to obtain the final label for these records [8].

The original dataset was observed to be highly biased towards the 'Acceptable' class, which contained 77.5% of the recordings. Therefore a training data set was created by oversampling the original dataset to increase the number of data points belonging to the 'Unacceptable' class. Every 'Acceptable' record was sampled once and every 'Unacceptable' record was sampled N times, where N is the oversampling factor. Performance metrics of the algorithm were evaluated for N = 1, 2, 3, 4, 5, 6 and 7.
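The oversampling scheme described above can be sketched as follows. This is an illustrative helper (`oversample` is a hypothetical name, not the authors' code): each 'Acceptable' record is kept once and each 'Unacceptable' record is repeated N times.

```python
# Sketch of the oversampling scheme: every 'Acceptable' record is kept once,
# and every 'Unacceptable' record is repeated n times (the oversampling factor).
def oversample(records, labels, n):
    """Return an oversampled training set for oversampling factor n."""
    out_records, out_labels = [], []
    for rec, lab in zip(records, labels):
        copies = n if lab == "Unacceptable" else 1
        out_records.extend([rec] * copies)
        out_labels.extend([lab] * copies)
    return out_records, out_labels

# Example: 3 acceptable records and 1 unacceptable record, factor N = 4
recs = ["r1", "r2", "r3", "r4"]
labs = ["Acceptable", "Acceptable", "Acceptable", "Unacceptable"]
r, l = oversample(recs, labs, 4)
print(len(r))                   # 7 records: 3 acceptable + 1 unacceptable x 4
print(l.count("Unacceptable"))  # 4
```

Note that only the minority class is duplicated; the majority class is left untouched, which is what produces the Unacceptable:Acceptable ratios reported in Table II.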
B. Rule Based Elimination - Removal of Pseudo Flatlines

A flatline (or asystole) occurs when a subject undergoes cardiac arrest and is a critical arrhythmia [9]. Therefore true flatlines should be identified as 'Acceptable' records. A true flatline contains low amplitude baseline variations and spikes [5]. However, if all 5000 samples of a 10 second record are 0, it indicates that the electrode has been completely detached from the skin surface. An ECG lead with a continuous 0 V signal level is therefore defined as a pseudo flatline. All records in which at least 1 of the 12 leads is a pseudo flatline were classified as 'Unacceptable'.
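The rule above is simple enough to sketch directly. This is an assumed implementation, not the authors' code; `flatline_rule` is an illustrative name.

```python
# Sketch of the pseudo flatline rule: a lead is a pseudo flatline if all of
# its samples are exactly 0, and a record is classified 'Unacceptable' if at
# least one of its 12 leads is a pseudo flatline.
def is_pseudo_flatline(lead):
    return all(sample == 0 for sample in lead)

def flatline_rule(record):
    """record: list of 12 leads, each a list of samples."""
    if any(is_pseudo_flatline(lead) for lead in record):
        return "Unacceptable"
    return "Undecided"  # record is passed on to the decision tree

good = [[0.1, -0.2, 0.05] for _ in range(12)]  # every lead carries signal
bad = [lead[:] for lead in good]
bad[3] = [0.0, 0.0, 0.0]                       # one completely detached electrode
print(flatline_rule(good))  # Undecided
print(flatline_rule(bad))   # Unacceptable
```

Note the strict equality to 0: a true flatline still shows small baseline variations and spikes, so it is not caught by this rule and correctly proceeds to the classifier.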
C. Decision Trees

All records in the training data set that were not removed as pseudo flatlines were used to extract features to train the decision tree (Fig. 2). The decision tree algorithm was chosen for machine learning as it performed the best among the tested alternatives (results not provided). A decision tree is a machine learning tool that classifies data into classes based on an optimal set of decision boundaries. The decision boundary for each feature and the order of features used for splitting are chosen to maximize the separation between the output classes. This class separation is quantified by an index such as Gini's Diversity Index.
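The splitting criterion mentioned above can be made concrete. The following is the standard formulation of Gini's Diversity Index (a textbook definition, not code from the paper); `gini` and `weighted_gini` are illustrative helpers.

```python
from collections import Counter

# Gini's Diversity Index of a node is 1 - sum(p_i^2) over the class
# proportions p_i. A pure node scores 0; a 50/50 binary node scores 0.5.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["Acceptable"] * 4))                  # 0.0 (pure node)
print(gini(["Acceptable", "Unacceptable"] * 2))  # 0.5 (maximally mixed)

# A candidate split is scored by the weighted impurity of its two children;
# the tree greedily picks the feature and threshold that reduce impurity most.
def weighted_gini(left, right):
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A split that perfectly separates the classes has zero weighted impurity.
print(weighted_gini(["Acceptable"] * 3, ["Unacceptable"] * 3))  # 0.0
```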
A feature vector of 7 features (summarized in Table I) was extracted from the 12 ECG signals for each of the records in the training data set.

Einthoven's Error: Einthoven's Triangle Law describes the relationship between the three limb leads [10]. It states that the 3 limb ECG leads obey the rule

Lead II = Lead I + Lead III

We hypothesize that Einthoven's Error (defined in Table I) corresponds to motion artefacts because, in the case of a movement localized in any of the three regions of limb electrode attachment, Einthoven's Triangle Law will not hold.
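The Einthoven's Error feature can be sketched as the mean residual of the law over all samples (this exact form is an assumption based on the definition in Table I): on a clean record the residual is near 0, while an artefact confined to one limb lead breaks the identity.

```python
# Sketch of the Einthoven's Error feature (assumed form, based on the law
# Lead II = Lead I + Lead III): the mean of (I + III - II) over all samples.
def einthoven_error(lead_i, lead_ii, lead_iii):
    n = len(lead_i)
    return sum(i + iii - ii for i, ii, iii in zip(lead_i, lead_ii, lead_iii)) / n

lead_i = [0.2, 0.5, 1.1, 0.4]
lead_iii = [0.1, 0.3, 0.6, 0.2]
lead_ii = [i + iii for i, iii in zip(lead_i, lead_iii)]  # law holds exactly
print(einthoven_error(lead_i, lead_ii, lead_iii))        # 0.0

lead_ii_noisy = [v + 0.4 for v in lead_ii]               # artefact on lead II only
print(round(einthoven_error(lead_i, lead_ii_noisy, lead_iii), 2))  # -0.4
```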
Mean, maximum, minimum covariance and number of large covariance elements: A 12×12 covariance matrix with element c_{i,j} in the i-th row and j-th column was defined such that c_{i,j} = cov(ECG_i, ECG_j). Features 2, 3, 4 and 5 were obtained from the covariance matrix.

Number of leads with large baseline voltages and highest variance in baseline voltages: The 12 ECG signals were sent through a 6th order Butterworth low pass filter with a cutoff frequency of 1 Hz to obtain the baseline of the signal. Features 6 and 7 were obtained using the baselines of the ECG signals. These features indicate the waviness of the baseline.

Fig. 1. The lead configuration demonstrating Einthoven's Law. Vector addition of the leads corresponds to the physiological relationship between the three limb leads.

Fig. 2. Schematic representation of the algorithm.

TABLE I. FEATURES USED IN THE DECISION TREE

No | Feature | Definition
1 | Einthoven's Error | (1/5000) Σ (ECG_I + ECG_III − ECG_II), summed over the 5000 samples
2 | Mean Covariance | (1/144) Σ c_{i,j}
3 | Maximum Covariance | max{ c_{i,j} | i = 1,2,…,12; j = 1,2,…,12 }
4 | Minimum Covariance | min{ c_{i,j} | i = 1,2,…,12; j = 1,2,…,12 }
5 | Number of large covariance elements | Number of covariance elements that satisfy c_{i,j} > 0.1
6 | Number of leads with large baseline voltages | Number of leads with baseline voltage greater than 0.7 mV
7 | Highest variance in baseline voltages | Maximum variance in the absolute values of the baseline of any lead
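Features 2–5 above can be sketched from the covariance matrix as follows. This is an assumed implementation of the definitions in Table I (`covariance_features` is an illustrative name), using the 0.1 threshold from feature 5.

```python
# Sketch of features 2-5: build the 12x12 covariance matrix
# c[i][j] = cov(ECG_i, ECG_j) and reduce it to four scalars:
# mean, maximum, minimum, and the count of elements above 0.1.
def cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def covariance_features(record):
    """record: list of 12 leads (lists of samples) -> (mean, max, min, count)."""
    c = [[cov(a, b) for b in record] for a in record]
    flat = [v for row in c for v in row]
    large = sum(1 for v in flat if v > 0.1)  # threshold from Table I
    return sum(flat) / len(flat), max(flat), min(flat), large

# Toy record: even-numbered leads share one signal, odd-numbered leads
# carry its inverted copy, so covariances are exactly +0.5 or -0.5.
base = [0.0, 1.0, 0.0, -1.0] * 10
record = [base if k % 2 == 0 else [-v for v in base] for k in range(12)]
mean_c, max_c, min_c, n_large = covariance_features(record)
print(max_c, min_c)  # 0.5 -0.5
print(n_large)       # 72 of the 144 elements exceed the 0.1 threshold
```

Reducing the full 144-element matrix to four scalars is what keeps the feature vector small enough for resource-constrained devices, as the Discussion notes.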
This feature vector was used to train a decision tree. The maximum number of splits allowed was limited to 7 in order to minimize the possibility of over fitting while ensuring the possibility of all features being used at least once. Gini's Diversity Index was used as the splitting criterion.

D. Testing and Validation

The dataset of 1000 records was used for 5 fold cross validation. The dataset was divided into 5 groups of 200 records each. In each of the 5 iterations, a different combination of 4 of them was used as the training set with 800 records. The remaining group of 200 records was used for validation. The overall classification result was obtained by combining the results of the pseudo flatline elimination process and the results of the decision tree classifier. Accuracy, Sensitivity, Specificity, Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were calculated for each fold of cross validation. We considered identification of an 'Unacceptable' record as low quality as the event of interest. Hence we defined,

Sensitivity = No. of correctly identified 'Unacceptable' records / Total number of 'Unacceptable' records

Specificity = No. of correctly identified 'Acceptable' records / Total number of 'Acceptable' records

PPV = No. of correctly identified 'Unacceptable' records / No. of records classified as 'Unacceptable'

NPV = No. of correctly identified 'Acceptable' records / No. of records classified as 'Acceptable'

PPV and NPV for previous work were calculated using the reported sensitivity and specificity values. The prevalence of the 'Acceptable' (775) and 'Unacceptable' (225) classes in the original dataset was used for these calculations, as the original dataset had been used for training without oversampling in each of the works studied.

PPV = (Sensitivity × 225) / (Sensitivity × 225 + (1 − Specificity) × 775)

NPV = (Specificity × 775) / (Specificity × 775 + (1 − Sensitivity) × 225)

III. RESULTS

The performance of the algorithm showed a strong connection to the extent of class balance of the training dataset. Table II shows the variation of performance metrics with different oversampling factors. The performance metrics of the algorithm are similar when the oversampling factor is greater than 3. From here onwards we report results for an oversampling factor of 4.

TABLE II. VARIATION OF SENSITIVITY AND SPECIFICITY WITH OVERSAMPLING RATIO

Oversampling factor | Unacceptable : Acceptable Ratio | Accuracy (%) | Sensitivity (%) | Specificity (%)
1 | 78:613 | 91.8 | 78.4 | 95.0
2 | 157:613 | 90.1 | 85.8 | 91.9
3 | 235:613 | 91.2 | 89.6 | 92.0
4 | 313:613 | 91.1 | 91.2 | 91.5
5 | 392:613 | 90.6 | 90.9 | 91.0
6 | 470:613 | 90.8 | 91.78 | 90.8
7 | 548:613 | 89.3 | 91.1 | 89.2

Reported performance metrics were obtained by averaging results from 5 fold cross validation. The performance metrics corresponding to the best performance are shown in Table III.

TABLE III. THE OVERALL PERFORMANCE METRICS

Metric | Mean ± Standard Deviation (%)
Accuracy | 91.1 ± 2.0
Sensitivity | 91.2 ± 6.3
Specificity | 91.5 ± 4.6
Positive Predictive Value | 75.3 ± 13.1
Negative Predictive Value | 97.0 ± 2.9

IV. DISCUSSION

The Physionet Computing in Cardiology Challenge 2011 training dataset has been used in many studies related to ECG signal quality analysis [2, 4, 5, 11, 12, 13]. This data set contains 775 'Acceptable' records and 225 'Unacceptable' records, out of which 90 records are zero voltage signals and thus contribute little to training an algorithm to discriminate low quality records from high quality records. The effective ratio of acceptable to unacceptable records is therefore around 6:1. Binary classification trees have been shown to be affected strongly by class imbalance [14]. The large number of 'Acceptable' records biases the classifier to predict more records as 'Acceptable', thus increasing the specificity at the cost of sensitivity, as seen from Table II. The lower sensitivity of the methods proposed in contemporary work, in spite of accuracies of up to 93%, can be attributed to the class imbalance of the data set used. Table IV shows a comparison of sensitivity and accuracy measures of algorithms that have been developed using the same training dataset [7].

Oversampling the minority class increased the sensitivity. An optimum balance of sensitivity and specificity was achieved with an oversampling factor greater than 3. It should be noted that oversampling can lead to over fitting, because multiple copies of the same sample may be split using two different instances of the same rule, making it too specific [14].

TABLE IV. COMPARISON OF PERFORMANCE METRICS OF RELATED WORK

Previous Work | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV(b) (%) | NPV(b) (%)
Kalkstein et al. [4] | 92.9 | 74.0(a) | 98.4(a) | 93.1 | 92.9
Maan et al. [13] | 92.2 | 75.1 | 97.0 | 87.9 | 93.1
Noponen et al. [12] | 93.2 | 80.4 | 96.9 | 88.3 | 94.5
Xia et al. [11] | 85.9 | 83.2 | 95.1 | 83.1 | 95.1
Jekova et al. [2] | 94.2 | 81.1 | 97.8 | 91.5 | 94.7
Our Method | 91.1 | 91.2 | 91.5 | 75.3 | 97.0

a. Calculated using false positive and false negative values in [4]
b. Values calculated using specificity, sensitivity and prevalence of 'Acceptable' and 'Unacceptable' records in the CinC 2011 training data set

In applications of telemedicine, sensitivity in classifying a low quality record has more value than specificity, as incorrectly classifying an 'Unacceptable' recording as 'Acceptable' can degrade the efficiency of the service. This is especially significant when untrained users use personal ECG acquisition methods such
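The prevalence-adjusted PPV and NPV formulas from Section II-D can be checked numerically. Plugging in our method's mean sensitivity (91.2%) and specificity (91.5%) reproduces values close to the fold-averaged 75.3% PPV and 97.0% NPV in Table III.

```python
# Numeric check of the prevalence-adjusted PPV/NPV formulas, using the
# CinC 2011 prevalences (225 'Unacceptable', 775 'Acceptable') and our
# method's mean sensitivity and specificity from Table III.
def ppv(sens, spec, n_unacc=225, n_acc=775):
    return sens * n_unacc / (sens * n_unacc + (1 - spec) * n_acc)

def npv(sens, spec, n_unacc=225, n_acc=775):
    return spec * n_acc / (spec * n_acc + (1 - sens) * n_unacc)

sens, spec = 0.912, 0.915
print(round(100 * ppv(sens, spec), 1))  # 75.7, close to the 75.3% in Table III
print(round(100 * npv(sens, spec), 1))  # 97.3, close to the 97.0% in Table III
```

The small discrepancy arises because Table III averages per-fold metrics, whereas this check applies the formula to the already-averaged sensitivity and specificity.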

as electrodes that can be connected to mobile phones [8]. If records of low quality are sent to physicians, the time and effort spent by the physician to identify that the recording is of too low quality to make a diagnosis and to request another recording makes the telemedicine service burdensome for both the physician and the user. When an ECG with artefacts is mistakenly identified as an acceptable record, on rare occasions the physician might be misled to believe that the patient is having an arrhythmic event [15]. Meanwhile, if an 'Acceptable' record is inaccurately classified as 'Unacceptable', the consequences are much less severe: it would simply require the user to obtain another recording within a couple of minutes [5]. We achieved a higher sensitivity with slightly lower specificity compared to existing work by oversampling the 'Unacceptable' class. This increased the instances of low quality records for the machine learning algorithm to learn from, reducing the bias towards classifying records as acceptable. The performance we report here was obtained with 5 fold cross validation, and a limitation of this study is the lack of an independent test set to confirm the performance.

The proposed algorithm has a lower PPV and a higher NPV compared to existing work. The lower PPV can be attributed to the increase in prevalence of the 'Unacceptable' class introduced by oversampling. In this study, the PPV estimates the probability that a record is actually low quality if the algorithm indicates that it is low quality. The 75% PPV implies that, on average, a user will have to retake an ECG recording unnecessarily for every 4 acceptable recordings. This is undesirable on the user's end. However, the 97% NPV indicates that if a record is classified as acceptable by the algorithm, it has a very high probability of actually being an acceptable record. The loss of efficiency in the telemedicine service caused by communicating a low quality record may be higher than that caused by a user having to retake a recording. Therefore, such an algorithm may increase the overall efficiency of the telemedicine service.

In this study we have combined rule based classification and machine learning at two levels. The first level combination was done by removing pseudo flatlines based on a heuristic rule. The second level combination was done in extracting features from the remaining data points: we included features that correspond to the frequency with which a variable satisfies a given rule (features 5 and 6). This approach has shown better performance compared to previous work where the combination of these two methods is done by either using a weighted sum [16] or the voting of individual classifications [17].

The number of features extracted from the ECG for machine learning in this approach is small. The covariance matrix, which examines the relationship between the 12 leads, has been represented by 4 single value features that give an overall measure of the entire matrix. This can be computationally efficient in environments such as mobile phones. If necessary, the raw features can be transmitted to be processed remotely.

ACKNOWLEDGMENT

This work was supported in part by the US Food and Drug Administration's Medical Countermeasures Initiative and an appointment to the Research Participation Program at the Center for Devices and Radiological Health administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US Food and Drug Administration.

DISCLOSURES

The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products by the Department of Health and Human Services.

REFERENCES

[1] P. Hamilton, M. Curley and R. Aimi, "Effect of adaptive motion-artifact reduction on QRS detection," Biomedical Instrumentation & Technology, vol. 34, no. 3, pp. 197-202, 2000.
[2] I. Jekova et al., "Recognition of Diagnostically Useful ECG Recordings: Alert for Corrupted or Interchanged Leads," in Computing in Cardiology, 2011, pp. 429-432.
[3] V. Starc, "Could Determination of Equivalent Dipoles from 12 Lead ECG Help in Detection of Misplaced Electrodes," in Computing in Cardiology, 2011, pp. 445-448.
[4] N. Kalkstein et al., "Using Machine Learning to Detect Problems in ECG Data Collection," in Computing in Cardiology, 2011, pp. 437-440.
[5] B. E. Moody, "Rule-Based Methods for ECG Quality Control," in Computing in Cardiology, 2011, pp. 361-363.
[6] D. C. Li, C. W. Liu and S. C. Hu, "A learning method for the class imbalance problem with medical data sets," Computers in Biology and Medicine, vol. 40, no. 5, pp. 509-518, 2010.
[7] A. Goldberger et al., "PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals," Circulation, vol. 101, no. 23, pp. 215-220, 2000.
[8] I. Silva et al., "Improving the Quality of ECGs Collected Using Mobile Phones: The PhysioNet/Computing in Cardiology Challenge 2011," in Computing in Cardiology, 2011, pp. 273-276.
[9] G. Ramanathan et al., "Is flat line truly asystole?," Indian Journal of Anaesthesia, vol. 59, no. 8, pp. 528-529, 2015.
[10] J. S. Butterworth and J. J. Thorpe, "On Evaluating the Einthoven Triangle Theory," Circulation, vol. 3, no. 6, pp. 923-925, 1951.
[11] H. Xia et al., "Computer Algorithms for Evaluating the Quality of ECGs in Real Time," in Computing in Cardiology, 2011, pp. 369-372.
[12] K. Noponen et al., "Electrocardiogram Quality Classification based on Robust Best Subsets Linear Prediction Error," in Computing in Cardiology, 2011, pp. 365-368.
[13] A. C. Maan et al., "Assessment of Signal Quality and Electrode Placement in ECGs using a Reconstruction Matrix," in Computing in Cardiology, 2011, pp. 289-292.
[14] H. He and E. A. Garcia, "Learning from Imbalanced Data," IEEE Transactions on Knowledge and Data Engineering, vol. 21, p. 1263, 2009.
[15] P. B. Knight et al., "Clinical Consequences of Electrocardiographic Artifact Mimicking Ventricular Tachycardia," The New England Journal of Medicine, vol. 341, pp. 1270-1274, 1999.
[16] H. C. T. Thomas, C. Xiang and L. E. Thiam, "Improving the Quality of Electrocardiography Data Collected Using Real Time QRS-Complex and T-Wave Detection," in Computing in Cardiology, 2011, pp. 441-445.
[17] J. Kuzilek, M. Huptych, V. Chudacek, J. Spilka and L. Lhotska, "Data Driven Approach to ECG Signal Quality Assessment using Multistep SVM Classification," in Computing in Cardiology, 2011, pp. 453-455.
