L. Barnes
Machine Learning
SYS 6016 / CS 6316
1. Summary of Results
The task is to design and evaluate models for cardiac rhythm classification and to recommend the best approach based on cross-validation and test error. In this report, I apply several machine learning approaches: tree learning, rule learning, instance-based learning, and ensemble methods. The goal is to distinguish atrial fibrillation from normal sinus rhythm and from normal sinus rhythm with ectopy.
After experiments with different classifiers, this research shows that a Random Forest with 500 trees and 4 attributes (HRV, LDs, COSEn, and DFA) gives the highest prediction accuracy, 93.43%.
2. Problem Description
The goal of this research is to classify types of cardiac arrhythmia by building models and
explaining the results based on the data from the University of Virginia (UVa) Health System Heart
Station. The UVa physicians used Holter monitors to record 2,895 24-h RR interval time series during the period 12/2004 to 10/2010 for clinical reasons.
The training data (rhythm-training.csv) comprise 2,178 labeled instances (rows) with 8 numeric attributes (columns). There are no missing values in the data. The last column indicates the classification of the cardiac rhythm: atrial fibrillation AF (1), normal sinus rhythm NSR (2), and normal sinus rhythm NSR with Ectopy (3). The test data (rhythm-testing.csv) comprise 512 unlabeled instances. A detailed description of the attributes is summarized in Table 1.
No  Attribute  Description
1   PID        unique patient id
2   HR         heart rate
3   HRV        heart rate variability
4   AGE        age of patient
5   LDs        Local Dynamics score for 12-beat segments [2]
6   COSEn      Coefficient of Sample Entropy assessed on 30-minute segments [1]
7   DFA        Detrended Fluctuation Analysis (long-range correlations in non-stationary signals) [3]
8   Class      {1, 2, 3} class attribute: AF (1), NSR (2), or NSR with Ectopy (3)
Table 1: Attribute descriptions
The PID variable is not considered in model building; thus, in total, only 6 attributes are used to predict the Class variable.
To evaluate the models, I use two main performance metrics: cross-validation error and test error.
Cross-validation error: 10-fold cross-validation on the training data is used to investigate how well the models perform.
Test error: predictions on the test data are evaluated to guard against overfitting.
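The 10-fold cross-validation procedure above can be sketched as follows. This is a minimal illustration assuming scikit-learn; synthetic data stands in for rhythm-training.csv, which is not bundled here, so the numbers will not match the report's.

```python
# Sketch: estimating 10-fold cross-validation error with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the 6 numeric predictors and the 3-class rhythm label.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# 10-fold CV: train on 9 folds, test on the held-out fold, repeat 10 times.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
cv_error = 1.0 - scores.mean()
print(f"10-fold CV error: {cv_error:.3f}")
```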
3. Tree Learning
a. Decision Tree
To build a good decision tree, I try the same decision tree model with different parameter settings. For all decision trees, I use the Weka J4.8 classifier with numFolds = 2, a minimum of 2 objects per leaf, and a confidence factor of 0.15.
The decision tree is built on the 6 attributes to predict the Class variable, both with and without pruning.
Model 1: 6 attributes, unpruned
Model 2: 6 attributes, pruned
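The pruned-vs-unpruned comparison can be sketched as below. Weka's confidence-factor pruning has no direct scikit-learn equivalent, so cost-complexity pruning (`ccp_alpha`) stands in for it here, and the data is a synthetic placeholder; treat this as an illustration of the comparison, not a reproduction of Models 1 and 2.

```python
# Sketch: comparing an unpruned tree against a cost-complexity-pruned one.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

unpruned = DecisionTreeClassifier(min_samples_leaf=2, random_state=0)
# ccp_alpha > 0 prunes subtrees whose complexity is not worth their gain.
pruned = DecisionTreeClassifier(min_samples_leaf=2, ccp_alpha=0.005,
                                random_state=0)

for name, model in [("unpruned", unpruned), ("pruned", pruned)]:
    acc = cross_val_score(model, X, y, cv=10).mean()
    leaves = model.fit(X, y).get_n_leaves()
    print(f"{name}: CV accuracy {acc:.3f}, {leaves} leaves")
```

As in the report, the pruned tree ends up with fewer leaves at little or no cost in cross-validation accuracy.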
I also examine the contribution of the 6 attributes using Weka's GainRatioAttributeEval evaluator with the Ranker search method. The importance of an attribute is measured by its gain ratio with respect to the class.
Average rank  Attribute
1             HRV
2             COSEn
3             DFA
4             LDs
5             AGE
6             HR
Table 2: Attribute ranking by gain ratio with respect to the class
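This kind of attribute ranking can be sketched as follows. Mutual information between each feature and the class is used here as a stand-in for Weka's gain ratio, and the data is synthetic, so the resulting order will not match Table 2.

```python
# Sketch: ranking attributes by their information content about the class.
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Attribute names from the report; the feature values are synthetic.
names = ["HR", "HRV", "AGE", "LDs", "COSEn", "DFA"]
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
ranking = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
for name, s in ranking:
    print(f"{name}: {s:.3f}")
```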
Following the importance ranking of the attributes in Table 2, I build three further decision trees (excluding Age and HR, excluding Age and LDs, and excluding Age and DFA). The accuracy of the model excluding Age and LDs is lower than that of the others, so I retain only two models: one without Age and HR, and one without Age and DFA.
All accuracy and error rates are computed with 10-fold cross-validation, in which the training data are divided into 10 parts: 9 parts for training and 1 part for testing. The following table shows the structure of each tree.
Tree      Leaves  Tree size  Correct instances  Incorrect instances  Kappa statistic
Model 1   56      111        91.55%             8.45%                0.783
Model 2   34      67         92.05%             7.95%                0.795
Model 3   11      21         92.25%             7.75%                0.801
Model 4   21      41         92.65%             7.35%                0.812
Table 3: Structure and 10-fold cross-validation performance of the decision trees
In terms of Weka processing time, Model 1 takes 0.06 seconds and Model 2 takes 0.1 seconds, while Models 3 and 4 each take 0.04 seconds.
The accuracy increases slightly from Model 1 to Model 4, and the Kappa statistic moves closer to 1 with each model. This shows that the last model, which excludes Age and DFA, performs best. However, Model 4 is more complex than Model 3, with a tree size of 41, and gains only a little accuracy over Model 3. Thus, Model 3 is chosen as the best model.
Model 1 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      161    17      23     201
b=2      9      1587    46     1642
c=3      15     74      246    335
Total    185    1678    315    2178
Table 4: Confusion matrix for Model 1

Model 2 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      160    17      24     201
b=2      4      1593    45     1642
c=3      13    70       252    335
Total    177    1680    321    2178
Table 5: Confusion matrix for Model 2

Model 3 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      159    14      28     201
b=2      4      1596    42     1642
c=3      14     67      254    335
Total    177    1677    324    2178
Table 6: Confusion matrix for Model 3

Model 4 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      157    17      27     201
b=2      2      1595    45     1642
c=3      11     58      266    335
Total    170    1670    338    2178
Table 7: Confusion matrix for Model 4
From Tables 4-7, I can see that the pruned tree always performs better than the unpruned tree. Pruning also makes the tree significantly simpler, with fewer leaves and a smaller tree size.
Comparing the confusion matrices in the tables above, the number of correct predictions from Model 1 is 161 + 1,587 + 246 = 1,994, while Model 2 has 2,005 correct predictions, Model 3 has 2,009, and Model 4 has 2,018. As can be observed from Tables 5 and 6, the number of correct predictions in classes 2 and 3 has increased, whereas the correct predictions for class 1 have decreased slightly.
Figure 1 displays the decision tree of Model 3. It can be observed that COSEn and DFA are the two most important predictors, as they appear at the top of the tree and in many places throughout it.
b. Random Forests
A random forest is a collection of decision trees, each built on a different subset of the predictors. After the trees are built, the final prediction is the class on which the most trees agree.
I consider 3 random forest models, each containing 500 trees.
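A 500-tree forest with an out-of-bag error estimate, as in Models 5-7, can be sketched as below. This assumes scikit-learn and uses synthetic stand-in data, so the numbers will differ from Table 8.

```python
# Sketch: random forest with 500 trees and out-of-bag error.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# oob_score=True estimates generalization error from the samples
# each tree never saw during its bootstrap draw.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
oob_error = 1.0 - rf.oob_score_
print(f"OOB error: {oob_error:.4f}")
```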
Random Forest                             Correct instances  OOB error  Incorrect instances  Kappa statistic
Model 5: 6 attributes                     93.48%             0.0634     6.52%                0.833
Model 6: 4 attributes (exclude Age, HR)   93.43%             0.0666     6.57%                0.833
Model 7: 4 attributes (exclude Age, DFA)  93.11%             0.0652     6.89%                0.823
Table 8: Random forest performance (10-fold cross-validation)
The random forest models perform significantly better than the decision tree models, with higher accuracy rates. Models 5 and 6 have almost identical accuracy percentages and Kappa statistic values. Model 5 takes longer to run than Model 6 (11.85 seconds compared to 10.67 seconds), so it is hard to conclude which model is better. However, from the confusion matrices in Tables 9 and 10, Model 5 has one more correct prediction than Model 6.
Model 5 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      163    17      21     201
b=2      3      1601    38     1642
c=3      7      56      272    335
Total    173    1674    331    2178
Table 9: Confusion matrix for Model 5

Model 6 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      163    14      24     201
b=2      0      1599    43     1642
c=3      9      53      273    335
Total    172    1666    340    2178
Table 10: Confusion matrix for Model 6

Model 7 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      157    19      25     201
b=2      1      1601    40     1642
c=3      6      59      270    335
Total    164    1679    335    2178
Table 11: Confusion matrix for Model 7
Model 7 has the lowest accuracy compared to Models 5 and 6 (Table 11), and its evaluation measures in Table 8 are smaller than those of Models 5 and 6. This suggests that the model with 6 attributes performs better.
4. Rule Learning
In this part, I consider two different algorithms: JRIP (Weka's implementation of RIPPER, Repeated Incremental Pruning to Produce Error Reduction, proposed by William W. Cohen) and PART (which combines the divide-and-conquer strategy of decision tree learning with the separate-and-conquer strategy of rule learning) [6]. Each of them is built using 6 attributes and then 4 attributes (excluding Age and HR, and excluding Age and DFA). I analyze the results to see which has the better performance based on cross-validation accuracy.
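The separate-and-conquer (covering) idea underlying both RIPPER and PART can be illustrated with a toy pure-Python sketch: repeatedly find the single test that covers the purest set of remaining examples, emit it as a rule, remove the covered examples, and repeat. The real algorithms add incremental pruning and optimization passes that are omitted here, and the data rows below are hand-made illustrations, not report data.

```python
# Toy separate-and-conquer rule learner (illustration only).
from collections import Counter

def learn_rules(rows, target):
    """rows: list of dicts; target: key of the class attribute."""
    rules, remaining = [], list(rows)
    while remaining:
        best = None  # (attribute, value, predicted class, purity, coverage)
        for attr in remaining[0]:
            if attr == target:
                continue
            for val in {r[attr] for r in remaining}:
                covered = [r for r in remaining if r[attr] == val]
                cls, hits = Counter(r[target] for r in covered).most_common(1)[0]
                purity = hits / len(covered)
                if best is None or (purity, len(covered)) > (best[3], best[4]):
                    best = (attr, val, cls, purity, len(covered))
        attr, val, cls, _, _ = best
        rules.append((attr, val, cls))
        # Separate: drop the covered examples; conquer the rest next round.
        remaining = [r for r in remaining if r[attr] != val]
    return rules

data = [
    {"HRV": "high", "COSEn": "low", "Class": "AF"},
    {"HRV": "high", "COSEn": "low", "Class": "AF"},
    {"HRV": "low", "COSEn": "low", "Class": "NSR"},
    {"HRV": "low", "COSEn": "high", "Class": "NSR"},
]
for attr, val, cls in learn_rules(data, "Class"):
    print(f"IF {attr} = {val} THEN Class = {cls}")
```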
a. 6 attributes
The results in Table 12 show that the accuracy of JRIP is slightly higher than that of PART. The number of correct predictions in Model 8 is 168 + 1,591 + 259 = 2,018, while Model 9 has 2,007 (Tables 13 and 14). It can be concluded that the JRIP model performs better than the PART model.
Rule Learning    Correct instances  Incorrect instances  Kappa statistic
Model 8: JRIP    92.65%             7.35%                0.812
Model 9: PART    92.15%             7.85%                0.801
Table 12: Rule learning performance with 6 attributes
Model 8: JRIP (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      168    16      17     201
b=2      4      1591    47     1642
c=3      11     65      259    335
Total    183    1672    323    2178
Table 13: Confusion matrix for Model 8

Model 9: PART (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      164    16      21     201
b=2      4      1579    59     1642
c=3      11     60      264    335
Total    179    1655    344    2178
Table 14: Confusion matrix for Model 9
b. 4 attributes (exclude Age and HR)
Rule Learning     Correct instances  Incorrect instances  Kappa statistic
Model 10: JRIP    92.06%             7.94%                0.799
Model 11: PART    91.74%             8.26%                0.791
Table 15: Rule learning performance with 4 attributes (excluding Age and HR)
The results from JRIP and PART with 4 prediction attributes are somewhat lower than for the models with 6 attributes. The accuracy of JRIP is again higher than that of PART: JRIP has 2,005 correct predictions and PART has 1,998.
Model 10: JRIP (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      153    19      29     201
b=2      1      1585    56     1642
c=3      14     54      267    335
Total    168    1658    352    2178
Table 16: Confusion matrix for Model 10

Model 11: PART (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      152    18      31     201
b=2      7      1580    55     1642
c=3      11     58      266    335
Total    170    1656    352    2178
Table 17: Confusion matrix for Model 11
c. 4 attributes (exclude Age and DFA)
Rule Learning     Correct instances  Incorrect instances  Kappa statistic
Model 12: JRIP    92.33%             7.67%                0.801
Model 13: PART    92.19%             7.81%                0.802
Table 18: Rule learning performance with 4 attributes (excluding Age and DFA)
The results of JRIP and PART with these 4 prediction attributes are higher than for the 4-attribute models excluding Age and HR. The accuracy of the JRIP model is again higher than that of the PART model, with a larger number of correct predictions (JRIP = 2,011; PART = 2,008).
Model 12: JRIP (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      154    23      24     201
b=2      0      1596    46     1642
c=3      9      65      261    335
Total    163    1684    331    2178
Table 19: Confusion matrix for Model 12

Model 13: PART (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      155    19      27     201
b=2      7      1584    51     1642
c=3      8      58      269    335
Total    170    1661    347    2178
Table 20: Confusion matrix for Model 13
5. Instance-Based Learning
For instance-based learning, I utilize k-Nearest Neighbor classifier with the IBK algorithms in
Weka. A model with 6 attributes and with different k values (k=5 and k=10) is considered. In
addition, a model with 4 attributes with different k values (k from 1 to 10) is also considered to see
which k is good enough for classification.
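The IBk-style sweep over k can be sketched as follows, assuming scikit-learn and synthetic stand-in data; the best k found here need not match the report's k=10.

```python
# Sketch: k-nearest-neighbor accuracy for k = 1..10 under 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

results = {}
for k in range(1, 11):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
    results[k] = acc
    print(f"k={k}: accuracy {acc:.3f}")

best_k = max(results, key=results.get)
print(f"best k: {best_k}")
```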
a. 6 attributes
From Table 21, it is clear that IBK10 performs better than IBK5, with a higher accuracy rate and a better Kappa statistic.
Instance-Based Learning  Correct instances  Incorrect instances  Kappa statistic
Model 14: IBK5           92.56%             7.44%                0.805
Model 15: IBK10          93.16%             6.84%                0.821
Table 21: k-Nearest Neighbor performance with 6 attributes
The confusion matrices show that the k-Nearest Neighbor model with k=10 has a better prediction rate on classes 2 and 3, with slightly fewer correct predictions in class 1 (Tables 22 and 23).
Model 14: IBK5 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      170    19      12     201
b=2      8      1606    28     1642
c=3      13     82      240    335
Total    191    1707    280    2178
Table 22: Confusion matrix for Model 14

Model 15: IBK10 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      165    19      17     201
b=2      2      1612    28     1642
c=3      11     72      252    335
Total    178    1703    297    2178
Table 23: Confusion matrix for Model 15
b. 4 attributes (exclude Age and HR)
Model   Correct instances  Incorrect instances  Kappa statistic
IBK1    90.27%             9.73%                0.753
IBK2    89.90%             10.10%               0.732
IBK3    92.15%             7.85%                0.799
IBK4    91.87%             8.13%                0.785
IBK5    92.56%             7.43%                0.805
IBK6    92.29%             7.71%                0.797
IBK7    92.70%             7.30%                0.810
IBK8    92.88%             7.12%                0.814
IBK9    92.79%             7.21%                0.813
IBK10   93.16%             6.84%                0.821
Table 24: The fitted model with 4 attributes (excluding Age and HR) for different k values
Following the information in Table 24 and the plot in Figure 2, it can be concluded that k=10 nearest neighbors is the best choice for the classifier. The accuracy generally increases as the number of nearest neighbors increases.
The confusion matrix of Model 16 (the 10-nearest-neighbor model) shows high prediction accuracy, with 2,025 correct predictions. It performs better than Models 14 and 15, which use 6 attributes.
Model 16: IBK10 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      158    19      24     201
b=2      0      1608    34     1642
c=3      7      69      259    335
Total    165    1696    317    2178
Table 25: Confusion matrix for Model 16
6. Neural Network Learning
The minimum MSE is obtained at the 39th iteration. The confusion matrices for the training, validation, and testing data are given in Figure 4.
The misclassification error on the training set is 6.5%; it is reduced to 6.4% on the validation set. On the test set, an outstanding result is obtained, with a misclassification error of 3.1% and 96.9% accuracy: the best result so far. The overall accuracy is 94% with 6% error.
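A train/validation/test neural-network setup of this kind can be sketched as below. scikit-learn's MLPClassifier with early stopping (which carves out an internal validation split) stands in for the network the report used; the architecture and the synthetic data are illustrative assumptions, so the errors will not match the figures above.

```python
# Sketch: small feed-forward network with an internal validation split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

# early_stopping=True holds out validation_fraction of the training data
# and stops when the validation score stops improving.
net = MLPClassifier(hidden_layer_sizes=(10,), early_stopping=True,
                    validation_fraction=0.15, max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print(f"training error: {1 - net.score(X_tr, y_tr):.3f}")
print(f"test error:     {1 - net.score(X_te, y_te):.3f}")
```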
Figure 4: Confusion matrices for Training, validation, testing and overall performance
7. Ensemble Methods
Ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models. There are various ensemble methods; here, I use three, namely AdaBoostM2, Bagging, and StackingC, to build three different models. Trees are selected as the weak learners for all the models. I then use the same algorithms to build models with 4 attributes.
a. 6 attributes model
Models using AdaBoostM2
To build this model, 100 tree learners are used. AdaBoostM2 is a well-known boosting algorithm for multiclass classification (3+ classes). The algorithm trains learners sequentially and computes the weighted classification error. The results of this model are:
Training set accuracy = 91.80%
Test set accuracy = 93.27%
The change in test classification error as the number of trees increases is given in Figure 5.
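The boosting step can be sketched as follows. scikit-learn's AdaBoostClassifier (SAMME-family multiclass boosting) stands in for AdaBoostM2, and the data is synthetic, so the accuracies will not match the figures above.

```python
# Sketch: boosting 100 tree learners and tracing staged test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"training accuracy: {boost.score(X_tr, y_tr):.3f}")
print(f"test accuracy:     {boost.score(X_te, y_te):.3f}")

# staged_score traces how test accuracy evolves as trees are added,
# the analogue of the error-vs-number-of-trees curve in the figures.
staged = list(boost.staged_score(X_te, y_te))
```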
Figure 5: Change in test classification error with number of trees for AdaBoostM2
Using 5-fold cross-validation, the cross-validation error is calculated as the number of trees increases. The change in classification error with the number of trees for the test and cross-validation sets is given in Figure 6.
Figure 6: The change in classification error with number of trees for test and cross validation
Ensemble (6 attributes)  Correct instances  Incorrect instances  Kappa statistic
Bagging                  93.48%             6.52%                0.833
Table 26: Bagging performance with 6 attributes
Out of the three models, the model using the Bagging method gives the highest prediction accuracy. Its confusion matrix also shows the best results for predicting each class (Table 27).
Bagging, 6 attributes (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      161    19      21     201
b=2      2      1601    39     1642
c=3      9      52      274    335
Total    172    1672    334    2178
Table 27: Confusion matrix for the Bagging model with 6 attributes
b. 4 attributes
In this part, I rerun the above algorithms on the model with 4 attributes (excluding HR and Age). The number of trees used in Bagging is 100.
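Bagging 100 trees can be sketched as below, assuming scikit-learn. Only 4 synthetic features are generated, mirroring the reduced attribute set; the numbers will not match Table 28.

```python
# Sketch: bagging 100 decision trees (the default base estimator)
# on a 4-feature, 3-class problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

bag = BaggingClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(bag, X, y, cv=10).mean()
print(f"bagging CV accuracy: {acc:.3f}")
```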
Ensemble (4 attributes, exclude Age and HR)  Correct instances  Incorrect instances  Kappa statistic
Model 18: AdaBoost                           92.06%             7.94%                0.797
Model 19: Bagging                            92.38%             7.62%                0.806
Model 20: StackingC                          93.02%             6.98%                0.822
Table 28: Ensemble performance with 4 attributes (excluding Age and HR)
Model 18 takes 0.66 seconds to run while Model 19 takes 0.33 seconds. The StackingC method takes the longest (>100 seconds) because it uses a random forest to combine predictions. However, the StackingC algorithm gives a very good result, with a high accuracy rate and a high Kappa statistic (Tables 29, 30, and 31). The accuracy of StackingC in the 4-attribute case is much better than in the 6-attribute case.
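The stacking step can be sketched as below. scikit-learn's StackingClassifier stands in for Weka's StackingC, and the base learners and random-forest combiner are illustrative choices on synthetic data, not the report's exact configuration.

```python
# Sketch: stacking a tree and a k-NN model, combined by a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=10))],
    # The meta-learner sees the base models' cross-validated predictions.
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
)
acc = cross_val_score(stack, X, y, cv=5).mean()
print(f"stacking CV accuracy: {acc:.3f}")
```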
Model 18 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      160    16      25     201
b=2      3      1590    49     1642
c=3      17     63      255    335
Total    180    1669    329    2178
Table 29: Confusion matrix for Model 18

Model 19 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      161    17      23     201
b=2      2      1589    51     1642
c=3      14     59      262    335
Total    177    1665    336    2178
Table 30: Confusion matrix for Model 19

Model 20 (rows: actual class, columns: predicted class)
         a=1    b=2     c=3    Total
a=1      159    14      28     201
b=2      0      1598    44     1642
c=3      9      57      269    335
Total    168    1669    341    2178
Table 31: Confusion matrix for Model 20
8. Conclusion
(91.912%). Model 7 (without Age and DFA) was used to predict on the test data and achieved the highest test classification accuracy, 94.48%. The other models selected for testing (Models 6, 8, 14, and 18) have lower prediction accuracy than Model 7. However, Table 32 shows that the 500-tree random forest and StackingC models give the best accuracy percentages on the training data. The test set consists of only 270 instances (about 12% of the 2,178 training instances), so the predictions on the test set may overfit.
Random Forest with 500 trees on 4 attributes and StackingC on 6 attributes give similar prediction accuracy, but I pick the Random Forest model because it uses fewer attributes. This makes the Random Forest a simpler model and helps avoid overfitting (Table 32).
Model                                                   Correct instances  Incorrect instances  Kappa statistic
Decision tree
Model 1: 6 attributes, unpruned                         91.55%             8.45%                0.783
Model 2: 6 attributes, pruned                           92.05%             7.95%                0.795
Model 3: 4 attributes, pruned (exclude Age and HR)      92.25%             7.75%                0.801
Model 4: 4 attributes, pruned (exclude Age and DFA)     92.65%             7.35%                0.812
Random forests
Model 5: 6 attributes                                   93.48%             6.52%                0.833
Model 6: 4 attributes (exclude Age and HR)              93.43%             6.57%                0.833
Model 7: 4 attributes (exclude Age and DFA)             93.11%             6.89%                0.823
Rule learning
Model 8: JRIP with 6 attributes                         92.65%             7.35%                0.812
Model 9: PART with 6 attributes                         92.15%             7.85%                0.801
Model 10: JRIP with 4 attributes (exclude Age and HR)   92.06%             7.94%                0.799
Model 11: PART with 4 attributes (exclude Age and HR)   91.74%             8.26%                0.791
Model 12: JRIP with 4 attributes (exclude Age and DFA)  92.33%             7.67%                0.801
Model 13: PART with 4 attributes (exclude Age and DFA)  92.19%             7.81%                0.802
Instance-based learning
Model 14: IBK5 with 6 attributes                        92.56%             7.44%                0.805
Model 15: IBK10 with 6 attributes                       93.16%             6.84%                0.821
Model 16: IBK10 with 4 attributes (exclude Age and HR)  93.16%             6.84%                0.821
Neural network learning
Overall accuracy                                        94.00%             6.00%
Ensemble methods
Model 17: AdaBoostM2 with 6 attributes                  91.80%             8.20%
Bagging with 6 attributes                               93.48%             6.52%                0.833
StackingC with 6 attributes                             100.00%            0.00%
Model 18: AdaBoostM1 with 4 attributes (excl. Age, HR)  92.06%             7.94%                0.797
Model 19: Bagging with 4 attributes (excl. Age, HR)     92.38%             7.62%                0.806
Model 20: StackingC with 4 attributes (excl. Age, HR)   93.02%             6.98%                0.822
Table 32: Summary of model performance (10-fold cross-validation on the training data)
References
1. Lake DE, Moorman JR. Accurate estimation of entropy in very short physiological time series: the problem of atrial fibrillation detection in implanted ventricular devices. Am J Physiol Heart Circ Physiol, 300:H319-H325, 2011.
2. Moss TJ, Lake DE, Moorman JR. Local dynamics of heart rate: detection and prognostic implications. Physiological Measurement. In press.
3. Peng CK, Havlin S, et al. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos, 5:82, 1995.
4. Witten IH, Frank E, Hall MA. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition.
5. Witten IH. Data Mining with Weka. Available: http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/slides/Class4DataMiningWithWeka-2013.pdf
6. Data Mining Rule-based Classifiers. Available: http://staffwww.itn.liu.se/~aidvi/courses/06/dm/lectures/lec4.pdf