
Heart Sound Classification Using Deep Structured Features

Michael Tschannen, Thomas Kramer, Gian Marti, Matthias Heinzmann, Thomas Wiatowski

Dept. IT & EE, ETH Zurich, Switzerland

Abstract

We present a novel machine learning-based method for heart sound classification which we submitted to the PhysioNet/CinC Challenge 2016. Our method relies on a robust feature representation of each cardiac cycle in the test recording, generated by a wavelet-based deep convolutional neural network (CNN), and on support vector machine classification. In addition to the CNN-based features, our method incorporates physiological and spectral features that summarize the characteristics of the entire test recording. The proposed method obtained a score, sensitivity, and specificity of 0.812, 0.848, and 0.776, respectively, on the hidden challenge testing set.

1. Introduction

Current state-of-the-art methods for automated classification of pathology in heart sound recordings often suffer from poor generalization capabilities because they were trained and/or evaluated on small and/or carefully selected data sets. The aim of the PhysioNet/CinC Challenge 2016 is to encourage the development of robust heart sound classification algorithms that deliver accurate predictions in both real-world clinical and non-clinical environments [1].

In recent years, deep convolutional neural networks (CNNs) [2, 3] have proven tremendously successful in many practical classification tasks. By feeding the input signal through a sequence of modules, each of which computes a convolutional transform, a non-linearity, and a pooling operation, these networks extract features that incorporate signal characteristics important for discrimination (e.g., higher order moments [4]) while suppressing irrelevant variations (such as the temporal locations of signal characteristics [5, 6]). Although deep CNNs are often used to perform classification directly [2, 3], usually based on the output of the last network layer, they can also act as stand-alone feature extractors [7], with the extracted features fed into a classifier such as, e.g., a support vector machine (SVM).

In this paper, we present a novel machine learning-based method for heart sound classification which we submitted to the PhysioNet/CinC Challenge 2016. The key ingredients of our method are a deep CNN-based feature extractor employing wavelet filters [8] and an SVM. By relying on pre-specified wavelet filters, instead of learning the filters from the data as in most standard deep CNN architectures, we not only decrease the training time drastically, but also reduce the risk of overfitting given the small training set at hand. We note that wavelet-based features in combination with an SVM have been considered previously for heart sound classification, e.g., in [9-11]. However, these methods employ the wavelet transform only, i.e., they can be considered single-layer CNNs without non-linearity and are hence "shallow", whereas our "deep" approach employs wavelets as filters in a CNN (i.e., we employ wavelets and, additionally, non-linearities and pooling operations at multiple layers) to compute a rich and robust feature representation.

For a more comprehensive review of prior work on heart sound classification we refer to [1, Sec. 3].

2. Methods

Our method (see the illustration in Figure 1) consists of a feature extraction stage and a classification stage. In the former stage, two types of features are extracted from the test heart sound recording, namely "deep features" that provide a robust characterization of the shape and morphology of each cardiac cycle¹ in the recording, and "summary features" that describe the entire recording. The extraction of deep features hence requires segmentation of the test recording into cardiac cycles. Each cardiac cycle is associated with the feature vector obtained by concatenating the corresponding deep features and the summary features (i.e., the summary features are shared across all feature vectors extracted from the test recording). In the classification stage, each feature vector is classified into {"normal", "abnormal"} (and possibly "unsure", due to poor signal quality) using an L2-SVM with radial basis function (RBF) kernel; the prediction for the entire recording is obtained as the majority vote over all cardiac cycles.

¹The term "cardiac cycle" henceforth refers to the cardiac cycle itself or to the corresponding segment of the heart sound recording.

Computing in Cardiology 2016; 43: 565


Figure 1. Illustration of the proposed method.
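The classification stage of the pipeline in Figure 1 (per-cycle SVM decisions combined by majority vote) can be sketched as follows. This is a minimal illustration with random stand-in features; scikit-learn's standard soft-margin SVC is used in place of the L2-SVM of the paper, and all array sizes and names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Random stand-in data: each row is the feature vector of one cardiac cycle
# (deep features concatenated with the recording-level summary features).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 64))   # cycles pooled from training recordings
y_train = rng.integers(0, 2, size=200)     # 0 = "normal", 1 = "abnormal"
X_test = rng.standard_normal((12, 64))     # the cycles of one test recording

# RBF-kernel SVM with class weights to counter class imbalance; a standard
# soft-margin SVC stands in here for the paper's L2-SVM.
clf = SVC(kernel="rbf", gamma="scale", class_weight="balanced")
clf.fit(X_train, y_train)

# Classify every cardiac cycle, then take the majority vote for the recording.
cycle_preds = clf.predict(X_test)
recording_label = np.bincount(cycle_preds).argmax()
```

In the actual method, the RBF parameter and the regularization strength would additionally be tuned by stratified cross-validation, as described below.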

The motivation for including summary features in addition to deep features is that classification based on deep features and majority voting alone may not sufficiently account for information that is spread over the entire recording, such as, e.g., heart rate variability. The effect of summary features on the classification performance is numerically studied in Section 3.

In the following, we describe all parts of our method in detail and discuss its evaluation and parameter selection.

Segmentation: We use the heart sound segmentation algorithm from [12], which leverages a hidden semi-Markov model and Viterbi decoding to segment the test heart sound recording into the four heart sound states S1 (first heart sound), systole, S2 (second heart sound), and diastole.

Deep features: We employ the tree-like CNN-based feature extractor proposed in [8], which we briefly review in the following. Every layer of the network, specified by the layer index 1 ≤ d ≤ D, is associated with a collection of pre-specified Haar wavelet filters² {ψ_j}_{j=1}^J [13], a pointwise Lipschitz-continuous non-linearity ρ_d, and a Lipschitz-continuous pooling operator P_d. Convolutions with wavelet filters, besides allowing for an efficient implementation using the algorithme à trous [13, Sec. 5.2.2], resolve characteristics of a signal at multiple scales 1 ≤ j ≤ J (respectively, signal characteristics that correspond to the dyadic frequency bands [-2^{-(j-1)}, -2^{-(j+1)}] ∪ [2^{-(j+1)}, 2^{-(j-1)}]); the application of a pointwise non-linearity ρ_d activates or de-activates features, and the application of a pooling operator P_d reduces the signal dimension and renders the features robust w.r.t. non-linear deformations and translations. Here, for all layers 1 ≤ d ≤ D, we use the rectified linear unit (ReLU) non-linearity and the max-pooling operator (see, e.g., [8, Sec. 2.2, 2.3] for definitions). Every layer of the network computes a set of so-called feature maps {f_n^d}_{n=1}^{J^d} according to

    f_n^d := f_{(k,j)}^d := P_d( ρ_d( f_k^{d-1} * ψ_j ) ),    (1)

where 1 ≤ k ≤ J^{d-1}, 1 ≤ j ≤ J, f_1^0 := f is the input signal (here, a cardiac cycle) fed into the network, and * denotes the circular convolution operator. The underlying tree-like network architecture is illustrated in Figure 2.

²The networks considered in [8] allow for general frame filters.

[Figure 2: Tree-like deep CNN (of depth D = 3 employing J = 3 wavelet scales) underlying the feature extractor described in Section 2. The root of the network corresponds to d = 0. The signal f_n^d, defined in (1), corresponds to the n-th feature map in the d-th network layer.]

The final feature vector describing f is obtained by collecting (in a single feature vector) (i) every feature map f_n^d, 1 ≤ d ≤ D, 1 ≤ n ≤ J^d, generated in the network, (ii) low-pass filtered versions of the feature maps f_n^d, and (iii) a low-pass filtered version of the signal f itself. Figure 3 shows an example feature vector of a cardiac cycle for a network of depth D = 3 employing J = 3 wavelet scales, the network parameters used for the experiments in Section 3.

Before the cardiac cycles are fed into the feature extraction network, they are re-sampled to a length of 1024 to ensure that they are all mapped to feature vectors of equal dimension. Furthermore, each cardiac cycle is normalized by mean subtraction and division by its standard deviation. For the described network parameters the dimension of the feature vectors is 12,160. To reduce computational complexity (in particular during training), the dimension of the feature vectors is reduced to 400 by principal component analysis. Furthermore, the durations of the four heart sound states are appended to the (dimensionality-reduced) feature vectors as additional features.

Preliminary experiments showed that the modulus non-linearity and pooling by sub-sampling lead to marginally worse classification performance than the ReLU non-linearity combined with max-pooling.
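A minimal NumPy sketch of the tree-like feature extractor just described, assuming Haar wavelets dilated to J = 3 dyadic scales, the ReLU non-linearity, and max-pooling by a factor of 2 at every layer. For simplicity it collects the raw feature maps only and keeps the input itself in place of its low-pass filtered version, so the output dimension (8,320) differs from the 12,160 of the full feature vector; this is an illustration, not the authors' implementation.

```python
import numpy as np

def haar_wavelet_filters(J):
    """Haar wavelet filters at J dyadic scales (the mother wavelet
    dilated by 2**(j-1)); an illustrative stand-in for the paper's
    pre-specified wavelet filters."""
    filters = []
    for j in range(1, J + 1):
        half = 2 ** (j - 1)
        psi = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2 * half)
        filters.append(psi)
    return filters

def circ_conv(x, h):
    """Circular convolution: zero-pad h to len(x) and multiply in frequency."""
    H = np.fft.rfft(np.pad(h, (0, len(x) - len(h))))
    return np.fft.irfft(np.fft.rfft(x) * H, n=len(x))

def max_pool(x, factor=2):
    return x.reshape(-1, factor).max(axis=1)

def deep_features(f, D=3, J=3):
    """Tree-like extractor: at each layer, convolve every feature map with
    all J wavelet filters, apply ReLU, max-pool; collect all feature maps
    of all layers together with the input cycle."""
    filters = haar_wavelet_filters(J)
    maps = [f]          # layer 0: the input cardiac cycle
    collected = [f]
    for _ in range(D):
        new_maps = []
        for g in maps:
            for psi in filters:
                u = np.maximum(circ_conv(g, psi), 0.0)  # ReLU non-linearity
                new_maps.append(max_pool(u))            # max-pooling, factor 2
        maps = new_maps
        collected.extend(new_maps)
    return np.concatenate(collected)

# Usage: one cardiac cycle, re-sampled to length 1024 and normalized.
rng = np.random.default_rng(0)
cycle = rng.standard_normal(1024)
cycle = (cycle - cycle.mean()) / cycle.std()
phi = deep_features(cycle, D=3, J=3)
# 1024 + 3*512 + 9*256 + 27*128 = 8320 entries with these toy choices
```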

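The subsequent dimensionality reduction can be sketched with an SVD-based PCA. The matrix sizes below are toy stand-ins (the real per-cycle feature vectors have dimension 12,160), and the four appended columns play the role of the heart sound state durations.

```python
import numpy as np

# Toy stand-in data: rows are per-cycle feature vectors; one row of
# durations (S1, systole, S2, diastole) per cycle.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2048))   # 500 cycles, 2048-dim features (toy size)
durations = rng.random((500, 4))       # hypothetical state durations per cycle

k = 400                                # target dimension from the paper

# PCA via SVD of the centered data matrix; project onto the top-k axes.
mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
X_reduced = (X - mu) @ Vt[:k].T

# Append the four state durations as additional features.
X_final = np.hstack([X_reduced, durations])
```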
[Figure 3: Example feature vector generated by the deep CNN defined in Section 2; the entries are plotted against the layer index (0 to 3). The 0-th layer corresponds to the low-pass filtered input cardiac cycle.]

These experiments also revealed that increasing the number of principal components or the depth D of the feature extraction network does not significantly improve classification performance.

Finally, an alternative to extracting deep features from the 1-D cardiac cycles is to compute a 2-D time-frequency representation (e.g., a spectrogram) of the cardiac cycles and to feed it into our feature extraction network equipped with 2-D Haar wavelet filters (and, of course, 2-D pooling operators). Intuitively, such an approach might lead to a richer feature representation allowing for better discrimination between normal and abnormal heart sounds. However, preliminary experiments showed that this approach does not improve classification performance.

Summary features (state statistics and PSD): We rely on the 20 features described in [1, Sec. 6.2], consisting of first and second order statistics of the amplitudes and durations associated with the four heart sound states obtained through segmentation. We refer to this set of features as state statistics; see also Figure 1. In addition, we use a power spectral density (PSD) estimate of length 128 (covering the spectral band 0-500 Hz) computed from the raw (unsegmented) heart sound recording using the Welch method [14, Sec. 2.7.2] with half-overlapping Hamming windows. The PSD estimate provides a compact description of the second order statistics of the heart sound recording and may improve the robustness of the classification when the segmentation is inaccurate.

Evaluation and parameter selection: We evaluated the proposed method on the publicly available PhysioNet/CinC Challenge 2016 data set containing 3,153 heart sound recordings from 764 subjects, including both healthy individuals and patients with different heart diseases. Each recording has two labels, the first of which indicates whether the subject is healthy ("normal") or was diagnosed with a cardiac disease ("abnormal"), and the second of which indicates the signal quality ("good"/"poor"). We refer the reader to [1] for a detailed description of the data set. The classification performance was assessed using the challenge score (MAcc), defined as the arithmetic mean of sensitivity and specificity, both modified to account for predictions of the label "unsure"; see [1, Eq. (1) and (2)] for details. We consider both binary classification into {"normal", "abnormal"}, ignoring the quality labels, and ternary classification into {"normal", "abnormal", "unsure"}, for which all the recordings with "poor" signal quality were labeled "unsure". The parameter of the RBF kernel and the regularization parameter of the L2-SVM were selected using 5-fold stratified (by patient) cross-validation. Class-adaptive sample weights were used to compensate for the class imbalance in the data set. Note that for ternary classification the sample weights were computed based on the labels "normal"/"abnormal" only, as inclusion of the label "unsure" in the weight computation reduced the MAcc.

Modification for unsupervised ternary classification: We briefly outline a simple modification (not used to obtain the results in Section 3) of our method to learn a ternary classifier based on the labels {"normal", "abnormal"} only. Specifically, this extension implements a so-called reject option [15]. Assuming an estimate P̂(Y|r) of the posterior probability P(Y|r) of the label Y ∈ {"normal", "abnormal"} given the test recording r to be available, a ternary prediction Ŷ_ter is obtained as

    Ŷ_ter = "normal"    if P̂(Y = "abnormal" | r) < τ,
            "abnormal"  if P̂(Y = "abnormal" | r) > 1 − τ,
            "unsure"    otherwise,

where τ ∈ (0, 1/2] is a threshold parameter. Under certain (not necessarily realistic) model assumptions one can motivate the estimation of P̂(Y|r) according to P̂(Y|r) := (1/L) Σ_{ℓ=1}^L P̂(Y|b_ℓ), where {b_ℓ}_{ℓ=1}^L are the cardiac cycles in the test recording r. With the heart sound classification method described above, the posterior probability estimates P̂(Y|b_ℓ) can be obtained either from the SVM model using Platt scaling [16], or by replacing the SVM model with a logistic regression model. The threshold parameter τ can be optimized using cross-validation. If the score used to assess the performance of the classifier does not sufficiently reward the label "unsure", τ = 0.5 will be selected, which amounts to binary classification.

3. Results

Table 1 shows the 5-fold cross-validation MAcc for binary and ternary classification. To study the effect of different features, we report the performance for classification based on deep features only (DF), deep features and state statistics (DF + SS), as well as deep features and all summary features (DF + SS + PSD).

The highest MAcc we obtained during the official phase of the PhysioNet/CinC Challenge 2016 on the hidden challenge testing set containing 1,277 recordings was 0.812 (sensitivity: 0.848, specificity: 0.776), for binary classification based on DF + SS + PSD. With this MAcc our algorithm is within 5.6% of the winning team's MAcc, ranked 14th out of 48 competitors.
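The Welch PSD summary feature described in Section 2 can be sketched as follows. The sampling rate and the segment length of 254 samples (chosen so that the one-sided estimate has 254 // 2 + 1 = 128 bins) are assumptions, since the paper does not state them; in practice scipy.signal.welch with a Hamming window and 50% overlap computes the same kind of estimate.

```python
import numpy as np

def welch_psd(x, nperseg=254, fs=1000.0):
    """Welch PSD estimate with half-overlapping Hamming windows.
    nperseg=254 yields a one-sided estimate of length 128; fs is an
    assumed sampling rate."""
    win = np.hamming(nperseg)
    step = nperseg // 2                  # 50% overlap between segments
    norm = fs * (win ** 2).sum()         # power normalization of the window
    segments = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg] * win
        segments.append(np.abs(np.fft.rfft(seg)) ** 2 / norm)
    psd = np.mean(segments, axis=0)
    psd[1:-1] *= 2                       # fold negative frequencies (one-sided)
    return psd

rng = np.random.default_rng(2)
recording = rng.standard_normal(10_000)  # toy stand-in for a raw recording
psd = welch_psd(recording)               # 128-dimensional summary feature
```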

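The reject-option rule above reduces to a few lines of code. The per-cycle posterior estimates below are made-up numbers (in the method they would come from Platt scaling of the SVM outputs or from a logistic regression model), and τ = 0.25 is an arbitrary illustrative threshold.

```python
import numpy as np

# Hypothetical per-cycle estimates of P(Y = "abnormal" | cycle) for the
# L cardiac cycles of one test recording.
p_cycles = np.array([0.35, 0.55, 0.40, 0.48, 0.52])

def ternary_predict(p_cycles, tau=0.25):
    """Reject option: average the per-cycle posteriors over the recording,
    then threshold with tau in (0, 1/2]."""
    p = p_cycles.mean()                  # estimate of P(Y = "abnormal" | r)
    if p < tau:
        return "normal"
    if p > 1.0 - tau:
        return "abnormal"
    return "unsure"

print(ternary_predict(p_cycles))         # mean 0.46 lies in [0.25, 0.75] -> "unsure"
```

Setting tau = 0.5 recovers plain binary classification, mirroring the remark above about scores that do not reward the label "unsure".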
In terms of running time, all our entries to the challenge during the official phase used less than 21% of the available computation quota.

    binary classification
    features        MAcc    Se      Sp
    DF              0.854   0.869   0.838
    DF + SS         0.860   0.910   0.811
    DF + SS + PSD   0.870   0.908   0.832

    ternary classification
    DF              0.845   0.844   0.847
    DF + SS         0.847   0.841   0.854
    DF + SS + PSD   0.855   0.847   0.863

Table 1. Results (MAcc: challenge score, Se: sensitivity, Sp: specificity) for different configurations of our method (5-fold cross-validation).

4. Discussion

For binary classification, the results in Table 1 show that a combination of deep features and summary features leads to a higher MAcc than purely deep feature-based classification. In more detail, the configurations involving summary features have a slightly lower specificity and a significantly higher sensitivity than the configuration based on deep features only, hence leading to less balance between sensitivity and specificity. For ternary classification, the improvement through summary features is less pronounced than for binary classification.

Perhaps surprisingly, ternary classification consistently leads to a lower MAcc than binary classification. Possible reasons for this phenomenon could be that the subset of recordings with "poor" signal quality is too heterogeneous to be reliably discriminated from "normal" and "abnormal" recordings using our method, or that reliable classification into {"normal", "abnormal"} is sometimes possible even when a recording has "poor" signal quality.

5. Conclusion

We presented and evaluated a robust method for heart sound classification that combines a deep CNN-based feature extractor and an SVM. Improving the identification of recordings with poor signal quality and finding a more elaborate way to incorporate summary features into the proposed method are interesting directions to be explored in the future.

References

[1] Liu C, Springer D, Li Q, Moody B, Juan RA, Chorro FJ, Castells F, Roig JM, Silva I, Johnson AE, Syed Z, Schmidt SE, Papadaniil CD, Hadjileontiadis L, Naseri H, Moukadem A, Dieterlen A, Brandt C, Tang H, Samieinasab M, Samieinasab MR, Sameni R, Mark RG, Clifford GD. An open access database for the evaluation of heart sound algorithms. Physiological Measurement 2016;37(11).
[2] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436-444.
[3] Goodfellow I, Bengio Y, Courville A. Deep learning, 2016. URL http://www.deeplearningbook.org. Book in preparation for MIT Press.
[4] Bruna J, Mallat S. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013;35(8):1872-1886.
[5] Mallat S. Group invariant scattering. Communications on Pure and Applied Mathematics 2012;65(10):1331-1398.
[6] Wiatowski T, Bölcskei H. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv:1512.06293, 2015.
[7] Huang FJ, LeCun Y. Large-scale learning with SVM and convolutional nets for generic object categorization. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition. 2006; 284-291.
[8] Wiatowski T, Tschannen M, Stanić A, Grohs P, Bölcskei H. Discrete deep feature extraction: A theory and new architectures. In Proc. of International Conference on Machine Learning. June 2016; 2149-2158.
[9] Ari S, Hembram K, Saha G. Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier. Expert Systems with Applications 2010;37(12):8019-8026.
[10] Patidar S, Pachori RB, Garg N. Automatic diagnosis of septal defects based on tunable-Q wavelet transform of cardiac sound signals. Expert Systems with Applications 2015;42(7):3315-3326.
[11] Zheng Y, Guo X, Ding X. A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification. Expert Systems with Applications 2015;42(5):2710-2721.
[12] Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Transactions on Biomedical Engineering 2016;63(4):822-832.
[13] Mallat S. A wavelet tour of signal processing: The sparse way. 3rd edition. Academic Press, 2009.
[14] Stoica P, Moses RL. Spectral analysis of signals. Pearson/Prentice Hall, Upper Saddle River, NJ, 2005.
[15] Herbei R, Wegkamp MH. Classification with reject option. Canadian Journal of Statistics 2006;34(4):709-721.
[16] Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61-74.

Address for correspondence:
Michael Tschannen, Thomas Wiatowski
ETH Zürich, Communication Technology Laboratory
Sternwartstrasse 7
CH-8092 Zürich
Switzerland
{michaelt, withomas}@nari.ee.ethz.ch

