Michael Tschannen, Thomas Kramer, Gian Marti, Matthias Heinzmann, Thomas Wiatowski
dition to deep features is that classification based on deep features and majority voting alone may not sufficiently account for information that is spread over the entire recording such as, e.g., heart rate variability. The effect of summary features on the classification performance is numerically studied in Section 3.

In the following, we describe all parts of our method in detail and discuss its evaluation and parameter selection.

Segmentation: We use the heart sound segmentation algorithm from [12], which leverages a hidden semi-Markov model and Viterbi decoding to segment the test heart sound recording into the four heart sound states S1 (first heart sound), systole, S2 (second heart sound), and diastole.

Deep features: We employ the tree-like CNN-based feature extractor proposed in [8], which we briefly review in the following. Every layer of the network, specified by the layer index $1 \le d \le D$, is associated with a collection of pre-specified Haar wavelet filters² $\{\psi_j\}_{j=1}^{J}$ [13], a pointwise Lipschitz-continuous non-linearity $\rho_d$, and a Lipschitz-continuous pooling operator $P_d$. Convolutions with wavelet filters, besides allowing for an efficient implementation using the algorithme à trous [13, Sec. 5.2.2], resolve characteristics of a signal at multiple scales $1 \le j \le J$ (respectively, signal characteristics that correspond to dyadic frequency bands $[-2^{-(j-1)}, -2^{-(j+1)}] \cup [2^{-(j+1)}, 2^{-(j-1)}]$), the application of a pointwise non-linearity $\rho_d$ activates or de-activates features, and the application of a pooling operator $P_d$ reduces the signal dimension and renders the features robust w.r.t. non-linear deformations and translations. Here, for all layers $1 \le d \le D$, we use the rectified linear unit (ReLU) non-linearity and the max-pooling operator (see, e.g., [8, Sec. 2.2, 2.3] for definitions). Every layer of the network computes a set of so-called feature maps $\{f_n^d\}_{n=1}^{J^d}$ according to

$$f_n^d := f_{(k,j)}^d := P_d\big(\rho_d(f_k^{d-1} * \psi_j)\big), \qquad (1)$$

where $1 \le k \le J^{d-1}$, $1 \le j \le J$, $f_1^0 := f$ is the input signal (here, a cardiac cycle) fed into the network, and $*$ denotes the circular convolution operator. The underlying tree-like network architecture is illustrated in Figure 2.

² The networks considered in [8] allow for general frame filters.

The final feature vector describing $f$ is obtained by collecting (in a single feature vector) (i) every feature map $f_n^d$, $1 \le d \le D$, $1 \le n \le J^d$, generated in the network, (ii) low-pass filtered versions of the feature maps $f_n^d$, and (iii) a low-pass filtered version of the signal $f$ itself. Figure 3 shows an example feature vector of a cardiac cycle for a network of depth $D = 3$ employing $J = 3$ wavelet scales, the network parameters used for the experiments in Section 3.

Figure 2. Tree-like deep CNN (of depth $D = 3$ employing $J = 3$ wavelet scales) underlying the feature extractor described in Section 2. The root of the network corresponds to $d = 0$. The signal $f_n^d$, defined in (1), corresponds to the $n$-th feature map in the $d$-th network layer.

Before the cardiac cycles are fed into the feature extraction network, they are re-sampled to a length of 1024 to ensure that they are all mapped to feature vectors of equal dimension. Furthermore, each cardiac cycle is normalized by mean subtraction and division by its standard deviation. For the described network parameters the dimension of the feature vectors is 12,160. To reduce computational complexity (in particular during training) the dimension of the feature vectors is reduced to 400 by principal component analysis. Furthermore, the durations of the four heart sound states are appended to the (dimensionality-reduced) feature vectors as additional features.

Preliminary experiments showed that the modulus non-linearity and pooling by sub-sampling lead to marginally worse classification performance than the ReLU non-linearity combined with max-pooling. These experiments also revealed that increasing the number of principal components or the depth $D$ of the feature extraction network does not significantly improve classification performance.

Figure 3. Example feature vector generated by the deep CNN defined in Section 2. The 0-th layer corresponds to the low-pass filtered input cardiac cycle. (Horizontal axis: layer index, 0 to 3.)

Finally, an alternative way to extract deep features from the 1-D cardiac cycles is to compute a 2-D time-frequency representation (e.g., a spectrogram) of each cardiac cycle and to feed these representations into our feature extraction network equipped with 2-D Haar wavelet filters (and, of course, 2-D pooling operators). Intuitively, such an approach might lead to a richer feature representation allowing for better discrimination between normal and abnormal heart sounds. However, preliminary experiments showed that this approach does not improve classification performance.

Summary features (state statistics and PSD): We rely on the 20 features described in [1, Sec. 6.2] consisting of first and second order statistics of amplitudes and durations associated with the four heart sound states obtained through segmentation. We refer to this set of features as state statistics, see also Figure 1. In addition, we use a power spectral density (PSD) estimate of length 128 (covering the spectral band 0–500 Hz) computed from the raw (unsegmented) heart sound recording using the Welch method [14, Sec. 2.7.2] with half-overlapping Hamming windows. The PSD estimate provides a compact description of the second order statistics of the heart sound recording and may improve the robustness of the classification when the segmentation is inaccurate.

Evaluation and parameter selection: We evaluated the proposed method on the publicly available PhysioNet/CinC Challenge 2016 data set containing 3,153 heart sound recordings of 764 subjects, including both healthy individuals and patients with different heart diseases. Each recording has two labels, the first of which indicates whether the subject is healthy (“normal”) or was diagnosed with a cardiac disease (“abnormal”), and the second of which indicates the signal quality (“good”/“poor”). We refer the reader to [1] for a detailed description of the data set. The classification performance was assessed using the challenge score (MAcc) defined as the arithmetic mean of sensitivity and specificity, both modified to account for predictions of the label “unsure”, see [1, Eq. (1) and (2)] for details. We consider both binary classification into {“normal”, “abnormal”}, ignoring the quality labels, and ternary classification into {“normal”, “abnormal”, “unsure”}, for which all the recordings with “poor” signal quality were labeled “unsure”. The parameter of the RBF kernel and the regularization parameter of the $L_2$-SVM were selected using 5-fold stratified (by patient) cross-validation. Class-adaptive sample weights were used to compensate for the class imbalance in the data set. Note that for ternary classification the sample weights were computed based on the labels “normal”/“abnormal” only, as inclusion of the label “unsure” in the weight computation reduced the MAcc.

Modification for unsupervised ternary classification: We briefly outline a simple modification (not used to obtain the results in Section 3) of our method to learn a ternary classifier based on the labels {“normal”, “abnormal”} only. Specifically, this extension implements a so-called reject option [15]. Assuming an estimate $\hat{P}(Y \mid r)$ of the posterior probability $P(Y \mid r)$ of the label $Y \in$ {“normal”, “abnormal”} given the test recording $r$ to be available, a ternary prediction $\hat{Y}_{\mathrm{ter}}$ is obtained as

$$\hat{Y}_{\mathrm{ter}} = \begin{cases} \text{“normal”}, & \text{if } \hat{P}(Y = \text{“abnormal”} \mid r) < \tau \\ \text{“abnormal”}, & \text{if } \hat{P}(Y = \text{“abnormal”} \mid r) > 1 - \tau \\ \text{“unsure”}, & \text{otherwise,} \end{cases}$$

where $\tau \in (0, 1/2]$ is a threshold parameter. Under certain (not necessarily realistic) model assumptions one can motivate the estimation of $\hat{P}(Y \mid r)$ according to $\hat{P}(Y \mid r) := (1/L) \sum_{\ell=1}^{L} \hat{P}(Y \mid b_\ell)$, where $\{b_\ell\}_{\ell=1}^{L}$ are the cardiac cycles in the test recording $r$. With the heart sound classification method described above, the posterior probability estimates $\hat{P}(Y \mid b_\ell)$ can be obtained either from the SVM model using Platt scaling [16], or by replacing the SVM model with a logistic regression model. The threshold parameter $\tau$ can be optimized using cross-validation. If the score used to assess the performance of the classifier does not sufficiently reward the label “unsure”, $\tau = 0.5$ will be selected, which amounts to binary classification.

3. Results

Table 1 shows the 5-fold cross validation MAcc for binary and ternary classification. To study the effect of different features we report the performance for classification based on deep features only (DF), deep features and state statistics (DF + SS), as well as deep features and all summary features (DF + SS + PSD).

The highest MAcc we obtained during the official phase of the PhysioNet/CinC Challenge 2016 on the hidden challenge testing set containing 1,277 recordings was 0.812 (sensitivity: 0.848, specificity: 0.776), for binary classification based on DF + SS + PSD. With this MAcc our algorithm is within 5.6% of the winning team’s MAcc, ranked 14th out of 48 competitors. In terms of running time, all
our entries to the challenge during the official phase used less than 21% of the computation quota available.

features          MAcc    Se      Sp
binary classification
DF                0.854   0.869   0.838
DF + SS           0.860   0.910   0.811
DF + SS + PSD     0.870   0.908   0.832
ternary classification
DF                0.845   0.844   0.847
DF + SS           0.847   0.841   0.854
DF + SS + PSD     0.855   0.847   0.863

Table 1. Results (MAcc: challenge score, Se: sensitivity, Sp: specificity) for different configurations of our method (5-fold cross validation).

4. Discussion

For binary classification, the results in Table 1 show that a combination of deep features and summary features leads to a higher MAcc than purely deep feature-based classification. In more detail, the configurations involving summary features have a slightly lower specificity and a significantly higher sensitivity than the configuration based on deep features only, hence leading to less balance between sensitivity and specificity. For ternary classification, the improvement through summary features is less pronounced than for binary classification.

Perhaps surprisingly, ternary classification consistently leads to a lower MAcc than binary classification. Possible reasons for this phenomenon could be that the subset of recordings with “poor” signal quality is too heterogeneous to be reliably discriminated from “normal” and “abnormal” recordings using our method, or that reliable classification into {“normal”, “abnormal”} is sometimes possible even when a recording has “poor” signal quality.

5. Conclusion

We presented and evaluated a robust method for heart sound classification that combines a deep CNN-based feature extractor and an SVM. Improving the identification of recordings with poor signal quality and a more elaborate way to incorporate summary features into the proposed method are interesting directions to be explored in the future.

References

[1] Liu C, Springer D, Li Q, Moody B, Juan RA, Chorro FJ, Castells F, Roig JM, Silva I, Johnson AE, Syed Z, Schmidt SE, Papadaniil CD, Hadjileontiadis L, Naseri H, Moukadem A, Dieterlen A, Brandt C, Tang H, Samieinasab M, Samieinasab MR, Sameni R, Mark RG, Clifford GD. An open access database for the evaluation of heart sound algorithms. Physiological Measurement 2016;37(11).
[2] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–444.
[3] Goodfellow I, Bengio Y, Courville A. Deep learning, 2016. URL http://www.deeplearningbook.org. Book in preparation for MIT Press.
[4] Bruna J, Mallat S. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013;35(8):1872–1886.
[5] Mallat S. Group invariant scattering. Communications on Pure and Applied Mathematics 2012;65(10):1331–1398.
[6] Wiatowski T, Bölcskei H. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv:1512.06293 2015.
[7] Huang FJ, LeCun Y. Large-scale learning with SVM and convolutional nets for generic object categorization. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition. 2006; 284–291.
[8] Wiatowski T, Tschannen M, Stanić A, Grohs P, Bölcskei H. Discrete deep feature extraction: A theory and new architectures. In Proc. of International Conference on Machine Learning. June 2016; 2149–2158.
[9] Ari S, Hembram K, Saha G. Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier. Expert Systems with Applications 2010;37(12):8019–8026.
[10] Patidar S, Pachori RB, Garg N. Automatic diagnosis of septal defects based on tunable-Q wavelet transform of cardiac sound signals. Expert Systems with Applications 2015;42(7):3315–3326.
[11] Zheng Y, Guo X, Ding X. A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification. Expert Systems with Applications 2015;42(5):2710–2721.
[12] Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Transactions on Biomedical Engineering 2016;63(4):822–832.
[13] Mallat S. A wavelet tour of signal processing: The sparse way. 3rd edition. Academic Press, 2009.
[14] Stoica P, Moses RL. Spectral analysis of signals. Pearson/Prentice Hall, Upper Saddle River, NJ, 2005.
[15] Herbei R, Wegkamp MH. Classification with reject option. Canadian Journal of Statistics 2006;34(4):709–721.
[16] Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61–74.

Address for correspondence:
Michael Tschannen, Thomas Wiatowski
ETH Zürich, Communication Technology Laboratory
Sternwartstrasse 7
CH-8092 Zürich
Switzerland
{michaelt, withomas}@nari.ee.ethz.ch
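As an illustration of the feature-map recursion (1) from Section 2, the following Python sketch computes the feature maps of a small tree-like network with Haar wavelet filters, ReLU, and max-pooling. It is a minimal sketch under stated assumptions, not the implementation used for the experiments: the function names (haar_wavelet, circ_conv, max_pool, feature_maps), the filter normalization, the FFT-based circular convolution, and the pooling factor of 2 are choices made here for illustration only. The low-pass filtering and the concatenation into the final feature vector are omitted.

```python
import numpy as np


def haar_wavelet(j, length):
    """Haar wavelet at dyadic scale j (support 2**j), zero-padded to `length`.

    The 1/support normalization is an assumption made for this sketch.
    """
    support = 2 ** j
    if support > length:
        raise ValueError("filter support exceeds signal length")
    psi = np.zeros(length)
    psi[: support // 2] = 1.0 / support
    psi[support // 2 : support] = -1.0 / support
    return psi


def circ_conv(x, h):
    """Circular convolution of two equal-length real signals via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))


def max_pool(x, size=2):
    """Non-overlapping max-pooling; len(x) must be divisible by `size`."""
    return x.reshape(-1, size).max(axis=1)


def feature_maps(f, D=3, J=3, pool=2):
    """Return a list of layers; layer d holds the J**d feature maps f_n^d.

    Layer 0 contains only the input f_1^0 = f. Each map in layer d is
    P_d(rho_d(f_k^{d-1} * psi_j)) with rho_d = ReLU and P_d = max-pooling.
    The pooling factor `pool` is an assumed value.
    """
    layers = [[np.asarray(f, dtype=float)]]
    for _ in range(D):
        current = []
        for f_k in layers[-1]:
            for j in range(1, J + 1):
                filtered = circ_conv(f_k, haar_wavelet(j, len(f_k)))
                current.append(max_pool(np.maximum(filtered, 0.0), pool))
        layers.append(current)
    return layers
```

For a cardiac cycle re-sampled to length 1024 and $D = J = 3$, `feature_maps` returns 1, 3, 9, and 27 maps in layers 0 to 3; with the assumed pooling factor of 2 the map lengths are 1024, 512, 256, and 128.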
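The reject-option rule outlined in Section 2 can likewise be sketched in a few lines of Python. The sketch assumes that per-cycle posterior estimates for the label “abnormal” are already available (for example from Platt scaling or logistic regression, as mentioned in the text) and averages them over the cardiac cycles of one recording before thresholding. The function name ternary_predict and its argument names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def ternary_predict(cycle_probs_abnormal, tau):
    """Reject-option rule for one recording.

    cycle_probs_abnormal: estimated P(Y = "abnormal" | b_l) for each cardiac
        cycle b_l of the recording (assumed to be given).
    tau: threshold in (0, 1/2].
    """
    if not 0.0 < tau <= 0.5:
        raise ValueError("tau must lie in (0, 1/2]")
    # Recording-level estimate: average of the per-cycle posteriors.
    p_abnormal = float(np.mean(cycle_probs_abnormal))
    if p_abnormal < tau:
        return "normal"
    if p_abnormal > 1.0 - tau:
        return "abnormal"
    return "unsure"
```

With tau = 0.5 the “unsure” region shrinks to the single value 0.5, which matches the remark in the text that this choice amounts to binary classification.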