Documente Academic
Documente Profesional
Documente Cultură
1 Introduction
We present the results of a literature-based study of data mining methods used
for Parkinson’s disease management. The study was motivated by requirements
of the EU H2020 project PD manager1 , which aims to develop an innovative,
mobile-health, patient-centric platform for Parkinson’s disease management. One
part of the data mining module of this platform will include predictive data min-
ing algorithms designed to predict the changes in patients symptoms as well as
their severity. The second segment will include descriptive data mining meth-
ods, which will analyze and provide deeper insight in the patients condition by
discovering new disease patterns.
1
http://www.parkinson-manager.eu/.
c Springer International Publishing AG 2016
A. Holzinger (Ed.): ML for Health Informatics, LNAI 9605, pp. 209–220, 2016.
DOI: 10.1007/978-3-319-50478-0 10
210 D. Miljkovic et al.
Data mining algorithms search for patterns and/or models in data, which
are interesting and valid according to the user-defined criteria of interesting-
ness and validity. Both predictive as well as descriptive data mining methods
are lately rapidly used in the healthcare domain. Use of data mining methods
brings numerous advantages in healthcare, such as lowering the cost of the avail-
able medical solutions, improving the detection of disease causes and proper
identification of treatment, drug recommendation and providing support in per-
sonalized health [1].
There are several symptoms that are important for diagnosis, management
and treatment of Parkinson’s disease patients. It appears that one of the cru-
cial roles in managing Parkinson’s disease is the detection and classification of
tremor. Tremor, which is a primary symptom of the disease, is an involuntary,
rhythmical, forwards and backwards movement of a body part and is assessed in
some studies with Hidden Markov models, neural networks and different meth-
ods for time domain and spectral analysis. Besides tremor, freezing of gait (FoG)
is one of the advanced symptoms in Parkinson’s disease. Very few computational
methods have been developed so far to detect it and they can be grouped into the
following categories: analysis of electromyography signals, 3D motion analysis,
foot pressure analysis and motion signal analysis using accelerometers and gyro-
scopes. The problem of gait initiation, the transient state between standing and
walking, is studied in terms of differentiation between normal and abnormal gait
initiation. In addition to these symptoms, medical studies have shown that over
90% of people with Parkinson’s disease suffer from some form of vocal impair-
ment. The analysis of voice can be used for successfully diagnosing Parkinson’s
disease.
There were several EU projects devoted to different aspects of Parkinson’s
disease, which is the topic of this research. On the basis of the state-of-art search,
this work proposes the design and the functionality of a data mining module that
will be implemented within the mobile e-health platform for the purpose of the
PD manager project.
3 Data Mining
This section overviews the basic concepts and provides a state-of-the-art litera-
ture review of data mining methods used in PD diagnosis, prediction, analysis
and management.
Data mining algorithms search for patterns and/or models in data that are
interesting and valid according to the user-defined criteria of interestingness and
validity. There are two main machine learning approaches used in data mining
algorithms: supervised learning takes classified examples as input for training
the classification/prediction model, while unsupervised learning takes as input
unclassified examples [4]. Consequently, data mining methods can be classified
into two main categories:
– Predictive data mining methods result in models for prediction and classifica-
tion. In classification, data are class labeled and the task of a classifier is to
determine the class for a new unlabeled example. The most commonly used
predictive methods are rule and decision tree learning methods. The classi-
fication rule model consists of if-then rules in a format: if Conditions then
Class, where Conditions represent conjunctions of attribute values, and rule
consequent is a Class label. The decision tree model consists of nodes and arcs.
Non-leaf nodes in a decision tree represent a test on a particular attribute, each
arc is test output (one or more attribute values in case of nominal attributes,
or an interval in case of numeric attributes) and each leaf is a class label.
– Descriptive data mining methods are used for finding individual data pat-
terns, such as associations, clusters, etc. Association rule induction, clustering
and subgroup discovery are among the most popular descriptive data mining
methods. Association rule learning is an unsupervised learning method where
an association rule is given in a form X → Y , where X and Y are sets of items.
The goal of association rule learning is to find interesting relationships among
sets of data items. Clustering is another unsupervised learning method where
data instances are clustered into groups based on their common characteris-
tics. On the other hand, subgroup discovery is a supervised learning method,
aimed at finding descriptions of interesting population subgroups.
Data mining is used extensively in various domains, such as business man-
agement, market analytics, insurance policies, healthcare, etc. In the last decade,
data mining has made its breakthrough and since then its use in the healthcare
domain is rapidly growing. The use of data mining methods brings numerous
advantages to healthcare, such as lowering the cost of medical solutions, detec-
tion of disease causes and proper identification of treatment methods, developing
personalized health profiles, drug recommendation systems, etc. [5]. This liter-
ature review specifically focuses on data mining applications for Parkinson’s
disease, which is the topic of the next section.
212 D. Miljkovic et al.
This section presents a survey of the data mining methods used for diagnos-
ing, predicting, analyzing and managing PD. It is structured according to the
particular tasks in PD domain for which data mining algorithms were employed.
Tremor Assessment. One of the crucial roles in the management and treat-
ment of PD patients is the detection and classification of tremors. In clinical
practice, this is one of the most common movement disorders and is typically
classified using behavioral or etiological factors [6]. During early stages of PD,
tremor is mostly present at rest, while at later stages postural action tremor can
be observed. The two are hard to distinguish with automated methods using
only their base frequencies (3.5–7.5 Hz vs. 4–12 Hz) so the recognition of body
posture is also essential. Because clinical tremor assessment is most commonly
based on subjective methods such as clinical scales, handwriting and drawing
assessment [7,8], there is a great need for objective, computational methods for
the detection and quantification of tremors.
Several approaches to computational assessment of tremor have been pro-
posed. Methods such as time domain analysis [9], spectral analysis [10] and non-
linear analysis [10] have addressed the detection and quantification of tremor.
Much of the recent work is based on body fixed sensors (BFS) for long-term
monitoring of patients [11,12]. Recent work by Rigas et al. [13] on the assess-
ment of tremor activity tries to overcome the limitations of older methods using
the following approach. First, six body sensors are used to collect raw data in
real time. Then, signal pre-processing is performed to separate lower frequency
events (not interesting) and higher frequency events (relevant for tremor assess-
ment). Following data pre-processing, two sets of features are extracted from
signal data. The first set of features is used for posture recognition while the
second set is used for tremor detection and quantification. The obtained data is
used to train two Hidden Markov Models (HMM), one for posture recognition
and the other for tremor classification. By using the combined output of both
models, tremor can be accurately assessed. The authors report high accuracy
on a set of 23 subjects which indicates that the method is efficient for tremor
assessment in practice. A less popular, invasive approach to data collection for
tremor assessment is using deep brain electrodes implanted in PD patients [14].
Wu et al. have demonstrated the performance of radial basis function neural
network in detecting tremor and non-tremor patterns. These patterns allowed
them to predict the tremor onset in PD patients.
Gait Analysis. Muniz et al. [15] study the effects of deep brain stimulation
(DBS) on ground reaction force (GRF) during gait. They are interested in the
ability to discriminate between normal and PD subjects. They report using the
Principal Component Analysis (PCA) of the walking trials data to extract PC
features from the different GRF components. Then, the aim of the paper is to
investigate which classification method is the most successful for the addressed
Machine Learning and Data Mining Methods 213
discrimination task. The methods tested are Support Vector Machines (SVM),
Probabilistic Neural Networks (PNN) and Logistic Regression (LR). They con-
clude that the PNN and SVM models show the best performance.
The problem of discriminating between normal and PD gait pattern is ana-
lyzed also in a study by Tahir and Manap [16]. Beside the GRF data, the authors
use additional spatio-temporal (e.g., stride time, cadence, step length) and kine-
matic (e.g., hip, knee, ankle angle) data, acquired by using reflective markers on
the subjects skin and an infrared camera. The methods they compare are SVMs
using linear, radial basis function and polynomial kernels, and Neural Networks
(NN). Their results suggest that the kinematic features are not informative for
this task, and that the SVM classifier performs better than the NNs.
Gait Initiation. The problem of gait initiation, the transient state between
standing and walking, was addressed in the study of Muniz et al. [25]. The
authors target only the long-term effects of DBS on gait initiation, as the disease
progresses. Their hypothesis is that as the disease progresses, the gait initiation
worsens. The methods used to test this include PCA of the GRF components,
selection of the most informative PCs, and analysis of standard distance2 . The
authors also use logistic regression to identify a threshold over the standard
distance data, which would discriminate between normal and abnormal gait ini-
tiation. They conclude that the standard distance based on PCA of gait initiation
GRF data is increased on long-term.
2
Standard distance: a statistic mainly used for spatial GIS data, to measure compact-
ness of a distribution.
3
https://archive.ics.uci.edu/ml/datasets/Parkinsons+Telemonitoring.
4
https://archive.ics.uci.edu/ml/datasets/Parkinsons.
Machine Learning and Data Mining Methods 215
ciated with the best classification accuracy is obtained by the fuzzy-based non-
linear transformation method in combination with the SVM classifier [32]. Addi-
tionally, the authors claim that their 10-times 10-fold cross-validation results
suggest that FKNN shows even higher classification accuracy.
5
http://www.clowdflows.org/.
216 D. Miljkovic et al.
usually involves much technological effort (as described in [34]) which would be
better invested in improving core research.
The platform has a user-friendly graphical user interface which enables its
easy utilization by the end users (clinicians). We expect that the proposed
PD manager data mining module, implemented in the ClowdFlows platform,
will exhibit several advantages (such as efficient feature selection, and wider
selection of applicable data mining and machine learning algorithms) and will
enable data analysis in clinical practice.
5 Future Challenges
Building on the initial implementation of the data mining module, one could
take several possible research routes to improve the effectiveness of our meth-
ods. The use of a centralized, Web based graphical workflow system will prove
especially useful in those endeavors, as it bundles not only data, but the exper-
imental results of all experts involved with that platform, which can be useful
to experiment with. This follows the concept of an “expert-in-the-loop” to solve
problems which would otherwise be computationally hard [35] (for clinical exam-
ples refer to: [36–38]). This “glass-box”-approach [39] may be an alternative to
the often criticized fully-automatic “black-box”-approach.
References
1. Holzinger, A.: Trends in interactive knowledge discovery for personalized medicine:
cognitive science meets machine learning. IEEE Intell. Inf. Bull. 15, 6–14 (2014)
2. Dauer, W., Przedborski, S.: Parkinson’s disease: mechanisms and models. Neuron
39, 889–909 (2003)
3. Kranjc, J., Podpečan, V., Lavrač, N.: ClowdFlows: a cloud based scientific work-
flow platform. In: Flach, P.A., Bie, T., Cristianini, N. (eds.) ECML PKDD 2012.
LNCS (LNAI), vol. 7524, pp. 816–819. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-33486-3 54
218 D. Miljkovic et al.
4. Mladenic, D., Lavrač, N., Bohanec, M., Moyle, S. (eds.): Data Mining and Deci-
sion Support: Integration and Collaboration. The Springer International Series in
Engineering and Computer Science. Springer, Heidelberg (2003)
5. Tomar, D., Agarwal, S.: A survey on data mining approaches for healthcare. Int.
J. Bio Sci. Bio Technol. 5, 241–266 (2013)
6. Findley, L.J.: Classification of tremors. J. Clin. Neurophysiol. 13, 122–132 (1996)
7. Budzianowska, A., Honczarenko, K.: Assessment of rest tremor in parkinson’s dis-
ease. Polish J. Neurol. Neurosurg. 42, 12–21 (2008)
8. Jankovic, J.: Parkinson’s disease: clinical features and diagnosis. J. Neurol. Neu-
rosurg. Psychiatry 79, 368–376 (2008)
9. Timmer, J., Gantert, C., Deuschl, G., Honerkamp, J.: Characteristics of hand
tremor time series. Biol. Cybern. 70, 75–80 (1993)
10. Riviere, C.N., Reich, S.G., Thakor, N.V.: Adaptive fourier modelling for quantifi-
cation of tremor. J. Neurosci. Methods 74, 77–87 (1997)
11. Patel, S., Hughes, R., Huggins, N., Standaert, D., Growdon, J., Dy, J., Bonato,
P.: Using wearable sensors to predict the severity of symptoms and motor com-
plications in late stage parkinson’s disease. In: Conference Proceedings: Annual
International Conference of the IEEE Engineering in Medicine and Biology Soci-
ety, pp. 3686–3689 (2008)
12. Patel, S., Lorincz, K., Hughes, R., Huggins, N., Growdon, J., Standaert, D., Akay,
M., Dy, J., Welsh, M., Bonato, P.: Monitoring motor fluctuations in patients with
parkinson’s disease using wearable sensors. IEEE Trans. Inf. Technol. Biomed. 13,
864–873 (2009)
13. Rigas, G., Tzallas, A.T., Tsipouras, M.G., Bougia, P., Tripoliti, E., Baga, D.,
Fotiadis, D.I., Tsouli, S., Konitsiotis, S.: Assessment of tremor activity in the
parkinson’s disease using a set of wearable sensors. IEEE Trans. Inf. Technol.
Biomed. 16, 478–487 (2012)
14. Wu, D., Warwick, K., Ma, Z., Burgess, J., Pan, S., Aziz, T.: Prediction of parkin-
son’s disease tremor onset using radial basis function neural networks. Expert Syst.
Appl. 37, 2923–2928 (2010)
15. Muniz, A.M., Liu, H., Lyons, K., Pahwa, R., Liu, W., Nobre, F.F., Nadal, J.: Com-
parison among probabilistic neural network, support vector machine and logistic
regression for evaluating the effect of subthalamic stimulation in parkinson disease
on ground reaction force during gait. J. Biomech. 43, 720–726 (2010)
16. Tahir, N.M., Manap, H.H.: Parkinson disease gait classification based on machine
learning approach. J. Appl. Sci. 12, 180–185 (2012)
17. Bloem, B.R., Hausdor, J.M., Visser, J.E., Giladi, N.: Falls and freezing of gait
in parkinson’s disease: a review of two interconnected, episodic phenomena. Mov.
Disorders J. 19, 871–884 (2004)
18. Giladi, N., Tal, J., Azulay, T., Rascol, O., Brooks, D.J., Melamed, E., Oertel, W.,
Poewe, W.H., Stocchi, F., Tolosa, E.: Validation of the freezing of gait questionnaire
in patients with parkinson’s disease. Mov. Disorders J. 24, 655–661 (2009)
19. Nieuwboer, A., Dom, R., De Weerdt, W., Desloovere, K., Janssens, L., Stijn, V.:
Electromyographic profiles of gait prior to onset of freezing episodes in patients
with parkinson’s disease. Brain 127, 1650–1660 (2004)
20. Delval, A., Snijders, A.H., Weerdesteyn, V., Duysens, J.E., Defebvre, L., Giladi, N.,
Bloem, B.R.: Objective detection of subtle freezing of gait episodes in parkinson’s
disease. Mov. Disorders J. 25, 1684–1693 (2010)
21. Hausdor, J.M., Schaafsma, J.D., Balash, Y., Bartels, A.L., Gurevich, T., Giladi,
N.: Impaired regulation of stride variability in parkinson’s disease subjects with
freezing of gait. Exp. Brain Res. 149, 187–194 (2003)
Machine Learning and Data Mining Methods 219
22. Tripoliti, E.E., Tzallas, A.T., Tsipouras, M.G., Rigas, G., Bougia, P., Leontiou, M.,
Konitsiotis, S., Chondrogiorgi, M., Tsouli, S., Fotiadis, D.I.: Automatic detection
of freezing of gait events in patients with parkinson’s disease. Comput. Methods
Prog. Biomed. 110, 12–26 (2013)
23. Han, J.H., Lee, W.J., Ahn, T.B., Jeon, B.S., Park, K.S.: Gait analysis for freezing
detection in patients with movement disorder using three dimensional acceleration
system. In: Conference Proceedings: Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, vol. 2, pp. 1863–1865. Medicine and
Biology Society (2003)
24. Bächlin, M., Plotnik, M., Roggen, D., Giladi, N., Hausdor, J.M., Tröster, G.: A
wearable system to assist walking of parkinson s disease patients. Methods Inf.
Med. 49, 88–95 (2010)
25. Muniz, A.M., Nadal, J., Lyons, K., Pahwa, R., Liu, W.: Long-term evaluation
of gait initiation in six parkinson’s disease patients with bilateral subthalamic
stimulation. Gait Posture 35, 452–457 (2012)
26. Little, M.A., McSharry, P., Hunter, E.J., Spielman, J., Ramig, L.O.: Suitability
of dysphonia measurements for telemonitoring of parkinsons disease. IEEE Trans.
Biomed. Eng. 56, 1015–1022 (2009)
27. Das, R.: A comparison of multiple classification methods for diagnosis of parkinson
disease. Expert Syst. Appl. 37, 1568–1572 (2010)
28. Eskidere, O., Ertaç, F., Hanilçi, C.: A comparison of regression methods for remote
tracking of parkinsons disease progression. Expert Syst. Appl. 39, 5523–5528
(2012)
29. Chen, H.L., Huang, C.C., Yu, X.G., Xu, X., Sun, X., Wang, G., Wang, S.J.: An
efficient diagnosis system for detection of parkinsons disease using fuzzy k-nearest
neighbor approach. Expert Syst. Appl. 40, 263–271 (2013)
30. Little, M.A., McSharry, P.E., Roberts, S.J., Costello, D.A.E., Moroz, I.M.: Exploit-
ing nonlinear recurrence and fractal scaling properties for voice disorder detection.
BioMed. Eng. OnLine 6, 23 (2007)
31. Tsanas, A., Little, M., McSharry, P.E., Ramig, L.O.: Accurate telemonitoring of
parkinson’s disease progression by noninvasive speech tests. IEEE Trans. Biomed.
Eng. 57, 884–893 (2010)
32. Li, D.C., Liu, C.W., Hu, S.C.: A fuzzy-based data transformation for feature extrac-
tion to increase classification performance with small medical data sets. Artif.
Intell. Med. 52, 45–52 (2011)
33. Exarchos, T.P., Tzallas, A.T., Baga, D., Chaloglou, D., Fotiadis, D.I., et al.: Using
partial decision trees to predict parkinsons symptoms: a new approach for diagnosis
and therapy in patients suffering from parkinsons disease. Comput. Biol. Med. 42,
195204 (2012b)
34. Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary,
V., Young, M.: Machine learning: the high interest credit card of technical debt.
In: SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop)
(2014)
35. Holzinger, A.: Interactive machine learning for health informatics: when do we need
the human-in-the-loop? Brain Inf. (BRIN) 3, 119–131 (2016)
36. Hund, M., Böhm, D., Sturm, W., Sedlmair, M., Schreck, T., Ullrich, T., Keim,
D.A., Majnaric, L., Holzinger, A.: Visual analytics for concept exploration in sub-
spaces of patient groups: making sense of complex datasets with the doctor-in-the-
loop. Brain Inf. 3(4), 233–247 (2016)
220 D. Miljkovic et al.
37. Girardi, D., Küng, J., Kleiser, R., Sonnberger, M., Csillag, D., Trenkler, J.,
Holzinger, A.: Interactive knowledge discovery with the doctor-in-the-loop: a prac-
tical example of cerebral aneurysms research. Brain Inf. 3(3), 133–143 (2016)
38. Yimam, S.M., Biemann, C., Majnaric, L., Sabanovic, S., Holzinger, A.: An adaptive
annotation approach for biomedical entity and relation recognition. Brain Inf. 3,
1–12 (2016)
39. Holzinger, A., Plass, M., Holzinger, K., Crişan, G.C., Pintea, C.-M., Palade, V.:
Towards interactive machine learning (iML): applying ant colony algorithms to
solve the traveling salesman problem with the human-in-the-loop approach. In:
Buccafurri, F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-
ARES 2016. LNCS, vol. 9817, pp. 81–95. Springer, Heidelberg (2016). doi:10.1007/
978-3-319-45507-5 6
40. Sun, Y., Han, Y.: Mining Heterogeneous Information Networks: Principles and
Methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery. Mor-
gan & Claypool Publishers, San Francisco (2012)
41. Rice, J.R.: The algorithm selection problem. Adv. Comput. 15, 65–117 (1975)
42. Lemke, C., Budka, M., Gabrys, B.: Metalearning: a survey of trends and technolo-
gies. Artif. Intell. Rev. 44, 117–130 (2015)
43. Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P., Schulenburg, S.: Hyper-
heuristics: an emerging direction in modern search technology. International Series
in Operations Research and Management Science, pp. 457–474 (2003)
44. Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Qu, R.:
Hyper-heuristics: a survey of the state of the art. J. Oper. Res. Soc. 64, 1695–1724
(2013)
45. Malle, B., Kieseberg, P., Weippl, E., Holzinger, A.: The right to be forgot-
ten: towards machine learning on perturbed knowledge bases. In: Buccafurri,
F., Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-ARES
2016. LNCS, vol. 9817, pp. 251–266. Springer, Heidelberg (2016). doi:10.1007/
978-3-319-45507-5 17