All content following this page was uploaded by Yugal Kumar on 22 May 2014.
JUNE 2011
Predication of Parkinson’s disease using data mining methods: A comparative analysis of tree, statistical, and support vector machine classifiers
Geeta Yadav, Yugal Kumar, Gadadhar Sahoo
ORIGINAL ARTICLE
ABSTRACT
Predicting Parkinson’s disease (PD) at an early age has been a challenging task for researchers, because the symptoms of the disease appear only in middle and late middle age. Many symptoms are associated with Parkinson’s disease; this article focuses on the speech articulation difficulties of PD-affected people and formulates prediction models based on three data mining methods. These three methods are taken from three different families of data mining classifiers: a tree classifier, a statistical classifier, and a support vector machine classifier. The performance of the three classifiers is measured with three performance metrics: accuracy, sensitivity, and specificity. The main aim of this article is to determine which model identifies PD-affected people most accurately.
Key words: Classifiers, decision stump, logistic regression, machine learning, Parkinson’s disease, sequential minimal optimization
and coordinating the body movements. In this disease, however, a chemical imbalance occurs between two brain chemicals, dopamine and acetylcholine. These chemicals are responsible for controlling and coordinating body movements, so an imbalance between them impairs the control and coordination of movement. Because of this chemical imbalance, the central nervous system (CNS) is also affected. A number of environmental factors have been implicated in PD;[3] however, aging is the most significant risk factor for developing the disease. It affects both sexes, with a higher prevalence in men.[4] The number of people in the United States aged above 65 years is expected to double by 2030.[5] According to the statistics in [6], 11,747,102 people in India, out of a population of 1,065,070,607, are affected by Parkinson’s disease. These statistics are generated by extrapolating various prevalence or incidence rates to the population of a particular country or region, so they are not fully reliable. William Dauer and Serge Przedborski classified the Parkinsonian syndromes as primary parkinsonism, secondary parkinsonism, Parkinson-plus syndromes, and familial neurodegenerative diseases.[7]

Parkinson’s disease symptoms usually appear only in middle or late middle age and vary from individual to individual. They begin subtly and progress slowly. The classical early symptoms include trembling in the hands, arms, legs, and face, rigidity, bradykinesia, and asymmetric onset.[8] Besides these classical symptoms, other symptoms are also associated with Parkinson’s disease: micrographia (small, almost unreadable handwriting), decreased olfaction, postural instability, slowing of the digestive system, constipation, fatigue, weakness, and hypotension.[9] As the disease progresses, patients also face problems with motor skills, speech, gait, digestion, movement, emotion, blood pressure, balance, and posture. Recent studies conclude that most PD patients (approximately 90%) suffer from speech difficulties, i.e., dysphonia (impaired speech production) and dysarthria (speech articulation difficulties).[10] These deficits are difficult to treat with drugs or neurosurgery.[11,12]

Machine learning algorithms in disease
Machine learning algorithms have a long history in disease diagnosis and prediction. A large number of published papers demonstrate the application of machine learning algorithms in the medical field, for example to diagnose disease, predict disease, estimate survivability, and identify disease. Initially, three branches of machine learning emerged: symbolic learning, statistical methods, and neural networks. Symbolic learning was described by Hunt,[13] statistical methods by Nilsson,[14] and neural networks by Rosenblatt.[15] The machine learning community has developed a large number of tools that have been widely used to obtain classification models, including medical prognostic models.[16,17] For cancer diagnosis and research, artificial neural network and decision tree classifiers have been used with remarkable results.[18,19] Pendharkar applied several data mining methods to identify patterns in breast cancer.[20] Dursun Delen et al. applied ANN, decision tree, and logistic regression methods to predict the survivability of breast
cancer patients.[21] Logistic regression and k-nearest neighbor models, along with six different machine learning algorithms, have been used to predict pneumonia mortality.[22] Support vector machines (SVMs) have been used for the detection and diagnosis of a wide range of biomedical diseases, such as detection of oral cancers in optical images,[23] polyps in CT colonography,[24] and microcalcifications in mammograms,[25] and analysis of gene expression measured via microarrays.[26] A study of several machine learning approaches for microcalcification detection showed that SVMs provide better classification performance than other approaches such as ANNs.[27] Bayesian networks have been applied in biomedicine, especially in probabilistic expert systems for clinical diagnosis[28,29] and in computational biology,[30] because a Bayesian network can deal with biomedical data that are incomplete or only partially correct.[31] At present, machine learning techniques are also used for detecting and classifying tumors in X-ray and CRT images[32,33] and for classifying malignancies from proteomic and genomic assays.[34-36] According to PubMed statistics, nearly 1,800 papers have been published on applying machine learning techniques to cancer.

PREVIOUS RESEARCH

Parkinson’s disease has been studied by many researchers using several methods. Fiona O’Reilly et al.[37] examined the social, psychological, and physical well-being of people whose partners were affected by Parkinson’s disease; their results showed that such people have slightly worse social, psychological, and physical profiles. Samyra H.J. Keus et al.[38] provided an evidence-based analysis of physical therapy in Parkinson’s disease and also developed a multifaceted implementation strategy for PD. These studies provide a platform for the use of physical therapy in Parkinson’s disease in the future. A recent study on the rehabilitation of PD patients was provided by Khan F et al.,[39] who used SPSS version 15 to generate the results. The results cover four parameters: functional independence measure (FIM) scores, FIM efficiency, hospital length of stay (LOS), and discharge destination. Overall, this study showed that most patients with PD were in the higher functioning AN-SNAP classes (216-218), with less than 1% in the very disabled class (AN-SNAP 219). Another type of study, based on questionnaires given to PD-affected patients, was carried out by AGEM de Boer et al.[40] In this study, a list of questions was sent to patients, and their responses were used to determine the quality of life of patients with Parkinson’s disease. Two approaches were used to assess the validity of the PDQL (Parkinson’s Disease Quality of Life questionnaire): discriminant validity and convergent validity. The PDQL covers four parameters: parkinsonian symptoms, systemic symptoms, emotional functioning, and social functioning. The results of this study showed that patients with Parkinson’s disease took twice as long as patients with inflammatory bowel disease to fill in the same questionnaire.[41] The study concludes that the PDQL is an appropriate, valid, and useful tool for determining the quality of life of patients with Parkinson’s disease in clinical studies. Patricia Limousin et al.[42] provided another method for the treatment of advanced Parkinson’s disease, i.e., electrical stimulation of the subthalamic nucleus. The results of
this method were evaluated on the activities of daily living and motor examination sections of the Unified Parkinson’s Disease Rating Scale; the method reduces symptoms and allows the levodopa dose to be reduced, with a consequent reduction in dyskinesias. Yunfeng Wu et al.[43] focused on the statistical analysis of gait rhythm in patients with Parkinson’s disease. Kenneth Revett et al.[10] discussed a rough set approach to feature selection for Parkinson’s disease. They focused on impairment of voice production and used the rough set approach to identify people with Parkinson’s disease.

PREDICTION MODELS

In this paper, three different types of classification methods are used: decision stump (tree classifier), logistic regression (statistical classifier), and sequential minimal optimization (support vector machine).

Tree classifiers
A tree is a classifier that can be defined as a recursive partition of the dataset. A tree classifier consists of a set of nodes, one of which acts as the root node and has no incoming edge; every other node has exactly one incoming edge. Nodes with outgoing edges are known as internal nodes, and nodes with no outgoing edges are known as terminal or leaf nodes. A tree can easily be converted into IF-THEN rules and used to extract valuable information from datasets.[44] Hence, a large number of decision tree algorithms have been developed, such as Quinlan’s ID3, C4.5, and C5[44,45] and Breiman et al.’s CART.[46] Many tree classifiers have been defined, such as random tree, REP tree, J48, decision stump, MSP, LMT, and so on. In this paper, the decision stump tree classifier is used. A decision stump (DS) is a decision tree with a single decision node: the decision tree learner is stopped after splitting on the single most informative feature.[47]

Logistic regression (LR)
Statistical classifiers include linear discriminant analysis, least mean square, quadratic, kernel, logistic regression, and k-nearest neighbors. In this paper, logistic regression is used. Logistic regression is a statistical classifier used for the analysis of data. It is a generalized linear model, closely related to linear regression, that is used for predicting binary or multi-class dependent variables.[48] Logistic regression (LR)[49] is a classical classification method that has been used widely in many applications, including document classification,[50,51] computer vision,[52] natural language processing,[53] and bioinformatics.[54] Mathematically, LR models the posterior probability Pr(G = k | X = x) of each class k given the input x as a nonlinear function of x; these probabilities range from 0 to 1 and sum to 1 over the classes.

Support vector machine (SVM)
Vapnik[55] first used support vector machines (SVMs) for classification. At present, SVMs are used in a wide range of problems, including pattern recognition,[56] bioinformatics,[57] and text categorization.[58] SVM classification is performed by finding a linear or nonlinear separating surface. Training an SVM, however, requires solving a quadratic optimization problem.[59] A large number of algorithms have been proposed for this, such as the sequential
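The decision stump defined under Tree classifiers can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the software used to produce Table 1: it assumes numeric features and class labels, scores every candidate split by information gain (entropy reduction), which is one common reading of "single most informative feature", and tries thresholds at midpoints between consecutive distinct feature values. The names `entropy`, `majority`, `fit_stump`, and `predict_stump`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a one-split tree (assumed criterion: information gain).

    Returns (feature, threshold, label_if_left, label_if_right), where a row
    goes left when row[feature] <= threshold.
    """
    n = len(y)
    base = entropy(y)
    best = None  # (gain, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so neither side of a split is ever empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
            if best is None or gain > best[0]:
                best = (gain, f, t, majority(left), majority(right))
    if best is None:  # every feature is constant: always predict the majority class
        label = majority(y)
        return (0, math.inf, label, label)
    _, f, t, left_label, right_label = best
    return (f, t, left_label, right_label)


def predict_stump(stump, row):
    """Classify one feature vector with a fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


# Hypothetical toy data: two numeric voice features per person, 1 = PD, 0 = healthy.
X = [[0.2, 5.0], [0.3, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
stump = fit_stump(X, y)
print(stump)                              # (0, 0.55, 0, 1)
print(predict_stump(stump, [0.7, 3.0]))   # 1
```

On this toy data, feature 0 separates the two classes perfectly (gain of 1 bit), so the stump splits there; when several splits tie on gain, the sketch keeps the first one it found. Other split criteria, such as misclassification error, would also give a valid stump.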
Table 1: Results of 10-fold cross-validation for all folds and models
DS = decision stump (tree classifier); LR = logistic regression (function classifier); SMO = sequential minimal optimization (SVM classifier). Each 2 × 2 confusion matrix is written row by row as [row 1; row 2].

Fold | DS matrix      | Acc. | Sen. | Spec. | LR matrix       | Acc. | Sen. | Spec. | SMO matrix    | Acc. | Sen.   | Spec.
1    | [142 5; 41 7]  | 0.76 | 0.97 | 0.15  | [100 47; 22 26] | 0.65 | 0.68 | 0.54  | [142 5; 43 5] | 0.75 | 0.97   | 0.10
2    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [104 43; 17 31] | 0.69 | 0.71 | 0.65  | [144 3; 45 3] | 0.75 | 0.98   | 0.06
3    | [142 5; 38 10] | 0.75 | 0.93 | 0.12  | [96 91; 19 29]  | 0.53 | 0.51 | 0.60  | [142 5; 41 7] | 0.76 | 0.97   | 0.15
4    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [96 51; 17 31]  | 0.65 | 0.65 | 0.65  | [144 3; 40 8] | 0.78 | 0.98   | 0.17
5    | [142 5; 38 10] | 0.75 | 0.93 | 0.12  | [99 48; 19 29]  | 0.66 | 0.67 | 0.60  | [142 5; 43 5] | 0.75 | 0.97   | 0.10
6    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [96 52; 18 30]  | 0.64 | 0.65 | 0.63  | [142 5; 42 6] | 0.76 | 0.97   | 0.13
7    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [93 54; 18 30]  | 0.63 | 0.63 | 0.63  | [144 3; 40 8] | 0.78 | 0.98   | 0.17
8    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [97 50; 19 29]  | 0.65 | 0.66 | 0.60  | [144 3; 43 5] | 0.76 | 0.98   | 0.10
9    | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [97 50; 17 31]  | 0.66 | 0.66 | 0.65  | [144 3; 40 8] | 0.78 | 0.98   | 0.17
10   | [142 5; 33 15] | 0.75 | 0.90 | 0.13  | [97 50; 17 31]  | 0.66 | 0.66 | 0.65  | [143 4; 42 6] | 0.76 | 0.97   | 0.13
Mean |                | 0.75 | 0.91 | 0.13  |                 | 0.64 | 0.64 | 0.62  |               | 0.76 | 0.9745 | 0.13

Acc. = accuracy; Sen. = sensitivity; Spec. = specificity
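The Acc., Sen., and Spec. columns follow directly from each confusion matrix. The short Python sketch below recomputes the fold-1 entries of Table 1. It assumes that each matrix is laid out as [[TP, FN], [FP, TN]], with PD as the positive class in the first row; this layout is inferred from the numbers (it reproduces all three fold-1 entries) rather than stated in the table. The function name `metrics` is hypothetical.

```python
def metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) for a binary confusion
    matrix assumed to be laid out as [[TP, FN], [FP, TN]], PD = positive."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # share of all people classified correctly
    sensitivity = tp / (tp + fn)  # share of PD cases classified as PD
    specificity = tn / (tn + fp)  # share of healthy people classified as healthy
    return accuracy, sensitivity, specificity


# Fold-1 confusion matrices copied from Table 1 (layout assumed as above).
fold1 = {
    "DS": [[142, 5], [41, 7]],
    "LR": [[100, 47], [22, 26]],
    "SMO": [[142, 5], [43, 5]],
}

for name, ((tp, fn), (fp, tn)) in fold1.items():
    acc, sen, spec = metrics(tp, fn, fp, tn)
    print(f"{name}: Acc. {acc:.2f}  Sen. {sen:.2f}  Spec. {spec:.2f}")
```

Running it prints 0.76/0.97/0.15 for DS, 0.65/0.68/0.54 for LR, and 0.75/0.97/0.10 for SMO, matching the fold-1 row of the table. For example, the DS specificity is 7/48 ≈ 0.15: only 7 of the 48 people in the second row were classified into the second class.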
sensitivity of 0.64 and specificity of 0.62. The model formed with sequential minimal optimization, i.e., the SVM classifier, provides an accuracy of 0.76 with a sensitivity of 0.97 and a specificity of 0.13.

Figure 2 compares the accuracy of the models formed by the decision stump (DS), logistic regression (LR), and sequential minimal optimization (SMO) classifiers. Figure 2 shows that the LR model has the poorest accuracy of these models (0.64), while the SMO model has the best (0.76). In other words, the LR model correctly classified about 64 of every 100 people, whereas the SMO model correctly classified about 76 of every 100.

Figure 3 compares the sensitivity of the models and gives a ranking similar to that for accuracy: the LR model gives the worst result (0.64), while the SMO model gives the best result among the discussed classifiers (0.97).

The specificity comparison (Figure 4), however, gives the opposite result to accuracy (Figure 2) and sensitivity (Figure 3): the DS and SMO models have similar specificity (0.13), while the specificity of the LR model is 0.62.

However, a model with high sensitivity but low specificity, i.e., the SMO model (sensitivity
Figure 2: Accuracy (DS, LR, SMO)
Figure 3: Sensitivity comparison of decision stump, logistic regression and sequential minimal optimization classifiers
Figure 4: Specificity comparison of decision stump, logistic regression and sequential minimal optimization classifiers
and SVM classifiers accurately identified the number of people with PD. However, when sensitivity and specificity are analyzed, there is a large gap between them for the tree and SVM methods (0.91 - 0.13 = 0.78 and 0.97 - 0.13 = 0.84, respectively), whereas for the statistical method the two are almost the same (0.64 and 0.62). A model with high sensitivity but low specificity labels many disease-free people as possibly having the disease, and these people must then be subjected to further investigation. Thus, for the tree and SVM methods, sensitivity is high (0.91 and 0.97) but specificity is low (0.13 and 0.13), so further investigation of the people flagged is required; for the LR model, sensitivity and specificity are almost the same (0.64 and 0.62), and there is no need for further investigation. We therefore conclude that, on the basis of the discussed performance metrics, the LR model identified people with PD more reliably than the tree and SVM classifiers.

In this study of Parkinson’s disease, only voice measurements have been considered to identify people with PD. Many factors and symptoms are associated with Parkinson’s disease, such as age, environmental factors, trembling in the hands, arms, and legs, impaired speech production, and speech articulation difficulties. In this paper, however, only the speech articulation difficulty of PD-affected people is used to build the models, and the discussed models are analyzed on this symptom alone.

REFERENCES

1. Elbaz A, Bower JH, Peterson BJ, Maraganore DM, McDonnell SK, Ahlskog JE, et al. Survival study of Parkinson disease in Olmsted County, Minnesota. Arch Neurol 2003;60:91-6.
2. Parkinson J. An essay on the shaking palsy. 1817. J Neuropsychiatry Clin Neurosci 2002;14:223-36.
3. Tanner CM, Ross GW, Jewell SA, Hauser RA, Jankovic J, Factor SA. Occupation and risk of parkinsonism: A multicenter case-control study. Arch Neurol 2009;66:1106-13.
4. Marras C, Tanner C. Epidemiology of Parkinson’s disease. In: Watts RL, Koller WC, editors. Movement Disorders: Neurologic Principles and Practice. 2nd ed. New York: The McGraw-Hill Companies; 2004. p. 177.
5. US Census Bureau. US interim projections by age, sex, race, and Hispanic origin: 2000-2050. Available from: http://www.census.gov/population/ww/projections/usinterimproj. [Last accessed on 2012 Apr 7].
6. Available from: http://www.rightdiagnosis.com/p/parkinsons_disease/stats-country.htm. [Last accessed on 2012 Apr 7].
7. Dauer W, Przedborski S. Parkinson’s disease: Mechanisms and models. Neuron 2003;39:889-909.
8. Alonso JB, de Leon J, Alonso I, Ferrer MA. Automatic detection of pathologies in the voice by HOS based parameters. EURASIP J Appl Signal Process 2001;14:275-84.
9. Cnockaert L, Schoentgen J, Auzou P, Ozsancak C, Defebvre L, Grenez F. Low-frequency vocal modulations in vowels produced by Parkinsonian subjects. Speech Commun 2008;50:288-300.
10. Revett K, Gorunescu F, Mohamed Salem AB. Feature selection in Parkinson’s disease: A rough sets approach. In: Proceedings of the International Multiconference on Computer Science and Information Technology; Oct. 12-14, Mrągowo, Poland 2009;4:425-8.
11. Anthony E, Lees LA. Management of Parkinson’s disease: An evidence-based review. Mov Disord 2002;17(Suppl 4):S1-166.
12. Bloem BR, Beckley DJ, van Dijk JG, Zwinderman AH, Remler MP, Roos RA. Influence of dopaminergic medication on automatic postural responses and balance impairment in Parkinson’s
2004;37:249-59.
37. O’Reilly F, Finnan F, Allwright S, Smith GD, Ben-Shlomo Y. The effects of caring for a spouse with Parkinson’s disease on social, psychological and physical well-being. Br J Gen Pract 1996;46:507-12.
38. Keus SH, Bloem BR, Hendriks EJ, Bredero-Cohen AB, Munneke M; Practice Recommendations Development Group. Evidence-based analysis of physical therapy in Parkinson’s disease with recommendations for practice and research. Mov Disord 2007;22:451-60.
39. Khan F, Amatya B. Rehabilitation for Parkinson’s disease: Analysis of the Australian rehabilitation outcomes dataset. J Clin Med Res 2010;3:1-8.
40. De Boer AG, Wijker W, Speelman JD, De Haes JC. Quality of life in patients with Parkinson’s disease: Development of a questionnaire. J Neurol Neurosurg Psychiatry 1996;61:70-4.
41. De Boer AG, Wijker W, Bartelsman JF, de Haes HC. Inflammatory bowel disease questionnaire: Cross-cultural adaptation and further validation. Eur J Gastroenterol Hepatol 1995;7:1043-50.
42. Limousin P, Krack P, Pollak P, Benazzouz A, Ardouin C, Hoffmann D, et al. Electrical stimulation of the subthalamic nucleus in advanced Parkinson’s disease. N Engl J Med 1998;339:1105-11.
43. Wu Y, Krishnan S. Statistical analysis of gait rhythm in patients with Parkinson’s disease. IEEE Trans Neural Syst Rehabil Eng 2007;18:150-8.
44. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, USA: Morgan Kaufmann; 1995.
45. Quinlan JR. Induction of decision trees. Mach Learn 1986;1:81-106.
46. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Monterey, CA: Wadsworth and Brooks/Cole Advanced Books and Software; 1984.
47. Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn 1993;11:63-91.
48. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. New York, NY: Springer-Verlag; 2001.
49. Minka TP. A comparison of numerical optimizers for logistic regression. Technical Report, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA; 2003 (revised 2007).
50. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: A statistical view of boosting. Ann Stat 2000;28:337-407.
51. Maghbouleh A. A logistic regression model for detecting prominences. In: The Fourth International Conference on Spoken Language, ACM, Oct. 3-6, Philadelphia, PA, USA 1996;4:2443-5.
52. Liao JG, Chin K. Logistic regression for disease classification using microarray data. Bioinformatics 2007;23:1945-51.
53. Zhu J, Hastie T. Kernel logistic regression and the import vector machine. J Comput Graph Stat 2001;14:1081-8.
54. Zhu J, Hastie T. Classification of gene microarrays by penalized logistic regression. Biostatistics 2004;5:427-43.
55. Vapnik VN. The nature of statistical learning theory. New York: Springer-Verlag; 1995.
56. Pontil M, Verri A. Support vector machines for 3D object recognition. IEEE Trans Pattern Anal Mach Intell 1998;20:637-46.
57. Yu GX, Ostrouchov G, Geist A, Samatova NF. An SVM-based algorithm for identification of photosynthesis-specific genome features. In: Proceedings of the 2nd IEEE Computer Society Bioinformatics Conference, IEEE, 11-14 August, Stanford, CA, USA 2003;2:235-43.
58. Joachims T. Text categorization with support vector machines. In: Proceedings of the 10th European Conference on Machine Learning (ECML), Springer, 21-23 April, Chemnitz, Germany 1998. p. 137-42.
59. Vishwanathan SVN, Murty MN. SSVM: A simple SVM algorithm. In: Proceedings of the International Joint Conference on Neural Networks, IEEE, 12-17 May, Honolulu, Hawaii 2002;3:2393-8.
60. Platt JC. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges C, Smola A, editors. Advances in Kernel Methods: Support Vector Machines. Cambridge, MA: MIT; 1998. In Press.
61. Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KK. A fast iterative nearest point algorithm for support vector machine classifier design. IEEE Trans Neural Netw 2000;11:124-36.
62. Saltelli A, Chan K, Scott EM. Sensitivity Analysis: Gauging the Worth of Scientific Models. 1st ed. John Wiley and Sons; 2000. p. 492.
63. Larson S. The shrinkage of the coefficient of multiple correlation. J Educ Psychol 1931;22:45-55.
64. Mosteller F, Tukey JW. Data analysis, including statistics. In: Lindzey G, Aronson E, editors. Handbook of Social Psychology. 2nd ed. Reading, MA: Addison-Wesley; 1968. p. 80-230.
65. Breiman J, Friedman R, Stone OC. Classification and Regression Trees. Belmont, CA: Wadsworth; 1984.
66. Breiman L, Spector P. Submodel selection and evaluation in regression. The X-random case. Int Stat Rev 1992;60:291-319.
67. Schaffer C. Selecting a classification method by cross-validation. Mach Learn 1993;13:135-43.
68. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on AI, Morgan Kaufmann, 20-25 August, Canada 1995;2:1137-45. Available from: http://citeseer.ist.psu.edu/kohavi95study. [Last accessed on 2012 Apr 4].

How to cite this article: Yadav G, Kumar Y, Sahoo G. Predication of Parkinson’s disease using data mining methods: A comparative analysis of tree, statistical, and support vector machine classifiers. Indian J Med Sci 2011;65:231-42.

Source of Support: There was no financial support for this work. Conflict of Interest: There is no conflict of interest. I hereby take full responsibility for the veracity of the information provided on behalf of all coauthors.