1 Introduction
The wide availability of computer-aided systems for medicine is evident, but an accurate and well-organized diagnosis is still considered an art. Looking inside a patient to find the principal causes of a disease is impossible, as the human body functions in an intricate way. Generally, diagnosis is conducted on the basis of the symptoms present and by analyzing the history of the patient's lifestyle. Assessment of patients needs proficient and experienced physicians to deal with their multifaceted cases. Solving these cases is not easy, as the dimensions of medical science have been expanding extremely rapidly. Development of powerful diagnostic frameworks using classification techniques is therefore essential.
A healthy liver leads to a healthy life. The liver performs various metabolic functions, including filtration of blood, detoxification of chemicals, metabolizing of drugs, and bile secretion [2]. It also assists in the digestion, absorption, and processing of food. Improper working of any liver function leads to liver disease. Factors that can cause liver damage are heavy long-term alcohol consumption, accumulation of excess body fat, high salt intake, and overuse of medications. General symptoms of liver disease are queasiness, appetite loss, sleepiness, and chronic weakness. As the disease progresses, severe symptoms may include jaundice, hemorrhage, inflamed abdomen, and decline in mental abilities. The most common categories of liver disease are alcoholic liver disease and nonalcoholic fatty liver disease [1–4].
The literature confirms the wide usage of classification techniques and methods in liver disease diagnosis. Artificial neural networks (ANN), decision trees, fuzzy logic (FL), rule-based reasoning (RBR), case-based reasoning (CBR), support vector machines (SVM), genetic algorithms (GA), artificial immune systems (AIS), and particle swarm optimization (PSO) have been deployed individually or in combination. General liver disorder was classified using ANN-based approaches [5,6]; hepatitis disease was diagnosed using simulated annealing and SVM [7], and using a feed-forward neural network [8]. SVM was also implemented to classify primary biliary cirrhosis [9]. ANN was also used to predict liver fibrosis severity in subjects with HCV [10] and to assess the complexity level of hepatitis examination samples [11]. A CMAC neural network was deployed for classifying hepatitis B, hepatitis C, and cirrhosis [12]. A C5.0 decision tree and AdaBoost were used to categorize chronic hepatitis B and C [13]. A C4.5 decision tree was used to examine liver cirrhosis [14]. Fuzzy logic was used for hepatitis diagnosis [15], for general liver disorder diagnosis [16–18], and for classifying liver disease as alcoholic liver damage, primary hepatoma, liver cirrhosis, or cholelithiasis [19]. ANN-CBR integration was used to test for hepatitis in patients [2]. ANN-FL was used to deal with the class imbalance problem and to enhance the classification accuracy of liver patient data [20]. AIS-FL was used to evaluate the prediction accuracy of liver disorder in patients [21]. ANN-GA was used to classify liver fibrosis [22]. ANN-PSO and ANN-CBR-RBR hybridizations were proposed for hepatitis diagnosis [23,24].
Mortality rates are rapidly increasing in liver disease cases. This indicates the need for computer-aided systems for assessing liver patients. The study accordingly proposes a correlation distance metric and nearest rule based KNN prediction model for the learning, analysis, and diagnosis of liver disease. The model is more reliable and accurate in comparison with other traditional classifiers. Attained results of the presented model are compared with those of other classifiers, including LDA, DLDA, QDA, DQDA, and LSSVM, using statistical parameters. These parameters are the accuracy, sensitivity, specificity, PPV, and NPV rates. The model also shows the capability to act as a specialist assistant to physicians.
Diagnosis of Liver Disease Using Correlation Distance Metric … 847
The rest of the paper is organized as follows. Section 2 presents the proposed intelligent system and the other classifiers implemented. Section 3 describes the achieved simulation results. Finally, Section 4 concludes the study.
2 Methodologies
Primarily, physicians play a vital role in taking the final decision on a patient's assessment and treatment. Nevertheless, the applicability of classification algorithms boosts the prediction rate accuracy and also acts as a second opinion in substantiating the sickness. Consequently, this section presents a description of the classification algorithms deployed in the study for liver disease diagnosis. These classifiers include LDA, DLDA, QDA, DQDA, LSSVM, and the correlation distance metric and nearest rule based KNN approach, which are introduced as follows.
LDA is widely used for the categorization and dimensionality reduction of data. It performs efficiently for unequal frequencies in within-class matrices. It works on the concept of maximum separability by maximizing the ratio of between-class variance to within-class variance. It draws decision regions between the available classes in the data, which actually helps in understanding the feature data distribution [25,26]. For example, suppose a given dataset has A classes; μ_b is the mean vector of class b and X_b is the set of samples within class b, where b = 1, 2, ..., A. The total sample set is

    X = Σ_{b=1}^{A} X_b                                 (1)

and the overall mean is obtained by averaging the class means:

    μ = (1/A) Σ_{c=1}^{A} μ_c                           (2)
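The separability criterion described above can be sketched for the two-class case. The following is an illustrative Fisher-style implementation, not the authors' code; the function name and the projection-based formulation are our own.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: find the projection direction that
    maximizes between-class variance over within-class variance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix: sum of the per-class scatters
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # the maximizing direction is Sw^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting samples onto the returned direction yields the one-dimensional representation on which a decision boundary between the classes can be drawn.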
DLDA assumes a diagonal covariance matrix, which simplifies the computation of the discriminant functions [27]. Unlike LDA, which pools an average covariance over all classes, QDA uses a unique covariance matrix per class. It takes the mean and sigma as parameters for each available class; the covariance matrix, which is unique for each class, is represented by sigma. On the contrary, DQDA is a modification of QDA in which the off-diagonal elements of each class covariance matrix are set to zero. It fits in the family of naive Bayes classifiers that assume multivariate normality. The given class should have at least two observations, as the variance of a feature cannot be approximated from fewer than two.
For a class b containing q observations, the mean of a feature v is estimated as

    v̄_b = (1/q) Σ_{j=1}^{q} a_j(v)

where a_j(v) denotes the value of feature v in the j-th observation.
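The diagonal-covariance idea behind DQDA can be sketched as a diagonal Gaussian (naive Bayes) classifier. This is an illustrative sketch under that assumption, not the authors' implementation; the function names and the use of empirical class priors are our own.

```python
import numpy as np

def dqda_fit(X, y):
    """Estimate per-class means and per-feature variances; zeroing the
    off-diagonal covariance entries reduces DQDA to a diagonal Gaussian
    model (naive Bayes with normal features)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # at least two observations are needed to estimate a variance
        if len(Xc) < 2:
            raise ValueError(f"class {c} has fewer than two observations")
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0, ddof=1), len(Xc) / len(X))
    return params

def dqda_predict(params, x):
    """Assign x to the class with the highest diagonal-Gaussian log density."""
    scores = {
        c: np.log(prior)
           - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        for c, (mu, var, prior) in params.items()
    }
    return max(scores, key=scores.get)
```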
The liver health examination data used in this study have the objective of improving the ability to diagnose liver disease based on the features collected. This dataset is available in the University of California machine learning repository. The collection contains 583 samples, and each sample consists of 11 features as entrance parameters, out of which 10 contribute as inputs and one acts as the target for determining a person as a sick or healthy individual. These features include age (patient's age), gender (patient's gender), TB (total bilirubin), DB (direct bilirubin), ALP (alkaline phosphatase), SGPT (alamine aminotransferase), SGOT (aspartate aminotransferase), TP (total proteins), ALB (albumin), A/G ratio (albumin and globulin ratio), and Selector (a field used to split the data into two sets: sick or healthy). The data were divided into training and testing parts using the 10-fold cross-validation method. The diagnostic results of the classification algorithms were compared using statistical parameters, which are defined in Eqs. (8), (9), (10), (11), and (12) respectively.
    Accuracy = (TP + TN) / (TP + TN + FP + FN)          (8)

    Sensitivity = TP / (TP + FN)                        (9)

    Specificity = TN / (TN + FP)                        (10)

    PPV = TP / (TP + FP)                                (11)

    NPV = TN / (TN + FN)                                (12)
where TN designates true negative (normal people rightly identified as normal), TP is
true positive (diseased people rightly identified as diseased), FN is false negative
(diseased people wrongly identified as normal), and FP expresses false positive (nor-
mal people incorrectly identified as diseased).
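The five parameters of Eqs. (8)–(12) follow directly from these four confusion counts. The helper below is an illustrative sketch; the function name is our own.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Statistical parameters of Eqs. (8)-(12) from confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # Eq. (8)
        "sensitivity": tp / (tp + fn),                   # Eq. (9)
        "specificity": tn / (tn + fp),                   # Eq. (10)
        "ppv":         tp / (tp + fp),                   # Eq. (11)
        "npv":         tn / (tn + fn),                   # Eq. (12)
    }
```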
Classification algorithms deployed were LDA, DLDA, QDA, DQDA, and KNN-based approaches. The KNN approaches include KNN with the Euclidean, cityblock, cosine, and correlation distance metrics. Each KNN variant was implemented using the nearest, random, and consensus rules for classifying the cases. Figs. 1, 2, 3, 4, and 5 illustrate the comparison among the KNN-based approaches using the statistical parameters defined in Eqs. (8)–(12). All twelve KNN variants attained 100% training accuracy; their testing accuracies were: Euclidean distance with the nearest rule 96.05%, random rule 95.54%, and consensus rule 95.37%; cityblock distance with the nearest rule 95.88%, random rule 96.23%, and consensus rule 95.88%; cosine distance with the nearest rule 96.23%, random rule 95.57%, and consensus rule 95.88%; correlation distance with the nearest rule 96.74%, random rule 96.23%, and consensus rule 96.05%.
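As an illustration of the proposed variant, the following sketch implements KNN with the correlation distance and a nearest-style tie-break (majority vote, ties resolved by the single closest neighbour). The exact tie-break semantics of the toolbox used in the study are an assumption on our part, as are the function names.

```python
import numpy as np

def correlation_distance(u, v):
    """1 minus the Pearson correlation of two feature vectors
    (vectors must not be constant)."""
    uc, vc = u - u.mean(), v - v.mean()
    return 1.0 - (uc @ vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

def knn_nearest_rule(X_train, y_train, x, k=3):
    """k-NN with correlation distance; majority vote among the k
    neighbours, with ties broken by the closest neighbour's label."""
    d = np.array([correlation_distance(row, x) for row in X_train])
    idx = np.argsort(d)[:k]                      # sorted by distance
    labels, votes = np.unique(y_train[idx], return_counts=True)
    winners = labels[votes == votes.max()]
    if len(winners) == 1:
        return winners[0]
    # 'nearest' tie-break: among tied classes, take the closest label
    for i in idx:
        if y_train[i] in winners:
            return y_train[i]
```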
Training sensitivity was likewise 100% for every variant; testing sensitivities were: Euclidean distance with the nearest rule 91.62%, random rule 93.41%, and consensus rule 93.41%; cityblock distance with the nearest rule 94.01%, random rule 95.81%, and consensus rule 94.01%; cosine distance with the nearest rule 92.81%, random rule 90.79%, and consensus rule 90.42%; correlation distance with the nearest rule 95.81%, random rule 94.61%, and consensus rule 92.22%.
Training specificity was 100% for every variant; testing specificities were: Euclidean distance with the nearest rule 97.84%, random rule 96.39%, and consensus rule 96.15%; cityblock distance with the nearest rule 96.63%, random rule 96.39%, and consensus rule 96.63%; cosine distance with the nearest rule 97.6%, random rule 97.51%, and consensus rule 98.08%; correlation distance with the nearest rule 97.12%, random rule 96.88%, and consensus rule 97.6%.
Training PPV was 100% for every variant; testing PPVs were: Euclidean distance with the nearest rule 94.44%, random rule 91.23%, and consensus rule 90.07%; cityblock distance with the nearest rule 91.81%, random rule 91.43%, and consensus rule 91.81%; cosine distance with the nearest rule 93.94%, random rule 93.66%, and consensus rule 94.97%; correlation distance with the nearest rule 93.02%, random rule 92.4%, and consensus rule 93.9%.
Training NPV was 100% for every variant; testing NPVs were: Euclidean distance with the nearest rule 96.67%, random rule 97.33%, and consensus rule 97.32%; cityblock distance with the nearest rule 97.57%, random rule 98.28%, and consensus rule 97.57%; cosine distance with the nearest rule 97.13%, random rule 96.31%, and consensus rule 96.23%; correlation distance with the nearest rule 98.3%, random rule 97.82%, and consensus rule 96.9%. Finally, KNN with the correlation distance metric and nearest rule was found superior to the other KNN-based approaches in diagnosing liver disease.
To select the best predictive model for liver disease diagnosis, the achieved results of the finalized KNN approach were compared with the obtained results of the LDA, DLDA, QDA, DQDA, and LSSVM classifiers. LDA had an accuracy of 63.81% (training) and 63.98% (testing), sensitivity of 56.76% (training) and 56.49% (testing), specificity of 81.76% (training) and 82.63% (testing), PPV of 88.8% (training) and 89.02% (testing), and NPV of 42.61% (training) and 43.26% (testing). DLDA had an accuracy of 62.1% (training) and 61.92% (testing), sensitivity of 55.44% (training) and 55.05% (testing), specificity of 79.05% (training) and 79.04% (testing), PPV of 87.8% (training) and 86.74% (testing), and NPV of 63.81% (training) and 63.98% (testing). QDA had an accuracy of 56.19% (training) and 55.23% (testing), sensitivity of 40.91% (training) and 39.9% (testing), specificity of 94.04% (training) and 93.41% (testing), PPV of 94.44% (training) and 93.79% (testing), and NPV of 39.12% (training) and 38.42% (testing). DQDA had an accuracy of 54.48% (training) and 54.72% (testing), sensitivity of 38.2% (training) and 38.22% (testing), specificity of 95.95% (training) and 95.81% (testing), PPV of 96% (training) and 95.78% (testing), and NPV of 37.87% (training) and 38.37% (testing). LSSVM had an accuracy of 84.19% (training) and 83.19% (testing), sensitivity of 54.67% (training) and 52.69% (testing), specificity of 96% (training) and 95.43% (testing), PPV of 84.54% (training) and 82.24% (testing), and NPV of 84.11% (training) and 83.4% (testing). Fig. 6 illustrates a comparative view of the obtained accuracies of the classifiers, and Table 1 shows the simulation results of all implemented classifiers in terms of the statistical parameters, out of which the proposed KNN approach (with the correlation distance metric and nearest rule) takes the lead and is selected as the best predictive model for liver disease diagnosis.
4 Conclusion
The presented study assessed liver health examination data of five hundred and eighty-three samples from diverse patients. False negative rates were reduced by dividing the health examination data into training and testing sets. Experimentation confirmed that the results of the proposed KNN approach were superior to those of the LDA, DLDA, QDA, DQDA, and LSSVM classifiers. Thousands of people lose their lives because of erroneous evaluation and inappropriate treatment, as medical cases are still largely influenced by the subjectivity of clinicians. Therefore, the development of computer-aided systems in medicine seems to be of great use in assisting physicians and in providing training to novice researchers.
References
[1] A. Singh, B. Pandey, Intelligent techniques and applications in liver disorders:
a survey, Int. J. Biomed. Eng. Technol. 16 (2014) 27–70.
[2] C.-L. Chuang, Case-based reasoning support for liver disease diagnosis,
Artif. Intell. Med. 53 (2011) 15–23.
[3] R.H. Lin, C.L. Chuang, A hybrid diagnosis model for determining the types
of the liver disease, Comput. Biol. Med. 40 (2010) 665–670.
[4] E.L. Yu, J.B. Schwimmer, J.E. Lavine, Non-alcoholic fatty liver disease:
epidemiology, pathophysiology, diagnosis and treatment, Paediatr. Child
Health (Oxford). 20 (2010) 26–29.
[5] G.S. Babu, S. Suresh, Meta-cognitive RBF Network and its Projection Based
Learning algorithm for classification problems, Appl. Soft Comput. 13 (2013)
654–666.
[6] M. Aldape-Perez, C. Yanez-Marquez, O. Camacho-Nieto, A.J. Arguelles-
Cruz, An associative memory approach to medical decision support systems,
Comput. Methods Programs Biomed. 106 (2012) 287–307.
[7] J.S. Sartakhti, M.H. Zangooei, K. Mozafari, Hepatitis disease diagnosis using
a novel hybrid method based on support vector machine and simulated
annealing (SVM-SA), Comput. Methods Programs Biomed. 108 (2015) 570–
579.
[8] S. Ansari, I. Shafi, A. Ansari, Diagnosis of liver disease induced by hepatitis
virus using Artificial Neural Networks, Multitopic Conf. (INMIC), 2011
IEEE …. (2011) 8–12.
[9] P. Revesz, T. Triplet, Classification integration and reclassification using
constraint databases, Artif. Intell. Med. 49 (2010) 79–91.
[10] A.M. Hashem, M.E.M. Rasmy, K.M. Wahba, O.G. Shaker, Prediction of the
degree of liver fibrosis using different pattern recognition techniques, in: 2010
5th Cairo Int. Biomed. Eng. Conf. CIBEC 2010, 2010: pp. 210–214.
[11] D.A. Elizondo, R. Birkenhead, M. Gamez, N. Garcia, E. Alfaro, Linear
separability and classification complexity, Expert Syst. Appl. 39 (2012)
7796–7807.
[12] İ.Ö. Bucak, S. Baki, Diagnosis of liver disease by using CMAC neural
network approach, Expert Syst. Appl. 37 (2010) 6157–6164.
[13] A.G. Floares, Intelligent clinical decision supports for interferon treatment in
chronic hepatitis C and B based on i-biopsy™, in: 2009 Int. Jt. Conf.
Neural Networks, 2009: pp. 855–860.