1 Introduction
The wide availability of computer-aided systems for medicine is evident, but an accurate and well-organized diagnosis is still considered an art. Looking inside a patient to find the principal causes of a disease is impossible, as the human body functions in an intricate way. Generally, diagnosis is conducted on the basis of the symptoms present and by analyzing the history of the patient's lifestyle. Assessment of patients needs proficient and experienced physicians to deal with their multifaceted cases. Solving these cases is not easy, as the dimensions of medical science have been expanding extremely rapidly. Development of powerful diagnostic frameworks using classification techniques is therefore essential.
A healthy liver leads to a healthy life. The liver performs various metabolic functions, including filtration of blood, detoxification of chemicals, metabolizing of drugs, and bile secretion [2]. It also assists in the digestion, absorption, and processing of food. Improper working of any liver function leads to liver disease. Factors that can cause liver damage are heavy long-term alcohol consumption, accumulation of excess body fat, high salt intake, and overuse of medications. General symptoms of liver disease are queasiness, appetite loss, sleepiness, and chronic weakness. As the disease progresses, severe symptoms may include jaundice, hemorrhage, inflamed abdomen, and decline in mental abilities. The most common categories of liver disease are alcoholic liver disease and nonalcoholic fatty liver disease [1–4].
The literature confirms the wide usage of classification techniques and methods in liver disease diagnosis. Artificial neural networks (ANN), decision trees, fuzzy logic (FL), rule-based reasoning (RBR), case-based reasoning (CBR), support vector machines (SVM), genetic algorithms (GA), artificial immune systems (AIS), and particle swarm optimization (PSO) have been deployed individually or in combination. General liver disorder was classified using ANN-based approaches [5,6]; hepatitis disease was diagnosed using simulated annealing and SVM [7], and using a feed-forward neural network [8]. SVM was also implemented to classify primary biliary cirrhosis [9]. ANN was also used to predict liver fibrosis severity in subjects with HCV [10] and to assess the complexity level of hepatitis examination samples [11]. A CMAC neural network was deployed for classifying hepatitis B, hepatitis C, and cirrhosis [12]. A C5.0 decision tree and AdaBoost were used to categorize chronic hepatitis B and C [13]. A C4.5 decision tree was used to examine liver cirrhosis [14]. Fuzzy logic was used for hepatitis diagnosis [15], for general liver disorder diagnosis [16–18], and for classifying liver disease as alcoholic liver damage, primary hepatoma, liver cirrhosis, or cholelithiasis [19]. ANN-CBR integration was used to test for hepatitis in patients [2]. ANN-FL was used to deal with the class imbalance problem and to enhance the classification accuracy of liver patient data [20]. AIS-FL was used to evaluate the prediction accuracy of liver disorder in patients [21]. ANN-GA was used to classify liver fibrosis [22]. ANN-PSO and ANN-CBR-RBR hybridizations were proposed for hepatitis diagnosis [23,24].
Mortality rates are rapidly increasing in liver disease cases. This indicates the need for computer-aided systems for assessing liver patients. The study accordingly proposes a correlation distance metric and nearest rule based KNN prediction model for the learning, analysis, and diagnosis of liver disease. The model is more reliable and accurate in comparison with other traditional classifiers. Attained results of the presented model are compared with those of other classifiers, including LDA, DLDA, QDA, DQDA, and LSSVM, using statistical parameters. These parameters are the accuracy, sensitivity, specificity, PPV, and NPV rates. The model also shows the capability to act as a specialist assistant to physicians.
Diagnosis of Liver Disease Using Correlation Distance Metric … 847
The rest of the paper is organized as follows. Section 2 presents the proposed intelligent system and the other classifiers implemented. Section 3 describes the achieved simulation results. Finally, Section 4 concludes the study.
2 Methodologies
Primarily, physicians play a vital role in taking the final decision on a patient's assessment and treatment. Nevertheless, the applicability of classification algorithms boosts the prediction rate accuracy and also acts as a second opinion in substantiating the sickness. Consequently, this section presents a description of the classification algorithms deployed in the study for liver disease diagnosis. These classifiers include LDA, DLDA, QDA, DQDA, LSSVM, and the correlation distance metric and nearest rule based KNN approach, which are introduced as follows.
LDA is widely used for the categorization and dimensionality reduction of data. It performs efficiently for unequal frequencies in within-class matrices. It works on the concept of maximum separability by maximizing the ratio of between-class variance to within-class variance. It draws decision regions between the available classes in the data, which actually helps in understanding the feature data distribution [25,26]. For example, suppose a given dataset has A classes; μ_b is the mean vector of class b and X_b is the set of samples within class b, where b = 1, 2, ..., A. The total sample set is

    X = Σ_{b=1}^{A} X_b                                 (1)

and the overall mean is obtained by averaging the class means:

    μ = (1/A) Σ_{c=1}^{A} μ_c                           (2)
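The separability criterion described above can be sketched for the two-class case. The following is an illustrative Fisher-style implementation, not the authors' code; the function name and the projection-based formulation are our own.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: find the projection direction that
    maximizes between-class variance over within-class variance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix: sum of the per-class scatters
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # the maximizing direction is Sw^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting samples onto the returned direction yields the one-dimensional representation on which a decision boundary between the classes can be drawn.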
DLDA assumes a diagonal covariance matrix, which simplifies the computation of the discriminant functions [27]. Unlike LDA, which pools an average covariance over all classes, QDA uses a unique covariance matrix per class. It takes the mean and sigma as parameters for each available class; the covariance matrix, which is unique for each class, is represented by sigma. On the contrary, DQDA is a modification of QDA in which the off-diagonal elements of each class covariance matrix are set to zero. It fits in the family of naive Bayes classifiers that assume multivariate normality. The given class should have at least two observations, as the variance of a feature cannot be approximated from fewer than two.
For a class b containing q observations, the mean of a feature v is estimated as

    v̄_b = (1/q) Σ_{j=1}^{q} a_j(v)

where a_j(v) denotes the value of feature v in the j-th observation.
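The diagonal-covariance idea behind DQDA can be sketched as a diagonal Gaussian (naive Bayes) classifier. This is an illustrative sketch under that assumption, not the authors' implementation; the function names and the use of empirical class priors are our own.

```python
import numpy as np

def dqda_fit(X, y):
    """Estimate per-class means and per-feature variances; zeroing the
    off-diagonal covariance entries reduces DQDA to a diagonal Gaussian
    model (naive Bayes with normal features)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # at least two observations are needed to estimate a variance
        if len(Xc) < 2:
            raise ValueError(f"class {c} has fewer than two observations")
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0, ddof=1), len(Xc) / len(X))
    return params

def dqda_predict(params, x):
    """Assign x to the class with the highest diagonal-Gaussian log density."""
    scores = {
        c: np.log(prior)
           - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        for c, (mu, var, prior) in params.items()
    }
    return max(scores, key=scores.get)
```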
The liver health examination data used in this study have the objective of improving the ability to diagnose liver disease based on the features collected. This dataset is available in the University of California machine learning repository. The collection contains 583 samples, and each sample consists of 11 features as entrance parameters, out of which 10 contribute as inputs and one acts as the target for determining a person as a sick or healthy individual. These features include age (patient's age), gender (patient's gender), TB (total bilirubin), DB (direct bilirubin), ALP (alkaline phosphatase), SGPT (alamine aminotransferase), SGOT (aspartate aminotransferase), TP (total proteins), ALB (albumin), A/G ratio (albumin and globulin ratio), and Selector (a field used to split the data into two sets: sick or healthy). The data were divided into training and testing parts using the 10-fold cross-validation method. The diagnostic results of the classification algorithms were compared using statistical parameters, which are defined in Eqs. (8), (9), (10), (11), and (12) respectively.
    Accuracy = (TP + TN) / (TP + TN + FP + FN)          (8)

    Sensitivity = TP / (TP + FN)                        (9)

    Specificity = TN / (TN + FP)                        (10)

    PPV = TP / (TP + FP)                                (11)

    NPV = TN / (TN + FN)                                (12)
where TN designates true negative (normal people rightly identified as normal), TP is
true positive (diseased people rightly identified as diseased), FN is false negative
(diseased people wrongly identified as normal), and FP expresses false positive (nor-
mal people incorrectly identified as diseased).
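The five parameters of Eqs. (8)–(12) follow directly from these four confusion counts. The helper below is an illustrative sketch; the function name is our own.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Statistical parameters of Eqs. (8)-(12) from confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # Eq. (8)
        "sensitivity": tp / (tp + fn),                   # Eq. (9)
        "specificity": tn / (tn + fp),                   # Eq. (10)
        "ppv":         tp / (tp + fp),                   # Eq. (11)
        "npv":         tn / (tn + fn),                   # Eq. (12)
    }
```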
Classification algorithms deployed were LDA, DLDA, QDA, DQDA, and KNN-based approaches. The KNN approaches include KNN with the Euclidean, cityblock, cosine, and correlation distance metrics. Each KNN variant was implemented using the nearest, random, and consensus rules for classifying the cases. Figs. 1, 2, 3, 4, and 5 illustrate the comparison among the KNN-based approaches using the statistical parameters defined in Eqs. (8)–(12). All twelve KNN variants attained 100% training accuracy; their testing accuracies were: Euclidean distance with the nearest rule 96.05%, random rule 95.54%, and consensus rule 95.37%; cityblock distance with the nearest rule 95.88%, random rule 96.23%, and consensus rule 95.88%; cosine distance with the nearest rule 96.23%, random rule 95.57%, and consensus rule 95.88%; correlation distance with the nearest rule 96.74%, random rule 96.23%, and consensus rule 96.05%.
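As an illustration of the proposed variant, the following sketch implements KNN with the correlation distance and a nearest-style tie-break (majority vote, ties resolved by the single closest neighbour). The exact tie-break semantics of the toolbox used in the study are an assumption on our part, as are the function names.

```python
import numpy as np

def correlation_distance(u, v):
    """1 minus the Pearson correlation of two feature vectors
    (vectors must not be constant)."""
    uc, vc = u - u.mean(), v - v.mean()
    return 1.0 - (uc @ vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

def knn_nearest_rule(X_train, y_train, x, k=3):
    """k-NN with correlation distance; majority vote among the k
    neighbours, with ties broken by the closest neighbour's label."""
    d = np.array([correlation_distance(row, x) for row in X_train])
    idx = np.argsort(d)[:k]                      # sorted by distance
    labels, votes = np.unique(y_train[idx], return_counts=True)
    winners = labels[votes == votes.max()]
    if len(winners) == 1:
        return winners[0]
    # 'nearest' tie-break: among tied classes, take the closest label
    for i in idx:
        if y_train[i] in winners:
            return y_train[i]
```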
Training sensitivity was likewise 100% for every variant; testing sensitivities were: Euclidean distance with the nearest rule 91.62%, random rule 93.41%, and consensus rule 93.41%; cityblock distance with the nearest rule 94.01%, random rule 95.81%, and consensus rule 94.01%; cosine distance with the nearest rule 92.81%, random rule 90.79%, and consensus rule 90.42%; correlation distance with the nearest rule 95.81%, random rule 94.61%, and consensus rule 92.22%.
Training specificity was 100% for every variant; testing specificities were: Euclidean distance with the nearest rule 97.84%, random rule 96.39%, and consensus rule 96.15%; cityblock distance with the nearest rule 96.63%, random rule 96.39%, and consensus rule 96.63%; cosine distance with the nearest rule 97.6%, random rule 97.51%, and consensus rule 98.08%; correlation distance with the nearest rule 97.12%, random rule 96.88%, and consensus rule 97.6%.
Training PPV was 100% for every variant; testing PPVs were: Euclidean distance with the nearest rule 94.44%, random rule 91.23%, and consensus rule 90.07%; cityblock distance with the nearest rule 91.81%, random rule 91.43%, and consensus rule 91.81%; cosine distance with the nearest rule 93.94%, random rule 93.66%, and consensus rule 94.97%; correlation distance with the nearest rule 93.02%, random rule 92.4%, and consensus rule 93.9%.
Training NPV was 100% for every variant; testing NPVs were: Euclidean distance with the nearest rule 96.67%, random rule 97.33%, and consensus rule 97.32%; cityblock distance with the nearest rule 97.57%, random rule 98.28%, and consensus rule 97.57%; cosine distance with the nearest rule 97.13%, random rule 96.31%, and consensus rule 96.23%; correlation distance with the nearest rule 98.3%, random rule 97.82%, and consensus rule 96.9%. Finally, KNN with the correlation distance metric and nearest rule was found superior to the other KNN-based approaches in diagnosing liver disease.
To select the best predictive model for liver disease diagnosis, the achieved results of the finalized KNN approach were compared with the obtained results of the LDA, DLDA, QDA, DQDA, and LSSVM classifiers. LDA had an accuracy of 63.81% (training) and 63.98% (testing), sensitivity of 56.76% (training) and 56.49% (testing), specificity of 81.76% (training) and 82.63% (testing), PPV of 88.8% (training) and 89.02% (testing), and NPV of 42.61% (training) and 43.26% (testing). DLDA had an accuracy of 62.1% (training) and 61.92% (testing), sensitivity of 55.44% (training) and 55.05% (testing), specificity of 79.05% (training) and 79.04% (testing), PPV of 87.8% (training) and 86.74% (testing), and NPV of 63.81% (training) and 63.98% (testing). QDA had an accuracy of 56.19% (training) and 55.23% (testing), sensitivity of 40.91% (training) and 39.9% (testing), specificity of 94.04% (training) and 93.41% (testing), PPV of 94.44% (training) and 93.79% (testing), and NPV of 39.12% (training) and 38.42% (testing). DQDA had an accuracy of 54.48% (training) and 54.72% (testing), sensitivity of 38.2% (training) and 38.22% (testing), specificity of 95.95% (training) and 95.81% (testing), PPV of 96% (training) and 95.78% (testing), and NPV of 37.87% (training) and 38.37% (testing). LSSVM had an accuracy of 84.19% (training) and 83.19% (testing), sensitivity of 54.67% (training) and 52.69% (testing), specificity of 96% (training) and 95.43% (testing), PPV of 84.54% (training) and 82.24% (testing), and NPV of 84.11% (training) and 83.4% (testing). Fig. 6 illustrates a comparative view of the obtained accuracies of the classifiers, and Table 1 shows the simulation results of all implemented classifiers in terms of the statistical parameters, out of which the proposed KNN approach (with the correlation distance metric and nearest rule) takes the lead and is selected as the best predictive model for liver disease diagnosis.
4 Conclusion
The presented study assessed liver health examination data of five hundred and eighty-three samples from diverse patients. False negative rates were reduced by dividing the health examination data into training and testing sets. Experimentation confirmed that the results of the proposed KNN approach were superior to those of the LDA, DLDA, QDA, DQDA, and LSSVM classifiers. Thousands of people lose their lives because of erroneous evaluation and inappropriate treatment, as medical cases are still largely influenced by the subjectivity of clinicians. Therefore, the development of computer-aided systems in medicine seems to be of great use in assisting physicians and in providing training to novice researchers.
References
[1] A. Singh, B. Pandey, Intelligent techniques and applications in liver disorders:
a survey, Int. J. Biomed. Eng. Technol. 16 (2014) 27–70.
[2] C.-L. Chuang, Case-based reasoning support for liver disease diagnosis,
Artif. Intell. Med. 53 (2011) 15–23.
[3] R.H. Lin, C.L. Chuang, A hybrid diagnosis model for determining the types
of the liver disease, Comput. Biol. Med. 40 (2010) 665–670.
[4] E.L. Yu, J.B. Schwimmer, J.E. Lavine, Non-alcoholic fatty liver disease:
epidemiology, pathophysiology, diagnosis and treatment, Paediatr. Child
Health (Oxford). 20 (2010) 26–29.
[5] G.S. Babu, S. Suresh, Meta-cognitive RBF Network and its Projection Based
Learning algorithm for classification problems, Appl. Soft Comput. 13 (2013)
654–666.
[6] M. Aldape-Perez, C. Yanez-Marquez, O. Camacho-Nieto, A.J. Arguelles-
Cruz, An associative memory approach to medical decision support systems,
Comput. Methods Programs Biomed. 106 (2012) 287–307.
[7] J.S. Sartakhti, M.H. Zangooei, K. Mozafari, Hepatitis disease diagnosis using
a novel hybrid method based on support vector machine and simulated
annealing (SVM-SA), Comput. Methods Programs Biomed. 108 (2015) 570–
579.
[8] S. Ansari, I. Shafi, A. Ansari, Diagnosis of liver disease induced by hepatitis
virus using Artificial Neural Networks, Multitopic Conf. (INMIC), 2011
IEEE …. (2011) 8–12.
[9] P. Revesz, T. Triplet, Classification integration and reclassification using
constraint databases, Artif. Intell. Med. 49 (2010) 79–91.
[10] A.M. Hashem, M.E.M. Rasmy, K.M. Wahba, O.G. Shaker, Prediction of the
degree of liver fibrosis using different pattern recognition techniques, in: 2010
5th Cairo Int. Biomed. Eng. Conf. CIBEC 2010, 2010: pp. 210–214.
[11] D.A. Elizondo, R. Birkenhead, M. Gamez, N. Garcia, E. Alfaro, Linear
separability and classification complexity, Expert Syst. Appl. 39 (2012)
7796–7807.
[12] İ.Ö. Bucak, S. Baki, Diagnosis of liver disease by using CMAC neural
network approach, Expert Syst. Appl. 37 (2010) 6157–6164.
[13] A.G. Floares, Intelligent clinical decision supports for interferon treatment in
chronic hepatitis C and B based on i-biopsy™, in: 2009 Int. Jt. Conf.
Neural Networks, 2009: pp. 855–860.