
Improved Fuzzy K-Nearest Neighbor Using

Modified Particle Swarm Optimization


Jamaluddin(1), Rimbun Siringoringo(2)
(1,2) Universitas Methodist Indonesia
Jl. Hang Tuah No 8 Medan-Indonesia
(1) jac.satuno@gmail.com
(2) rimbun.ringo@gmail.com

Abstract- Fuzzy k-Nearest Neighbor (Fk-NN) is one of the most powerful classification methods. The introduction of fuzzy concepts into k-NN successfully improves its performance on almost all classification problems. The main drawback of Fk-NN is that its parameters, the number of neighbors (k) and the fuzzy strength (m), are difficult to determine. Both parameters are very sensitive, and Fk-NN is hard to control because no theory or guideline establishes what proper values of m and k should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of k and m. The MPSO used here is based on the Constriction Factor Method, an improvement of PSO designed to avoid local optima. The model proposed in this study was tested on the German Credit Dataset, a standard dataset from the UCI Machine Learning Repository that is widely applied to classification problems. Applying MPSO to the determination of the Fk-NN parameters is expected to increase classification performance. The experiments that have been done indicate that the proposed model yields better classification performance than the plain Fk-NN model: the proposed model reaches an accuracy of 81%, while the Fk-NN model reaches 70%. Finally, the proposed model is compared with two other classification models, Naive Bayes and Decision Tree. The proposed model performs better: Naive Bayes reaches an accuracy of 75%, and the Decision Tree model 70%.

Keywords : fuzzy k-nearest neighbor, modified particle swarm optimization, german credit data

I. INTRODUCTION

Non-performing loans are one of the most important issues in industrial and financial services [1]. At a certain level, the accumulation of loan defaults can trigger the bankruptcy of a bank and other financial institutions. Learning about the background and characteristics of financial customers is therefore a significant factor before deciding to grant credit. To address this problem, one concern of lenders and banks is how to build an assessment technique that ensures the eligibility of a customer applying for credit [2].
Machine Learning is a field of information technology that has been widely applied to build decision support systems, especially in economics and finance. Machine learning plays a very important role and has produced various studies related to credit risk assessment. A variety of machine-learning-based research has succeeded in building effective instruments and models for credit risk prediction and estimation, including the k-Nearest Neighbor method [3], [4], Fuzzy k-Nearest Neighbor [5], Bayesian Networks [6], Support Vector Machines [7], Artificial Neural Networks [8], Fuzzy Immune Learning [9], and Logistic Regression [10].
The k-Nearest Neighbor (k-NN) method is among the most popular machine-learning methods: it is simple and easy to implement. However, k-NN has two weaknesses. First, the success of the method depends on the number of neighbors applied, so in order to produce a high degree of accuracy researchers must manually try many different values of k, which is not effective. This is reflected in the research of [3], which, after applying various values of k, obtained the best accuracy at k = 3, while [5] obtained the best accuracy at k = 13. Second, in addition to the reliance on the value of k, the relation between instances and classes is rigid (crisp): each instance has a relationship with exactly one class and no relationship at all with the other classes.
Many attempts have been made to relax the crisp nature of k-NN. One of them is to combine fuzzy theory with k-NN. Applying fuzzy theory produced a new method known as Fuzzy k-Nearest Neighbor (Fk-NN) [11]. In the Fk-NN model the relationship between data and classes is not crisp: each data point has a membership relationship with every class at a certain level. The strength of this relationship is controlled by a fuzzy strength parameter (m). Compared to k-NN, Fk-NN yields a higher degree of accuracy on virtually all classification problems [12]. This is also the main reason why the Fk-NN method has become a method of interest in many studies, especially those related to economic issues.
The fuzzy strength parameter (m) and the number-of-neighbors parameter (k) are fundamental determinants of the Fk-NN model: the values of m and k have a direct impact on model accuracy. Determining the values of m and k is often difficult and hard to control because there is no theory or guideline that establishes what appropriate values of m and k should be [12]. To address this problem, another method is needed to help the Fk-NN model find good values of m and k. In this study, we propose a parameter-optimization approach to determine the best values of m and k.

In this study we applied Modified Particle Swarm Optimization (MPSO). The choice of this method is based on several considerations. First, compared with similar algorithms such as the Genetic Algorithm (GA), PSO is relatively simple because it does not require procedures such as the selection, mutation, and crossover procedures of GA. Second, the PSO method has been proven to optimize the parameters of other machine-learning methods, as shown in the following studies: PSO combined with SVM [13], PSO with Artificial Neural Networks [14], and PSO with Self-Organizing Maps (SOM) [15]. The results obtained in these models indicate improved accuracy through the application of PSO for parameter optimization.

In this study, MPSO (Modified Particle Swarm Optimization), a variant of PSO, is used to optimize the Fk-NN parameters. This research builds a model to evaluate the granting of credit based on Fk-NN classification and MPSO; in other words, optimizing the Fk-NN parameters with MPSO is expected to increase classification accuracy.
II. LITERATURE REVIEW

A. Fuzzy k-Nearest Neighbor

In fuzzy theory, a data point can have a membership value in each class. This means that an object can belong to different classes with different degrees of membership. These degrees take values in [0, 1]. Fuzzy set theory generalizes the classic k-Nearest Neighbor theory by defining the membership value of a data point [16]. The membership value of a data point in a class is greatly influenced by the distance of that point to its nearest neighbors: the closer a point is to its neighbors, the greater its membership value in the neighboring class, and vice versa. The Fuzzy k-Nearest Neighbor algorithm can be described as follows:
1. Normalize the data using the largest and smallest value of each feature. The normalization is determined by equation (1):

   x' = (x − min_a) / (max_a − min_a)    (1)

   where x' is the normalized value, x is the original value, min_a is the minimum value of feature a, and max_a is the maximum value of feature a.

2. Find the k nearest neighbors of the test data x using equation (2):

   d_i = sqrt( Σ_{j=1}^{p} (x'_j − x_j)² )    (2)

3. Compute the membership value µ_i(x) using equation (3), where 1 ≤ i ≤ C:

   µ_i(x) = [ Σ_{j=1}^{k} µ_{ij} · (1 / |x − x_j|^{2/(m−1)}) ] / [ Σ_{j=1}^{k} (1 / |x − x_j|^{2/(m−1)}) ]    (3)

   where µ_i(x) is the membership value of data x in class c_i; µ_{ij} is the membership value of neighbor j in class i; m is the weight exponent (fuzzy strength), with m > 1; and k is the number of nearest neighbors used.

4. Take the class with the greatest membership value using equation (4), where 1 ≤ i ≤ C and C is the number of classes:

   v = arg max_i ( µ_i(x) )    (4)

5. Assign the class label v to the test data x.
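A minimal sketch of steps 1-5 in Python may make the algorithm concrete. It assumes crisp memberships for the training neighbors (µ_ij = 1 for the neighbor's own class, 0 otherwise), which is the simplest common initialization; the function names and the toy data are ours, not from the paper.

```python
import numpy as np

def min_max_normalize(X):
    """Step 1 / eq. (1): rescale each feature column to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return (X - lo) / span

def fknn_predict(X_train, y_train, x, k=3, m=2.0, eps=1e-9):
    """Steps 2-5 with crisp neighbor memberships assumed
    (mu_ij = 1 if neighbor j belongs to class i, else 0)."""
    X_train, y_train = np.asarray(X_train, float), np.asarray(y_train)
    d = np.sqrt(((X_train - np.asarray(x, float)) ** 2).sum(axis=1))   # eq. (2)
    nn = np.argsort(d)[:k]                        # the k nearest neighbors
    w = 1.0 / (d[nn] + eps) ** (2.0 / (m - 1.0))  # distance weights of eq. (3)
    mu = {c: w[y_train[nn] == c].sum() / w.sum() for c in np.unique(y_train)}
    return max(mu, key=mu.get), mu                # eq. (4): max-membership class

X = min_max_normalize([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
label, mu = fknn_predict(X, [0, 0, 0, 1, 1, 1], x=[0.1, 0.1], k=3, m=2.0)
print(label, mu)
```

With m = 2 the weights reduce to inverse squared distance; larger m flattens the weighting, which is why m is called the fuzzy strength.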
B. Modified Particle Swarm Optimization (MPSO)

PSO is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995. PSO is inspired by the social behavior of flocks of birds or schools of fish [18]. PSO can be used to search for optimal solutions within a large search space [19]. In PSO, each candidate solution is treated as a particle; each particle represents a potential solution, a point in the search space [18]. All particles have fitness values evaluated by an evaluation function [19]. The fitness and velocity values are used to set the flight direction according to the best experience of the swarm in order to find the global optimum (gbest) in the search space. The particles are updated with equations (4) and (5): equation (4) computes the new velocity of each particle, and equation (5) updates the position of each particle in the solution space [18].
v_{k+1}^i = v_k^i + c1·rand·(p_k^i − x_k^i) + c2·rand·(p_k^g − x_k^i)    (4)

x_{k+1}^i = x_k^i + v_{k+1}^i    (5)
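As a concrete illustration, equations (4) and (5) for a single one-dimensional particle can be sketched as follows; this is a minimal sketch whose names are ours, not from the paper.

```python
import random

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0, rng=random.random):
    """One classical PSO update for a one-dimensional particle:
    eq. (4) computes the new velocity, eq. (5) the new position."""
    v_new = v + c1 * rng() * (pbest - x) + c2 * rng() * (gbest - x)  # eq. (4)
    x_new = x + v_new                                                # eq. (5)
    return x_new, v_new

# If the particle already sits at both its personal and global best,
# the attraction terms vanish and only the old velocity carries it forward.
x, v = pso_step(x=1.0, v=0.5, pbest=1.0, gbest=1.0)
print(x, v)  # 1.5 0.5
```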

The classical PSO equation has been modified in order to improve PSO's capability. The first group of PSO modifications consists of modifications to the inertia-weight parameter (linearly decreasing inertia weight), and the second uses the constriction-factor parameter [17].

1) Linear Decreasing Inertia Weight (LDIW): In the classical PSO method the inertia weight is constant, so in some cases the classical PSO method becomes less efficient. Shi and Eberhart (1998) modified the inertia weight so that at the beginning of the run it is set to a large enough value to expand the search area and avoid being trapped in local optima, and toward the last iterations it is set small enough to obtain an accurate end result. The inertia weight of the LDIW method is determined using equation (6) [18]; the w_max parameter is usually set to 0.9, the w_min parameter to 0.4, and c1 = c2 = 1-2 [20]. The particle velocity under LDIW is then determined using equation (7) [18].

w_k^i = w_max − ((w_max − w_min) / iter_max) · iter    (6)

v_{k+1}^i = w_k^i·v_k^i + c1·rand·(p_k^i − x_k^i) + c2·rand·(p_k^g − x_k^i)    (7)

where w_k^i is the inertia weight at epoch i, w_max is the maximum inertia weight, w_min is the minimum inertia weight, iter_max is the maximum epoch, and iter is the current epoch.
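The LDIW schedule of equation (6) can be sketched in a few lines; this is an illustrative helper of ours, using the usual w_max = 0.9 and w_min = 0.4.

```python
def ldiw_weight(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (eq. 6): starts at w_max,
    falls linearly to w_min at the last iteration."""
    return w_max - (w_max - w_min) * iteration / max_iter

for it in (0, 50, 100):
    print(it, round(ldiw_weight(it, 100), 3))  # 0.9 at the start, 0.4 at the end
```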

2) Constriction Factor Method (CFM): [21] implements a constriction factor, in a method known as the Constriction Factor Method (CFM). This modification aims to make the search in the PSO algorithm converge more quickly. The constriction factor is determined using equation (8), and the velocity update is denoted by equation (9) [19]. The parameter φ = c1 + c2 must satisfy φ > 4; to qualify, the values of c1 and c2 are usually set to about 2.05 each.

C = 2 / |2 − φ − sqrt(φ² − 4φ)|    (8)

v_{k+1}^i = C·v_k^i + c1·rand·(p_k^i − x_k^i) + c2·rand·(p_k^g − x_k^i)    (9)
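Equation (8) is easy to evaluate directly; the helper below is our own illustration of it.

```python
import math

def constriction_factor(c1=2.05, c2=2.05):
    """Constriction factor C of eq. (8); requires phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4:
        raise ValueError("phi = c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

# With the commonly used c1 = c2 = 2.05 (phi = 4.1), C is about 0.7298,
# so each velocity is damped to roughly 73% of its undamped value.
print(round(constriction_factor(), 4))  # 0.7298
```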

C. Confusion Matrix

The confusion matrix is one of the popular methods used to evaluate classification performance. It is presented in tabular form and states the number of test data correctly classified and the number of test data misclassified [22]. Table I shows the confusion matrix for binary classification.
TABLE I
Confusion matrix

                      Predicted class
                      yes     no      Total
Actual class   yes    TP      FN      P
               no     FP      TN      N
               Total  P'      N'      P+N

where True Positive (TP) is the number of instances of class 1 correctly classified as class 1; True Negative (TN) is the number of instances of class 0 correctly classified as class 0; False Positive (FP) is the number of instances of class 0 incorrectly classified as class 1; and False Negative (FN) is the number of instances of class 1 misclassified as class 0.
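From the four counts of Table I, accuracy and precision follow directly; accuracy is also the quantity the MPSO stage later uses as its fitness. The counts below are hypothetical, not taken from the paper's tables.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy and precision from the binary confusion matrix of Table I."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total   # fraction of all instances classified correctly
    precision = tp / (tp + fp)     # fraction of positive predictions that are correct
    return accuracy, precision

# Hypothetical counts for illustration:
acc, prec = confusion_metrics(tp=8, fn=2, fp=2, tn=8)
print(acc, prec)  # 0.8 0.8
```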
III. RESEARCH MODEL

A. The FKNN-MPSO Classification Model


This research was conducted in several stages: the data pre-processing stage, the selection of the optimal parameters k and m using MPSO, and Fk-NN classification. The FKNN-MPSO classification model is shown in Figure 1. The procedure can be explained in the following steps:

Step 1 : Initialize the values of k and m. In the Fk-NN model, the values of k and m must be initialized first.
Step 2 : Generate the initial particles randomly. The values of k and m are generated in parallel for 8 particles.
Step 3 : Use the k and m values above to train on the training data.
Step 4 : Evaluate each particle based on its fitness value. The fitness function measures the quality of an individual; it takes the individual's parameters and produces the individual's fitness value. Every problem to be solved with the PSO algorithm must have a defined fitness function. In this study, the fitness function is given in equation (10):

   fitness = (TP + TN) / (TP + TN + FP + FN)    (10)

Step 5 : Update the position and the velocity of the particles, as required by the PSO procedure.
Step 6 : Test the Fk-NN model again to find the best particle. The best particles evaluated by the fitness value in Step 4 are applied to train on the training data and to calculate their fitness values.
Step 7 : Update the personal optimal fitness (pfit) and the personal optimal position (pbest). Once all the particles of the current iteration have been evaluated, the iteration yields its best particle (pbest).
Step 8 : Evaluate whether all iterations have been completed. If not, resume at Step 3; if completed, proceed to Step 9.
Step 9 : Each iteration has its best particle. At this stage the best particles of all iterations are evaluated to determine the global optimal position (gbest).
Step 10 : If the gbest value matches the expected criterion, the optimum k and m values have been obtained; continue to Step 11. If the gbest value does not yet match the expected criterion, generate a new population and return to Step 4.
Step 11 : Perform classification of the test data.
Step 12 : Cross-validate to check whether the classification result in Step 11 has the best accuracy. If so, the whole process is complete; if not, re-evaluate the k value (k = k + 1) until the best accuracy value is obtained.
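The procedure above can be sketched end to end. This is a minimal, hypothetical sketch: the fitness function is a toy stand-in for training Fk-NN (its peak at k = 13, m = 2.3 is arbitrary), the bounds on k and m are illustrative, and, for numerical stability, the constriction factor is applied to the whole velocity update, which is the form of Clerc's original CFM rather than the simplified update printed above.

```python
import math, random

def constriction(c1, c2):
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

def toy_fitness(k, m):
    # Stand-in for Steps 3 and 4: in the real model this would train Fk-NN
    # with (k, m) and return its accuracy on held-out data.
    return 1.0 - abs(k - 13) / 50.0 - abs(m - 2.3) / 20.0

def mpso_search(fit, n_particles=8, iters=50, c1=2.05, c2=2.05, seed=1):
    rng = random.Random(seed)
    C = constriction(c1, c2)
    pos = [[rng.uniform(1.0, 31.0), rng.uniform(1.1, 15.0)]
           for _ in range(n_particles)]                  # dimensions: (k, m)
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pfit = [fit(round(p[0]), p[1]) for p in pos]         # evaluate the swarm
    gfit = max(pfit)
    gbest = pbest[pfit.index(gfit)][:]
    for _ in range(iters):                               # iterate the swarm
        for i in range(n_particles):
            for d in range(2):
                vel[i][d] = C * (vel[i][d]
                                 + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                                 + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            pos[i][0] = min(max(pos[i][0], 1.0), 31.0)   # keep k in bounds
            pos[i][1] = min(max(pos[i][1], 1.1), 15.0)   # keep m > 1
            f = fit(round(pos[i][0]), pos[i][1])
            if f > pfit[i]:
                pfit[i], pbest[i] = f, pos[i][:]
            if f > gfit:
                gfit, gbest = f, pos[i][:]
    return round(gbest[0]), round(gbest[1], 2), gfit

k, m, f = mpso_search(toy_fitness)
print(k, m, round(f, 3))
```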
Commented [H24]: This should be “k value’ instead of value k
[Figure 1 (flowchart): encode (k, m); generate the initial particles randomly; test the Fk-NN model; compute the fitness value; update the particle positions and velocities; test the Fk-NN model and update the personal optimal fitness (pfit) and personal optimal position (pbest); once the population count is reached, update the global optimal fitness (gfit) and global optimal position (gbest); when the stopping criterion is reached, test the training data with the optimal k and m values and classify the test data; if the cross-validation criterion is not met, set k = k + 1 and repeat; finally, compute the accuracy.]

Figure 1. System Flowchart

IV. EXPERIMENTAL DESIGN

A. Dataset

The dataset used in this study is the German Credit Data, sourced from the UCI Machine Learning Repository. This dataset was chosen for several reasons. First, it does not contain missing or empty values [22], [23]; second, it is a relatively high-dimensional dataset, with 21 attributes (A1 through A21), including the class attribute, and 1000 instances. The instances are grouped into 2 classes: 700 instances in the good class and 300 instances in the bad class. Table II shows the description of the dataset. The dataset is available in the .arff format (attribute-relation file format).

TABLE II
German Credit Data description

Dataset               German Credit Data
Attribute type        Categorical and numerical
Number of attributes  20
Number of instances   1000
Missing values        No
Number of classes     2

B. Modified Particle Swarm Optimization

1) Parameter Setting: The MPSO (Modified Particle Swarm Optimization) algorithm is a parametric method whose application requires parameter setting. Table III lists the MPSO parameters applied.

2) Particle Encoding: The particle encoding scheme generates random numbers from 0 to 10 with a length of 20 numbers. Of the 20 numbers, numbers 1 through 15 are used to generate the parameter m, and numbers 16 through 20 are used to generate the parameter k. Figure 2 shows the particle encoding scheme.
[Figure 2: a particle of 20 digit positions (1-20); positions 1-15 encode m, positions 16-20 encode k.]

Figure 2. Particle encoding scheme

V. RESEARCH RESULTS

The MPSO algorithm begins with the swarm initialization process. Table IV displays the swarm initialization scheme. The table consists of 3 parts: the particle composition, the parameters (k, m), and the fitness value. The swarm consists of 10 particles (P1-P10); each particle contains 20 random numbers, the k and m values, and the fitness. For example, particle P1 consists of the 20 random numbers (7 1 2 1 3 5 3 9 0 4 4 6 6 9 2 4 0 9 7 0); the (k, m) values generated from this particle are (13, 2.31), and with these values the fitness obtained is 0.76 (76%). The remaining particles P2-P10 are listed analogously.

TABLE IV
Swarm initialization

Particle                                        k, m       fitness
P1 : 7 1 2 1 3 5 3 9 0 4 4 6 6 9 2 4 0 9 7 0 13, 2.31 0.76
P2 : 2 3 2 2 5 0 2 0 7 4 9 0 5 6 5 5 7 9 4 0 11, 1.53 0.75
P3 : 6 2 7 2 4 7 2 0 6 9 8 3 7 1 9 6 0 1 2 2 15, 3.09 0.76
P4 : 9 4 5 9 9 3 2 1 7 9 1 4 3 5 6 4 9 8 5 1 11, 3.87 0.76
P5 : 5 6 8 7 3 7 9 2 4 0 2 0 7 3 2 4 9 8 8 0 13, 2.31 0.76
P6 : 7 2 5 2 4 1 3 0 4 5 5 2 2 4 9 9 8 8 4 0 9, 2.31 0.76
P7 : 3 1 2 8 1 4 7 7 7 5 3 5 0 9 8 6 1 6 4 9 17, 3.09 0.76
P8 : 4 4 1 8 1 9 6 4 2 5 9 1 5 9 5 2 9 4 5 2 11, 3.87 0.76
P9 : 4 6 0 5 3 9 9 6 6 2 4 7 3 2 2 2 3 6 4 4 13, 2.31 0.76
P10 : 6 5 7 4 3 8 2 5 6 2 8 5 8 4 0 8 9 5 7 5 15, 3.87 0.76

3) Particle Updates: In each subsequent swarm, every particle is updated until it reaches the target fitness value of 1.00 (100%). If this value cannot be achieved, the highest value reached is taken as the solution. Table V displays the particle updates up to swarm 100, which resulted in improved k, m, and fitness values. Based on Table V, the best fitness value is 0.81 (81%), with (k, m) = (31, 13.23).

TABLE V
Updated swarm particles at swarm 100 (final)

Particle                                                             k, m       fitness
P1 : 91 85 86 85 87 89 87 93 84 88 88 90 90 93 86 88 84 93 91 84 31, 13.23 0.81
P2 : 89 90 89 89 92 87 89 87 94 91 96 87 92 93 92 92 94 96 91 87 31, 13.23 0.81
P3 : 96 92 97 92 94 97 92 90 96 99 98 93 97 91 99 96 90 91 92 92 31, 13.23 0.81
P4 : 88 83 84 88 88 82 81 80 86 88 80 83 82 84 85 83 88 87 84 80 31, 13.23 0.81
P5 : 84 85 87 86 82 86 88 81 83 79 81 79 86 82 81 83 88 87 87 79 31, 13.23 0.81
P6 : 85 80 83 80 82 79 81 78 82 83 83 80 80 82 87 87 86 86 82 78 31, 13.23 0.81
P7 : 87 85 86 92 85 88 91 91 91 89 87 89 84 93 92 90 85 90 88 93 31, 13.23 0.81
P8 : 87 87 84 91 84 92 89 87 85 88 92 84 88 92 88 85 92 87 88 85 31, 13.23 0.81
P9 : 90 92 86 91 89 95 95 92 92 88 90 93 89 88 88 88 89 92 90 90 31, 13.23 0.81
P10 : 94 93 95 92 91 96 90 93 94 90 96 93 96 92 88 96 97 93 95 93 31, 13.23 0.81
4) Global Best (GBest): The Global Best (GBest) is the particle with the greatest fitness value across all swarms. Table VI shows the GBest; it contains the swarm, the particle, and the fitness value (global best fitness). The GBest position is achieved in the 100th swarm (S100), with (k, m) = (31, 13.23) and a fitness of 0.81 (81%).

TABLE VI
Global Best

S    GBest                                                         k, m       fitness
100  91 85 86 85 87 89 87 93 84 88 88 90 90 93 86 88 84 93 91 84  31, 13.23  0.81

5) Influence of the Particle Number: To examine whether the number of particles affects the fitness value (in this case, the accuracy), this study varied the number of particles: 10, 20, and 40 particles. The test results are shown in Table VII. The three tests indicate that increasing the number of particles has no significant effect on fitness improvement.
TABLE VII
Test results

10 particles              20 particles              40 particles
No. Swarm  k, m     fit.  No. Swarm  k, m     fit.  No. Swarm  k, m     fit.
1    15, 3.09   0.76      1    9, 3.87    0.77      1    13, 3.87   0.78
2    15, 3.09   0.76      2    17, 4.65   0.77      2    13, 3.87   0.78
3    15, 5.43   0.79      3    17, 6.21   0.77      3    19, 3.87   0.78
4    31, 12.45  0.80      4    23, 9.33   0.77      4    25, 6.21   0.80
5    15, 7.77   0.80      5    25, 9.33   0.79      5    29, 8.55   0.80
6    17, 7.77   0.80      6    25, 10.11  0.80      6    29, 10.89  0.80
7    23, 8.55   0.80      7    31, 10.11  0.80      7    31, 12.45  0.80
8    27, 9.33   0.80      8    31, 10.11  0.80      8    31, 12.45  0.80
9    31, 11.67  0.80      9    31, 10.11  0.80      9    31, 12.45  0.80
10   31, 11.67  0.80      10   31, 10.11  0.80      10   31, 12.45  0.80
11   31, 12.45  0.80      11   31, 10.11  0.80      11   31, 13.23  0.81
12   31, 13.23  0.80      12   31, 10.11  0.80      12   31, 13.23  0.81
13   31, 13.23  0.81      13   31, 13.23  0.81      13   31, 13.23  0.81
14   31, 13.23  0.81      14   31, 13.23  0.81      14   31, 13.23  0.81
100  31, 13.23  0.81      100  31, 13.23  0.81      100  31, 13.23  0.81

6) Global Seeking: Global seeking is the process of tracing the GBest from the first swarm to the last. In this search, GBest is the fitness with the greatest value, so the process forms an ascending curve, as shown in Figure 3.

[Figure 3 plots the accuracy (GBest fitness) against the swarm number; each panel forms an ascending curve that levels off at 0.81.]

Figure 3. Global seeking charts: a) 10 particles, b) 20 particles, c) 40 particles


C. Performance Testing of the Fk-NN + MPSO Classification Model

Model performance is determined from the confusion matrix. Based on the confusion matrix, three performance measures are determined: accuracy, precision, and AUC. Tables VIII and IX show the Fk-NN confusion matrix and the Fk-NN + MPSO confusion matrix, respectively.
TABLE VIII
Fk-NN confusion matrix

                 Predicted positive  Predicted negative  Total
Actual positive  46                  5                   51
Actual negative  24                  25                  49
Total            70                  30                  100

TABLE IX
Fk-NN + MPSO confusion matrix

                 Predicted positive  Predicted negative  Total
Actual positive  54                  15                  69
Actual negative  4                   27                  31
Total            58                  42                  100

From the confusion matrices in Tables VIII and IX, the following prediction performance can be determined. The results show that the Fk-NN + MPSO model yields better classification performance than the Fk-NN model. The performance comparison of both models is presented in Table X and Figure 4.

TABLE X
Performance comparison of the Fk-NN model with the Fk-NN + MPSO model

Performance (0 to 1)  Fk-NN  Fk-NN + MPSO (this study)
Accuracy              0.71   0.81
Precision             0.78   0.93
AUC                   0.77   0.84

[Figure 4 plots the accuracy, precision, and AUC of the Fk-NN model against those of the Fk-NN + MPSO model.]

Figure 4. Performance comparison between the Fk-NN model and the Fk-NN + MPSO model

D. Comparison with Two Other Models

To validate the superiority of the Fk-NN + MPSO model in predicting creditworthiness, the results obtained by the Fk-NN + MPSO method are compared with two other classifiers: the Naive Bayes method and the Decision Tree method. The performance results are presented in Table XI, and Figure 5 shows the comparison in graphical form.

TABLE XI
Comparison of model superiority

Performance (0 to 1)  Fk-NN  Naive Bayes  Decision Tree  Fk-NN + MPSO (this study)
Accuracy              0.70   0.75         0.70           0.81
Precision             0.78   0.74         0.68           0.93
AUC                   0.77   0.78         0.64           0.84
[Figure 5 plots the accuracy, precision, and AUC of the Fk-NN, Naive Bayes, Decision Tree, and Fk-NN + MPSO models.]

Figure 5. Graph of model superiority comparison

VI. CONCLUSION

This research offers a Fuzzy k-Nearest Neighbor (Fk-NN) classification model based on the Modified Particle Swarm Optimization (MPSO) algorithm. MPSO is applied to Fk-NN to improve classification performance by optimizing the neighborhood parameter (k) and the fuzzy strength parameter (m). Applying MPSO eliminates the subjectivity in determining the parameters k and m. The results show that applying MPSO in the classification process improved the accuracy of Fk-NN, thereby improving the performance of creditworthiness classification. The Fk-NN + MPSO model can work on high-dimensional data such as the German Credit dataset. This study also compared the superiority of the Fk-NN + MPSO model with other classification models, namely Naive Bayes and Decision Tree; the test results show that the Fk-NN + MPSO model provides superior performance compared to both models.

VII. REFERENCES

[1] Lee, M-C. Enterprise Credit Risk Evaluation Models: A Review of Current Research Trends. International Journal of Computer
Applications, 44(11) : 0975 – 8887. 2012
[2] Ghatasheh, A. Business Analytics using Random Forest Trees for Credit Risk Prediction: A Comparison Study. International Journal
of Advanced Science and Technology, 72 (2014) : 19-30. 2014.
[3] Abdelmoula, A. K. Bank credit risk analysis with k-nearest neighbor classifier: Case of Tunisian banks. Accounting and Management
Information Systems, 14 (1) : 79-106. 2015.
[4] Rahman, M.M., Ahmed, S. & Shuvo, M.H. Nearest Neighbor Classifier Method for Making Loan Decision in Commercial Bank. I.J.
Intelligent Systems and Applications, 4 (8) : 60-68. 2014
[5] Kurama, O., Luukka, P. & Collan, K. Credit Analysis Using a Combination of Fuzzy Robust PCA and a Classification Algorithm.
Advances in Intelligent Systems and Computing-Springer, 3(15) : 19-29. 2015.
[6] Mortezapour, R. & Afzali, M. Assessment of Customer Credit through Combined Clustering of Artificial Neural Networks, Genetics
Algorithm and Bayesian Probabilities. International Journal of Computer Science and Information Security, 11( 12) : 1-5. 2013.
[7] Chen, Q., Xue, H.F. & Yan, L. Credit risk assessment based on potential support vector machine. International Conference on
Natural Computation (ICNC) : pp. 1-25. 2011.
[8] Karimi, A. Credit Risk Modeling for Commercial Banks. International Journal of Academic Research in Accounting, Finance and
Management Sciences, 4 (3) : 1-6. 2014.
[9] Kamalloo, E. & Abadeh, M.S. Credit Risk Prediction Using Fuzzy Immune Learning. Advances in Fuzzy Systems-Hindawi, 3 (2014) : 1-
12 . 2014
[10] Takyar, S.M.T., Nashtaei, R.A. & Chirani, E. The comparison of credit Risk between Artificial Neural Network and Logistic regression
Models in Tose-Taavon Bank in Guilan. International Journal of Applied Operational Research, 5(1) : 63-72. 2014.
[11] Rosyid, H., Prasetyo, E. & Agustin, S. Perbaikan Akurasi Fuzzy K-Nearest Neighbor In Every Class Menggunakan Fungsi Kernel.
Prosiding Seminar Nasional Teknologi Informasi dan Multimedia 2013 : pp. 13-18. 2013.
[12] Derrac, J., Chiclana, F., Garcia, F. & Hererra, F. Evolutionary Fuzzy K-Nearest Neighb ors Algorithm usingInterval-Valued Fuzzy
Sets. Centre for computation intelligent : 1-28. 2014.
[13] Danenas, P. & Garsva, G. Credit risk evaluation modeling using evolutionary linear SVM classifiers and sliding window approach.
Proceeding of International Conference on Computational Science-ICCS : pp. 1324 – 1333. 2012.
[14] Li, S., Zhu, Y., Xu, C. & Zhou, Z. Study of Personal Credit Evaluation Method Based on PSO-RBF Neural Network Mode. American
Journal of Industrial and Business Management, 3(2013) : 429-434. 2013.
[15] O’Neill, M. & Brabazon, A. Self-Organizing Swarm (SOSwarm) for Financial Credit-Risk Assessment. Proceeding on 2008 IEEE
Congress on Evolutionary Computation : pp. 3087 – 3093. 2008.
[16] Keller, J.M., Gray, M.R. & Givens, J.A. A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics, 15(4) :
1-8. 1985.
[17] Jacubcoca, M., Maca, P. & Pech, P. A comparison of selected modifications of the particle swarm optimization algorithm. Journal of
Applied Mathematics., 14(2014) : 10-15. 2014.
[18] Guo, H. & He, J. A modified particle swarm optimization algorithm. Journal of Computer Science 10(2) : 341-346. 2013.
[19] Yang, C.H., Hsiao, C-H. & Chuang, L-Y. Linearly decreasing weight particle swarm optimization with accelerated strategy for data
clustering.International Journal of Computer Science, 37(3) : 3-9. 2010
[20] Clerc, M. The swarm and the queen: towards a deterministic and adaptive particle swarmoptimization. Proceeding of Congress on
Evolutionary Computation : pp. 1951-1957. 1999.
[21] Han, J., Kamber, M. & Pei, J. Data mining tecniques and concepts. Morgan Kaufman publisher. Watham : USA. 2012
[22] Ramya, R., S. Analysis of feature selection techniques in credit risk assessment. Proceedings of International Conference on
Advanced Computing and Communication Systems (ICACCS-2015) : pp. 1-6. 2015
