
2011 Third International Conference on Measuring Technology and Mechatronics Automation

Multi-objective Rule Discovery Using the Improved Niched Pareto Genetic Algorithm
Junli Lu*1, Fan Yang1, Momo Li1, Lizhen Wang2
1 Department of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650031, China
2 Department of Computer Science and Engineering, School of Information, Yunnan University, Kunming 650091, China
Ljl1982_3_6@126.com

Abstract: We present an efficient genetic algorithm for mining multi-objective rules from large databases. The objectives conflict with each other, which makes the resulting optimization problem very difficult to solve. We propose a multi-objective evolutionary algorithm called the improved niched Pareto genetic algorithm (INPGA), which combines BNPGA and SDNPGA so that it not only selects candidates accurately but also saves selection time. Because the effectiveness of the selection operator relies on the samples, we propose a clustering-based sampling method, and we also handle the situation of a zero niche count. We have compared the execution time and the rules generated by INPGA with those of BNPGA and SDNPGA. The experimental results confirm that our method has an edge over BNPGA and SDNPGA.

To overcome the difficulty of conflicting objectives, the literature [6] proposed a multi-objective genetic algorithm called SDNPGA, an improved version of BNPGA, for rule generation.
This paper is organized as follows. In Section II, an overview of the simple genetic algorithm for classification rule generation is provided. Section III introduces BNPGA and SDNPGA, and in Section IV we discuss the improved niched Pareto genetic algorithm (INPGA). The implementation of our simulation experiments is discussed in Section V. Finally, Section VI concludes this paper.
II. AN OVERVIEW OF SIMPLE GENETIC ALGORITHM
In this section, we review the operation of the SGA. Genetic algorithms are probabilistic search algorithms characterized by the fact that a number N of potential solutions (the population) is maintained simultaneously. The population is modified according to the natural evolutionary process: after initialization, selection and recombination are executed in a loop until some termination criterion is reached. Each run of the loop is called a generation.
The selection operator is intended to improve the average quality of the population by giving individuals of higher quality a higher probability of being copied into the next generation. The quality of an individual is measured by a fitness function. Recombination changes the genetic material in the population, either by crossover or by mutation, in order to obtain new points in the search space.
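To make this loop concrete, the following Python sketch shows a minimal SGA (our illustration, not the algorithm of [1] verbatim); the bit-string encoding, parameter defaults, and all function names are our assumptions.

import random

def sga(fitness, n_bits, pop_size=100, generations=100,
        p_cross=0.75, p_mut=0.01, tournament_k=2):
    # initialization: random bit-string individuals
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):              # one loop pass = one generation
        def select():                         # tournament selection
            return max(random.sample(pop, tournament_k), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = select()[:], select()[:]
            if random.random() < p_cross:     # one-point crossover
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):              # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mut:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# example: maximize the number of ones in a 20-bit string
best = sga(fitness=sum, n_bits=20)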

Keywords: Multi-objective rule; Niched Pareto genetic algorithm; Data mining; Clustering; Zero niche count

I. INTRODUCTION
Classification rule mining is one of the most important tasks in fuzzy logic systems, and genetic algorithms (GAs) have inspired many research efforts in optimization as well as rule mining [1,3,15]. Traditional rule mining methods are usually accurate but brittle. Genetic algorithms, on the other hand, provide a robust and efficient approach to exploring large search spaces. One of the GAs, the simple genetic algorithm (SGA) introduced by J. H. Holland (1975) [1], with further extensions in [4,5], is well suited to rule generation satisfying a single objective. However, practical rule generation is naturally posed as a multi-objective problem with two criteria: confidence factor and comprehensibility [2,6]. Many multi-objective GAs (MOGAs) [7,8] have been proposed. The simple GA normally handles problems with such criteria by converting them into a single-objective problem. However, this approach is unsatisfactory due to the nature of the optimality conditions for multiple objectives. In the presence of multiple and conflicting objectives, the resulting optimization problem gives rise to a set of optimal solutions instead of just one, because no single solution can be best for all of the conflicting objectives at once.

A. Genetic representations


Each individual in the population represents a candidate rule R of the form "if A then C". The antecedent of this rule is formed by a conjunction of up to n attribute conditions. Each condition or consequent is a fuzzy assignment for one attribute. For instance, if attribute x1 has a fuzzy set containing three partitions {high, typical, low} represented by the numerical set {1, 2, 3}, then an assignment of 2 in that cell represents "if x1 is typical". It is also possible not to use one of the feature elements in a rule; this situation is handled by assigning a value of 0.
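As an illustration of this encoding (a sketch under our own naming, with a one-attribute partition table assumed), the following Python snippet decodes such an integer genome into a readable rule:

PARTITIONS = {"x1": ["high", "typical", "low"]}   # assumed attribute table

def decode(genome, attributes, consequent):
    """Turn an integer vector into a readable 'if A then C' rule."""
    conds = []
    for attr, gene in zip(attributes, genome):
        if gene != 0:                  # 0 = attribute not used in this rule
            conds.append(f"{attr} is {PARTITIONS[attr][gene - 1]}")
    return "if " + " and ".join(conds) + f" then {consequent}"

print(decode([2], ["x1"], "class = P"))  # -> if x1 is typical then class = P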
B. Rule form
We propose to use INPGA to discover high-level prediction rules of the form:

* Supported by the Science Foundation of Yunnan Education Committee under Grant No. 08Y0264 and the Youth Natural Science Foundation of Yunnan University of Nationalities under Grant No. 09QN10.

IF some conditions hold on the values of a set of predicting attributes
THEN predict a value for the goal attribute.


In other words, the value of a special attribute, called the goal attribute, is predicted from the values given for the other attributes, called the predicting attributes.

Because we wanted more domination pressure, a sampling scheme is implemented. Two candidates for selection are picked at random from the population. A comparison set of individuals is also picked randomly from the population. Each of the candidates is then compared against each individual in the comparison set. If one candidate is dominated by the comparison set and the other is not, the latter is selected for reproduction. If neither or both are dominated by the comparison set, then we must use sharing to choose a winner, as explained later.
We have referred to domination; the domination between two solutions is defined as follows (see [10,11]):
Definition 1. A solution x(1) is said to dominate another solution x(2) if both of the following conditions are true:
1) the solution x(1) is not worse than x(2) with respect to all of the objectives;
2) the solution x(1) is strictly better than x(2) in at least one objective.
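Definition 1 and the tournament above translate directly into code. The following Python sketch (ours; objective vectors are assumed to be maximized) returns the tournament winner, or None when sharing must break the tie:

import random

def dominates(a, b):
    """Definition 1 for two objective vectors a and b (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and \
           any(x > y for x, y in zip(a, b))

def dominated_by_set(cand, comparison_objs):
    return any(dominates(other, cand) for other in comparison_objs)

def pareto_tournament(pop, objectives, t_dom):
    """One Pareto domination tournament (the sharing tie-break is omitted)."""
    c1, c2 = random.sample(pop, 2)
    cmp_objs = [objectives(s) for s in random.sample(pop, t_dom)]
    d1 = dominated_by_set(objectives(c1), cmp_objs)
    d2 = dominated_by_set(objectives(c2), cmp_objs)
    if d1 and not d2:
        return c2
    if d2 and not d1:
        return c1
    return None   # both or neither dominated: fall through to sharing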
2) Sharing on the non-dominated frontier
Fitness sharing was introduced by Goldberg and Richardson [12]; it calls for the degradation of an individual's objective fitness f_i by a niche count m_i

C. Fitness function
As discussed in Section I, the discovered rules should have a high confidence factor and high comprehensibility. In this subsection, we discuss how these multiple criteria can be incorporated into a single-objective fitness function [6].
1) Comprehensibility metric
There are various ways to quantitatively measure rule comprehensibility. The standard way is to count the number of rules and the number of conditions in these rules. If a rule can have at most A_C conditions, the comprehensibility Comp(R) of a rule R can be defined as:
Comp(R) = 1 − N_C(R) / A_C    (1)
where N_C(R) is the number of conditions in the rule R.
2) Confidence factor
The antecedent part of the rule is a conjunction of conditions. A very simple way to measure the confidence factor Con(R) of a rule R is:
Con(R) = SUP(A ∧ C) / SUP(A)    (2)
where SUP(A) is the number of examples satisfying all the conditions in the antecedent A, and SUP(A ∧ C) is the number of examples that satisfy both A and the consequent C.
calculated for that individual. This degradation is obtained by simply dividing the objective fitness by the niche count to find the shared fitness f_i / m_i. The niche count m_i is an estimate of how crowded the neighborhood (niche) of individual i is. It is calculated over all individuals in the current population as m_i = Σ_{j ∈ Pop} Sh[d[i, j]], where d[i, j] is the distance between individuals i and j and Sh[d] is the sharing function.

The fitness function is computed as the arithmetic weighted mean of the comprehensibility and the confidence factor:
f(x) = (w1 × Comp(R) + w2 × Con(R)) / (w1 + w2)    (3)
where w1 and w2 are user-defined weights.
Typically, the triangular sharing function Sh[d] = 1 − d / σ_share is used, where σ_share is the niche radius.
Individuals within a distance σ_share of each other degrade each other's fitness, since they are in the same niche, but convergence of the full population is avoided.
When the candidate solutions are either both dominated or both non-dominated, it is likely that they are in the same equivalence class. The best-fit candidate is the one that has the fewest individuals in its niche, and thus the smallest niche count. We call this type of sharing equivalence class sharing.

III. NICHED PARETO GENETIC ALGORITHM

A. The basic niched Pareto GA (BNPGA)
BNPGA, SDNPGA, and our INPGA all focus on the selection operator of the genetic algorithm. The most widely used selection technique is tournament selection. However, tournament selection assumes that we want a single answer, and after several generations the population will converge to a uniform one. To avoid convergence and maintain multiple Pareto-optimal solutions, tournament selection is altered in two ways. First, a Pareto domination tournament is introduced. Second, when a tournament is non-dominant, sharing is implemented to determine the winner [9].
1) Pareto domination tournaments
The binary relation of domination leads naturally to a binary tournament in which two randomly selected individuals are compared. It was soon found, however, that this produced insufficient domination pressure: there were too many dominated individuals in later generations. It seemed that a sample size of two was too small to estimate an individual's true domination ranking.

Figure 1. Equivalence class sharing

Figure 1 illustrates how this form of sharing should work between two non-dominated individuals. Here, we are maximizing along the x-axis and minimizing along the y-axis. In this case, the two candidates are in the Pareto-optimal subset (the dashed region) of the union of the comparison set and the candidates.

B. The improved niched Pareto genetic algorithm
For problem 1, we combine the two methods. When the disparity between the two niche counts is very large, equivalence class sharing is adopted; otherwise, the standard deviation is adopted. This not only selects candidates accurately but also saves selection time.
For problem 2, we suppose that the samples actually reflect the data distribution of the original population. To preserve diversity, we handle the two candidates candidate_1 and candidate_2, with niche radii σ_share1 and σ_share2, as described below.

From a Pareto point of view, neither candidate is preferred. But if we want to maintain useful diversity, it is apparent that we should choose the candidate that has the smaller niche count; in this case, it is candidate 2.
B. The standard deviation niched Pareto GA (SDNPGA)
When the candidates are either both dominated or both non-dominated, the literature [6,13] also considers a measure that can maintain useful diversity in the Pareto set. The following approach, called the standard deviation niched Pareto genetic algorithm (SDNPGA), is suitable for achieving both goals.

Figure 2. The larger standard deviation

For the two candidates: if there is no sample in the niche of radius σ_share1, candidate_1 is selected, and likewise candidate_2 is selected if its niche of radius σ_share2 is empty. If there is no sample in either niche, candidate_1 and candidate_2 are both selected.
So we select candidate_2 in Figure 3. Note the premise that the samples (the comparison set) actually reflect the data distribution of the original population; this is a challenge for the random sampling method used in BNPGA and SDNPGA. In this paper, we therefore propose a new, clustering-based sampling method, for which we selected the k-means clustering method [14].
The clustering-based method first executes k-means clustering on the population. After obtaining several clusters, it samples according to the sizes of the clusters, so isolated points are not sampled. When the selection operator is executed, a niche count of zero indicates that the candidate lies in the area of an isolated point, not in any cluster. To preserve diversity, we select such a candidate. This is why we handle the zero niche count in that way.

Figure 3. The situation of zero niche count

The steps of SDNPGA are as follows:
1) Find the centers of gravity μ1 and μ2 of the samples in the two niches:
μ1 = (1/|S1|) Σ_{x_i ∈ S1} x_i and μ2 = (1/|S2|) Σ_{x_j ∈ S2} x_j,
where S1 and S2 are the sets of samples within radius σ_share1 of candidate_1 and within radius σ_share2 of candidate_2, respectively.
2) Calculate the standard deviation of each niche:
σ1 = sqrt(Σ_{x_i ∈ S1} (x_i − μ1)²) and σ2 = sqrt(Σ_{x_j ∈ S2} (x_j − μ2)²).
3) The candidate having the larger standard deviation (SD) is chosen.
Figure 2 illustrates how diversity is maintained in the SDNPGA method; here we choose candidate 2.
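A Python sketch of this computation, assuming each sample is a numeric vector and using our own helper names:

import math

def niche_samples(candidate, comparison_set, dist, sigma_share):
    """The samples of the comparison set falling inside one niche."""
    return [s for s in comparison_set if dist(candidate, s) <= sigma_share]

def niche_std(samples):
    """Center of gravity, then standard deviation, of one niche.
    Returns 0 for an empty niche."""
    if not samples:
        return 0.0
    dims = len(samples[0])
    mu = [sum(s[k] for s in samples) / len(samples) for k in range(dims)]
    sq = sum(sum((s[k] - mu[k]) ** 2 for k in range(dims)) for s in samples)
    return math.sqrt(sq)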

Sampling algorithm:
1. Execute k-means clustering on the population, obtaining K clusters.
2. Compute the number of individuals in each cluster, Sum(j), j = 1..K.
3. Sample individuals according to the size of each cluster; the number of samples taken from cluster i is p(i) = round(Sum(i)/Nc × Tdom), i = 1..K, where Nc is the population size and Tdom is the total number of samples.
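The sampling algorithm can be sketched in Python as follows; we use scikit-learn's KMeans for step 1, and the function and parameter names are our assumptions:

import numpy as np
from sklearn.cluster import KMeans

def clustering_based_sample(population, K, t_dom,
                            rng=np.random.default_rng()):
    """Sample t_dom comparison individuals proportionally to cluster sizes,
    so isolated points (tiny clusters) contribute little or nothing."""
    X = np.asarray(population, dtype=float)
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(X)
    sample_idx = []
    for j in range(K):
        members = np.flatnonzero(labels == j)
        p_j = round(len(members) / len(X) * t_dom)   # p(i) from step 3
        if p_j > 0:
            sample_idx.extend(rng.choice(members,
                                         size=min(p_j, len(members)),
                                         replace=False))
    return [population[i] for i in sample_idx]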

IV. OUR PROPOSED NICHED PARETO GENETIC ALGORITHM (INPGA)
Our proposed method originates from the problems described in the following subsection.

A. Proposing problems

1) When the candidates are either both dominated or both non-dominated, BNPGA adopts equivalence class sharing: it computes the number of samples in each niche and, considering diversity, chooses the candidate that has the smaller niche count. But when the two niche counts are more or less the same, the effectiveness of this method is questionable. SDNPGA adopts the standard deviation instead, but when the disparity between the two niche counts is very large, it is unnecessary to compute the standard deviation, which wastes time. What method can solve both problems?
2) Both BNPGA and SDNPGA are efficient, but they do not consider the situation in which there is no sample in one niche or in both niches, which we call a zero niche count. We illustrate it in Figure 3.

The selection algorithm is as follows:
Selection algorithm:
1. The clustering-based sampling obtains the comparison set S.
2. For any two candidates and the comparison set S: if one candidate is dominated and the other is not, the non-dominated candidate is selected and the algorithm terminates.
3. If the two candidates (candidate_1 and candidate_2) are either both dominated or both non-dominated, compute the numbers of samples in the two niches, count1 and count2.
4. If count1 = 0, candidate_1 is selected, and if count2 = 0, candidate_2 is selected; the algorithm terminates.
5. If count1 − count2 > delta or count2 − count1 > delta, select candidate_2 or candidate_1, respectively; the algorithm terminates.
6. If abs(count1 − count2) < delta, compute the standard deviations of the two niches, sd1 and sd2.
7. If sd1 > sd2, candidate_1 is selected; otherwise, candidate_2 is selected. The algorithm terminates.


Delta is the threshold that evaluates the disparity between the two niche counts and decides whether equivalence class sharing or the standard deviation is adopted.
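Putting steps 2-7 together, the following Python sketch (ours, not the authors' MATLAB code) implements the selection operator; dominated_by_set, niche_samples, and niche_std are the helpers sketched earlier, and for brevity the both-counts-zero case returns only candidate_1, whereas the paper keeps both candidates.

def inpga_select(c1, c2, comparison_set, objectives, dist,
                 sigma_share, delta):
    obj_set = [objectives(s) for s in comparison_set]
    d1 = dominated_by_set(objectives(c1), obj_set)   # step 2
    d2 = dominated_by_set(objectives(c2), obj_set)
    if d1 != d2:
        return c2 if d1 else c1
    n1 = niche_samples(c1, comparison_set, dist, sigma_share)
    n2 = niche_samples(c2, comparison_set, dist, sigma_share)
    count1, count2 = len(n1), len(n2)                # step 3
    if count1 == 0:                                  # step 4: zero niche count
        return c1
    if count2 == 0:
        return c2
    if count1 - count2 > delta:                      # step 5: large disparity,
        return c2                                    # equivalence class sharing
    if count2 - count1 > delta:
        return c1
    sd1, sd2 = niche_std(n1), niche_std(n2)          # steps 6-7: standard
    return c1 if sd1 > sd2 else c2                   # deviation tie-break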


V. EXPERIMENTAL RESULTS

We mainly compare the execution time and the rules generated by BNPGA, SDNPGA, and our INPGA. The experiments were performed using the nursery dataset obtained from the UCI machine learning repository. That dataset is categorical; we also ran the experiments on a continuous dataset (the iris dataset).

A. Description of the datasets
The nursery dataset has 12,960 records and nine attributes, all of them categorical. The ninth attribute is treated as the class attribute. The iris dataset has 150 instances and 5 attributes; the last one is the class attribute, and all of them are continuous except the last one.

Table 1. Parameters used for the three methods
Dataset | P | Pc | Pm | Tdom | σ_share | delta
Nursery | 500 | 0.75 | 0.002 | 50 | 20 | 7
iris | 200 | 0.8 | 0.01 | 15 | 7 | 4
P, population size; Pc, probability of crossover; Pm, probability of mutation; Tdom, tournament size; σ_share, niche radius. Delta is used only in INPGA.

Tables 2-4 show the rules generated by BNPGA, SDNPGA, and INPGA, respectively, from the nursery dataset.
Table 2. Rules generated by BNPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
P | If (parents=usual) ^ (housing=less_conv) ^ (social=problematic) then (class=P) | 0.7780 | 0.5
P | If (parents=great_pret) ^ (social=slightly_prob) ^ (health=recommended) then (class=P) | 0.7867 | 0.625
NR | If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR) | 0.79 | 0.751
NR | If (parents=pretentious) ^ (children=3) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.75 |
NR | If (parents=great_pret) ^ (children=2) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | |
VR | If (housing=less_conv) ^ (finance=inconv) ^ (social=slightly_prob) ^ (health=recommended) then (class=VR) | |
R | If (has_nurs=proper) ^ (finance=convenient) ^ (health=recommended) then (class=R) | |
SP | If (housing=convenient) ^ (finance=convenient) ^ (children=2) ^ (social=slightly_prob) then (class=SP) | |

Table 3. Rules generated by SDNPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
NR | If (parents=usual) ^ (housing=less_conv) ^ (health=not_recom) then (class=NR) | 0.674 | 0.5
NR | If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.7867 | 0.625
NR | If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | 0.79 | 0.751
R | If (has_nurs=proper) ^ (finance=convenient) then (class=R) | 0.75 | 0.812
SP | If (housing=convenient) ^ (finance=convenient) ^ (social=slightly_prob) then (class=SP) | 0.625 |

The dataset above is categorical; we also ran the experiments on the continuous iris dataset. In order to obtain categorical data, we first pre-process the dataset. Each attribute is fuzzified, and the number of fuzzy partitions for each attribute is pre-determined. We fuzzify each attribute except the last one into 4 categories: More_sma, Small, Big, More_big. The last attribute is treated as the class attribute. Thus the continuous data is changed to categorical. The rules discovered by BNPGA, SDNPGA, and INPGA from this dataset are shown in Tables 5-7.
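For instance, a simple quantile-based fuzzification into the four categories can be sketched as follows; the paper pre-determines its partitions, so these cut points are purely our assumption:

import numpy as np

LABELS = ["More_sma", "Small", "Big", "More_big"]

def fuzzify_column(values):
    q = np.quantile(values, [0.25, 0.5, 0.75])   # three assumed cut points
    return [LABELS[int(np.searchsorted(q, v))] for v in values]

print(fuzzify_column([4.3, 5.1, 5.8, 6.4, 7.9]))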


Table 4. Rules generated by INPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
NR | If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR) | 0.634 | 0.625
NR | If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.8114 |
NR | If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | 0.634 |
R | If (has_nurs=proper) ^ (finance=convenient) then (class=R) | |

B. Rules discovered by BNPGA, SDNPGA and INPGA
The experiments were performed using MATLAB 7.0 on a Windows XP server. The data-specific parameters and the parameters encountered during rule discovery are listed in Table 1.


Table 5. Rules generated by BNPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.7877 | 0.25
2 | If (sl=more_sma) ^ (sw=more_big) ^ (pl=more_big) then (class=Iris-versicolor) | 0.7641 | 0.25
3 | If (sl=More_big) ^ (sw=small) ^ (pl=more_sma) then (class=Iris-virginica) | 0.783 | 0.25

Table 6. Rules generated by SDNPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.8 | 0.25

We omitted classes P, VR, and SP in Table 3, classes P and VR in Table 4, classes 2 and 3 in Table 6, and class 2 in Table 7, because those rules are the same as in Table 2 and Table 5, respectively. From Tables 2-7 it can be observed that the rules discovered by BNPGA have the lowest confidence factor and comprehensibility, and that the comparative performance of INPGA has an edge over SDNPGA and BNPGA.




VI. CONCLUSION

In this article we have introduced SGA, BNPGA, and SDNPGA for classification rule generation, and we have discussed INPGA. Theoretical analysis and experiments confirm the efficiency of our INPGA. We are now concentrating on careful selection of attributes in a preprocessing step [16,17], in order to reduce the number of attributes. Since there are still few applications of INPGA in data mining tasks [18], more practical applications to various domains of data mining, and more studies, are needed to validate its robustness and scalability.

Table 7. Rules generated by INPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.82 | 0.25
3 | If (sl=More_big) ^ (pl=more_sma) then (class=Iris-virginica) | 1.0 | 0.5

C. Execution time
In this section, we mainly evaluate the time efficiency of the three methods. BNPGA should be the fastest, because it adopts random sampling and does not compute the standard deviations of niches (but, as the previous section shows, the rules discovered by BNPGA do not compare well with those of the other two methods). The time efficiency of INPGA and SDNPGA is worth analyzing.
1) Theoretical analysis
In INPGA, clustering-based sampling costs time, but the algorithm does not compute the standard deviation of all niches, which saves time. In detail: the time complexity of clustering is O(nkdg), where n is the population size, k is the number of clusters, d is the dimensionality, and g is the number of iterations (when n is very large, k, d, and g can be regarded as constants). In SDNPGA, the time complexity of random sampling is O(n). Both methods compute standard deviations; the difference is that INPGA does not compute the standard deviation of all niches, and the discrepancy is O(count·d·m), where count is the number of niches whose standard deviation is not computed, d is the dimensionality, and m is the number of samples in a niche. When count is very small it can be regarded as a constant; otherwise it cannot be ignored. In a word, SDNPGA should spend more time than INPGA; the evaluation is in the next subsection.
2) Experimental evaluation
Each algorithm has 100 individuals in the population and was run for 100 generations. The parameter values are shown in Table 1. We ran each algorithm five times, recorded the execution times, and report the averages in the tables. Tables 8 and 9 show the execution times of the three methods on the iris and nursery datasets, respectively.

REFERENCES
[1] J. H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor, MI, 1975.
[2] A. A. Freitas, On rule interestingness measures, Knowledge-Based Systems 12 (1999) 309-315.
[3] C. M. Fonseca, P. J. Fleming, An overview of evolutionary algorithms in multiobjective optimization, Evolutionary Computation 3 (1) (1995) 1-16.
[4] L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[5] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin, 1994.
[6] S. Dehuri, R. Mall, Predictive and comprehensible rule discovery using a multi-objective genetic algorithm, Knowledge-Based Systems 19 (2006) 413-421.
[7] E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3 (1999) 257-271.
[8] J. Horn, N. Nafpliotis, D. E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[9] J. Horn, N. Nafpliotis, D. E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[10] J. Branke, K. Deb, K. Miettinen, R. Słowiński (Eds.), Multiobjective Optimization: Interactive and Evolutionary Approaches, Springer-Verlag, Berlin, Heidelberg, 2008.
[11] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, Wiley-Interscience Series in Systems and Optimization, John Wiley & Sons, Chichester, 2001.
[12] D. E. Goldberg, J. Richardson, Genetic algorithms with sharing for multi-modal function optimization, in: Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 41-49.
[13] S. Dehuri, S. Patnaik, A. Ghosh, R. Mall, Application of elitist multi-objective genetic algorithm for classification rule generation, Applied Soft Computing 8 (1) (2008) 477-487.
[14] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281-297.
[15] E. Zhou, A. Khotanzad, Fuzzy classifier design using genetic algorithms, Pattern Recognition 40 (2007) 3401-3414.
[16] M. Zeleny, Multiple Criteria Decision Making, McGraw-Hill, New York, 1982.
[17] C. L. Hwang, K. Yoon, Multiple Attribute Decision Making: Methods and Applications, A State-of-the-Art Survey, Springer-Verlag, New York, 1981.
[18] S. Bhattacharyya, Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing, in: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), ACM, 2000, pp. 465-473.

Table 8. The execution time of the three methods on the iris dataset
Method | Execution time (s)
BNPGA | 6.5
SDNPGA | 11.452
INPGA | 6.75

Table 9. The execution time of the three methods on the nursery dataset
Method | Execution time (s)
BNPGA | 280.3417
SDNPGA | 343.25
INPGA | 286.3953

We can see that our INPGA has an apparent superiority over SDNPGA, and its execution time is more or less the same as BNPGA's, which is consistent with our theoretical analysis in the last subsection.

