Department of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650031, China
Department of Computer Science and Engineering, School of Information, Yunnan University, Kunming 650091, China
Ljl1982_3_6@126.com
this difficulty, the multi-objective genetic algorithm called SDNPGA, an improved version of BNPGA for rule generation, was proposed in [6].
This paper is organized as follows. The next section provides an overview of the simple genetic algorithm for classification rule generation. We then introduce BNPGA and SDNPGA, followed by a discussion of the improved niched Pareto genetic algorithm (INPGA). The implementation of our simulation experiments is discussed next, and the final section concludes this paper.
AN OVERVIEW OF SIMPLE GENETIC ALGORITHM
In this section, we review the operation of the SGA. Genetic algorithms are probabilistic search algorithms characterized by the fact that a population of N potential solutions is maintained. The
population is modified according to the natural evolutionary
process: after initialization, selection and recombination are
executed in a loop until some termination criterion is
reached. Each run of the loop is called a generation.
The selection operator is intended to improve the
average quality of the population by giving individuals of
higher quality a higher probability to be copied into the next
generation. The quality of an individual is measured by a
fitness function. Recombination changes the genetic
material in the population either by crossover or by
mutation in order to obtain new points in the search space.
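The generational loop described above can be sketched as follows. This is a minimal illustration on bit-string individuals; the one-max fitness function, the encoding, and all parameter values here are placeholders for the example, not those of any method discussed in this paper.

```python
import random

# Illustrative parameters (placeholders, not the paper's settings).
POP_SIZE, GENOME_LEN, GENERATIONS = 20, 10, 50
P_CROSSOVER, P_MUTATION = 0.75, 0.01

def fitness(ind):
    # Placeholder fitness: number of 1-bits ("one-max").
    return sum(ind)

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection: higher-quality
    # individuals get a higher probability of being copied forward.
    total = sum(fitness(i) for i in pop)
    weights = [fitness(i) / total for i in pop] if total else None
    return random.choices(pop, weights=weights, k=len(pop))

def recombine(pop):
    nxt = []
    for a, b in zip(pop[::2], pop[1::2]):
        if random.random() < P_CROSSOVER:      # one-point crossover
            cut = random.randrange(1, GENOME_LEN)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        nxt += [a, b]
    # Bit-flip mutation on every gene with probability P_MUTATION.
    return [[bit ^ (random.random() < P_MUTATION) for bit in ind] for ind in nxt]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):                   # one pass of the loop = one generation
    pop = select(pop)
    pop = recombine(pop)
best = max(pop, key=fitness)
```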
INTRODUCTION
Classification rule mining is one of the most important tasks in fuzzy logic systems, and genetic algorithms (GAs) have inspired many research efforts in optimization as well as rule mining [1,3,15]. Traditional rule mining methods are usually accurate but brittle. Genetic algorithms, on the other hand, provide a robust and efficient approach to exploring large search spaces. One GA, the simple genetic algorithm (SGA) introduced by J.H. Holland (1975) [1] and further extended in [4,5], is well suited to rule generation satisfying a single objective.
However, practical rule generation is naturally posed as a multi-objective problem with two criteria: confidence factor and comprehensibility [2,6]. Many multi-objective GAs (MOGAs) [7,8] have been proposed. The simple GA
normally handles problems with such criteria by converting
them into a single objective problem. However, this
approach is unsatisfactory due to the nature of the
optimality conditions for multiple objectives. In the
presence of multiple and conflicting objectives, the resulting
optimization problem gives rise to a set of optimal solutions,
instead of just one optimal solution. Multiple optimal
solutions exist because no single solution can be a substitute
for multiple conflicting objectives. In order to overcome
C. Fitness function
As discussed earlier, the discovered rules should have a high confidence factor and high comprehensibility. In this subsection, we discuss how these multiple criteria can be incorporated into a single objective fitness function [6].
1) Comprehensibility metric
There are various ways to quantitatively measure rule
comprehensibility. The standard way of measuring
comprehensibility is to count the number of rules and the
number of conditions in these rules.
If a rule can have at most A_C conditions, the comprehensibility Comp(R) of a rule R can be defined as:

Comp(R) = 1 - N_C(R) / A_C    (1)

where N_C(R) is the number of conditions in the rule R.
2) Confidence factor
The antecedent part of the rule is a conjunction of
conditions. A very simple way to measure the confidence
factor of a rule Con (R ) is
Con(R) = |A ∧ C| / |A|    (2)

where |A| is the number of examples satisfying all the conditions in the antecedent A, and |A ∧ C| is the number of examples satisfying both the antecedent A and the consequent C.
For the two candidates P1 and P2, collect the individuals falling within each niche radius, i.e. the x_i with |x_i - P1| < σ_share1 and the x_j with |x_j - P2| < σ_share2. Calculate the standard deviation of both niches:

σ1 = sqrt( Σ_{x_i ∈ share1} (x_i - P1)^2 / |share1| )  and  σ2 = sqrt( Σ_{x_j ∈ share2} (x_j - P2)^2 / |share2| )
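The niche standard-deviation step can be sketched as follows, assuming a niche is simply the set of individuals within the sharing radius of a candidate; the one-dimensional individuals, candidate points, and radius below are illustrative values, not taken from the paper's experiments.

```python
import math

def niche_members(candidate, population, sigma_share):
    # Individuals falling within the sharing radius of the candidate.
    return [x for x in population if abs(x - candidate) < sigma_share]

def niche_std(candidate, members):
    # Standard deviation of the niche, measured around the candidate.
    return math.sqrt(sum((x - candidate) ** 2 for x in members) / len(members))

# Illustrative one-dimensional population with two clusters.
population = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
p1, p2, sigma_share = 1.0, 5.0, 0.5
s1 = niche_std(p1, niche_members(p1, population, sigma_share))
s2 = niche_std(p2, niche_members(p2, population, sigma_share))
```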
Sampling algorithm:
1. Execute k-means clustering on the population to obtain K clusters.
2. Compute the number of samples in each cluster, Sum(j), j = 1, ..., K.
3. Sample individuals according to the size of each cluster; the number of samples drawn from cluster i is p(i) = round(Sum(i)/Nc × Tdom), i = 1, ..., K, where Nc is the population size and Tdom is the total number of samples.
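The proportional allocation in steps 2-3 can be sketched as follows. The cluster labels are assumed to come from a k-means run (step 1, omitted here), and the stand-in population and label assignment are illustrative only.

```python
import random
from collections import Counter

def allocate_samples(labels, Nc, Tdom):
    # Step 2: size of each cluster, Sum(j).
    sizes = Counter(labels)
    # Step 3: per-cluster sample counts p(i) = round(Sum(i)/Nc * Tdom).
    return {c: round(sizes[c] / Nc * Tdom) for c in sizes}

def sample_population(population, labels, Tdom):
    Nc = len(population)
    quota = allocate_samples(labels, Nc, Tdom)
    by_cluster = {}
    for ind, lab in zip(population, labels):
        by_cluster.setdefault(lab, []).append(ind)
    picked = []
    for c, members in by_cluster.items():
        # Draw each cluster's quota without replacement.
        picked += random.sample(members, min(quota[c], len(members)))
    return picked

random.seed(1)
population = list(range(10))               # stand-in individuals
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]    # assumed k-means output, K = 3
picked = sample_population(population, labels, Tdom=5)
```

Note that rounding each cluster's quota independently means the total drawn can differ slightly from Tdom; that behavior follows the rounding in step 3 as written.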
Selection algorithm:
EXPERIMENTAL RESULTS
Table 1. Parameter values

Dataset |     | Pc   | Pm    | Tdom | σ_share | delta
Nursery | 500 | 0.75 | 0.002 | 50   | 20      | 7
iris    | 200 | 0.8  | 0.01  | 15   | 7       | 4
Mined rules (nursery dataset):

If (parents=usual) ^ (housing=less_conv) ^ (social=problematic) then (class=P)
If (parents=great_pret) ^ (social=slightly_prob) ^ (health=recommended) then (class=P)
If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR)
If (parents=pretentious) ^ (children=3) ^ (housing=convenient) ^ (health=not_recom) then (class=NR)
If (parents=great_pret) ^ (children=2) ^ (housing=critical) ^ (health=not_recom) then (class=NR)
If (housing=less_conv) ^ (finance=inconv) ^ (social=slightly_prob) ^ (health=recommended) then (class=VR)
If (has_nurs=proper) ^ (finance=convenient) ^ (health=recommended) then (class=R)
If (housing=convenient) ^ (finance=convenient) ^ (children=2) ^ (social=slightly_prob) then (class=SP)

Confidence factor / Comprehension: 0.7780, 0.5, 0.7867, 0.625, 0.79, 0.751, 0.75
Mined rules (nursery dataset):

If (parents=usual) ^ (housing=less_conv) ^ (health=not_recom) then (class=NR)
If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR)
If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR)
If (has_nurs=proper) ^ (finance=convenient) then (class=R)
If (housing=convenient) ^ (finance=convenient) ^ (social=slightly_prob) then (class=SP)

Confidence factor / Comprehension: 0.674, 0.5, 0.7867, 0.625, 0.79, 0.751, 0.75, 0.812, 0.625
Mined rules (nursery dataset):

If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR)
If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR)
If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR)
If (has_nurs=proper) ^ (finance=convenient) then (class=R)

Confidence factor / Comprehension: 0.634, 0.625, 0.634
Mined rules (iris dataset):

Class# | Mined rules
2 | If (sl=more_small) ^ (sw=big) ^ (pl=more_small) then (class=Iris-setosa)
  | If (sl=more_small) ^ (sw=more_big) ^ (pl=more_big) then (class=Iris-versicolor)
3 | If (sl=more_big) ^ (sw=small) ^ (pl=more_small) then (class=Iris-virginica)

Confidence factor / Comprehension: 0.5, 0.7641, 0.783, 0.897, 0.5, 0.71, 0.625, 0.79, 0.5
Mined rules (iris dataset):

Class# | Mined rules
1 | If (sl=more_small) ^ (sw=big) ^ (pl=more_small) then (class=Iris-setosa)
  | If (sl=more_big) ^ (pl=more_small) then (class=Iris-virginica)

Confidence factor / Comprehension: 0.7877, 0.25, 1.0, 0.25, 1.0, 0.25, 0.82, 0.25, 1.0, 0.5

CONCLUSION
C. Execution time
In this section, we evaluate the time efficiency of the three methods. BNPGA should be the fastest, because it adopts random sampling and does not compute the standard deviation of niches (although, as the previous section shows, the rules discovered by BNPGA compare poorly with those of the other two methods). The time efficiency of INPGA and SDNPGA is worth analyzing in more detail.
1) Theoretical analysis
In INPGA, clustering-based sampling costs time, but skipping the standard-deviation computation for all niches saves time. In detail: the time complexity of clustering is O(nkdg), where n is the population size, k is the number of clusters, d is the dimensionality, and g is the number of iterations (when n is very large, k, d, and g can be regarded as constants). In SDNPGA, the time complexity of random sampling is O(n). Both methods compute standard deviations; the difference is that INPGA does not compute the standard deviation of all niches, and the resulting saving is O(count·d·m), where count is the number of niches whose standard deviation is not computed, d is the dimensionality, and m is the number of samples in a niche. When count is very small it can be regarded as a constant; otherwise it cannot be ignored. In short, SDNPGA should consume more time than INPGA; the next subsection evaluates this empirically.
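As a quick plug-in of the complexity terms above, the following arithmetic compares the two costs; all parameter values are made up for the example, not measured from the experiments.

```python
# Hypothetical parameter values for comparing the two costs.
n, k, d, g = 100, 4, 8, 10   # population size, clusters, dimensionality, iterations
count, m = 3, 25             # niches skipped by INPGA, samples per niche

clustering_cost = n * k * d * g       # INPGA's clustering-based sampling, O(nkdg)
random_sampling_cost = n              # SDNPGA's random sampling, O(n)
skipped_std_cost = count * d * m      # std-dev work INPGA avoids, O(count*d*m)

extra = clustering_cost - random_sampling_cost   # extra sampling work in INPGA
saved = skipped_std_cost                          # std-dev work saved by INPGA
```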
2) Experimental evaluation
Each algorithm has 100 individuals in its population and was run for 100 generations. The parameter values are shown in Table 1. We ran each algorithm five times, recorded the execution times, and report the averages in the tables. Tables 8 and 9 show the execution times of the three methods on the nursery and iris datasets.
REFERENCES
[1] J.H. Holland, Adaptation in Natural and Artificial Systems, Univ.
Michigan Press, Ann Arbor, MI, 1975.
[2] A.A. Freitas, On rule interestingness measures, Knowledge-Based
Systems 12 (1999) 309-315.
[3] C.M. Fonseca, P.J. Fleming, An overview of evolutionary algorithms in multi-objective optimization, Evolutionary Computation 3 (1) (1995) 1-16.
[4] L. Davis, Handbook of Genetic Algorithms, Van Nostrand
Reinhold,New York, 1991.
[5] Z. Michalewicz, Genetic Algorithms + Data Structure = Evolution
Programs, Springer-Verlag, Berlin, 1994.
[6] S. Dehuri, R. Mall, Predictive and comprehensible rule discovery using a multi-objective genetic algorithm, Knowledge-Based Systems 19 (2006) 413-421.
[7] E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3 (1999) 257-271.
[8] J. Horn, N. Nafpliotis, D.E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[9] J. Horn, N. Nafpliotis, D.E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[10] J. Branke, K. Deb, K. Miettinen, R. Słowiński (Eds.), Multiobjective Optimization: Interactive and Evolutionary Approaches, Springer-Verlag, Berlin, Heidelberg, 2008.
[11] Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms.
Wiley-Interscience Series in Systems and Optimization. Chichester, John
Wiley & Sons, 2001.
[12] D.E. Goldberg, J. Richardson, Genetic algorithms with sharing for
multi-modal function optimization, in: Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 41-49.
[13] Dehuri, S., Patnaik, S., Ghosh, A., and Mall, R. 2008. Application of elitist multi-objective genetic algorithm for classification rule generation. Applied Soft Computing 8 (1), 477-487.
[14] MacQueen, J. Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symp. on Math. Statist. and Prob., 1:281-297, 1967.
[15] E. Zhou and A. Khotanzad, Fuzzy classifier design using genetic algorithm, Pattern Recognition 40 (2007), pp. 3401-3414.
[16] M. Zeleny, Multiple Criteria Decision Making, McGraw-Hill, New
York, 1982.
[17] C.L. Hwang, K. Yoon, Multiple Attribute Decision Making, Methods
and Application, A State of Art Survey, Springer-Verlag, New York, 1981.
[18] S. Bhattacharya, Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing, in: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), ACM, 2000, pp. 465-473.
Method | Execution time (s)
BNPGA  | 6.5
SDNPGA | 11.452
INPGA  | 6.75

Method | Execution time (s)
BNPGA  | 280.3417
SDNPGA | 343.25
INPGA  | 286.3953