
2011 Third International Conference on Measuring Technology and Mechatronics Automation

Multi-objective Rule Discovery Using the Improved Niched Pareto Genetic Algorithm
Junli Lu*1, Fan Yang1, Momo Li1, Lizhen Wang2
1 Department of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming 650031, China
2 Department of Computer Science and Engineering, School of Information, Yunnan University, Kunming 650091, China
Ljl1982_3_6@126.com

Abstract: We present an efficient genetic algorithm for mining multi-objective rules from large databases. The objectives conflict with each other, which makes the resulting optimization problem very difficult to solve. We propose a multi-objective evolutionary algorithm called the improved niched Pareto genetic algorithm (INPGA), which combines BNPGA and SDNPGA so that it not only selects candidates accurately but also saves selection time. Because the effectiveness of the selection operator relies on the samples, we propose a clustering-based sampling method, and we also handle the situation of a zero niche count. We have compared the execution time and the rules generated by INPGA with those of BNPGA and SDNPGA. The experimental results confirm that our method has an edge over BNPGA and SDNPGA.

To overcome the difficulty of conflicting objectives, the literature [6] proposed a multi-objective genetic algorithm called SDNPGA, an improved version of BNPGA, for rule generation.
This paper is organized as follows. In Section II, an overview of the simple genetic algorithm for classification rule generation is provided. Section III introduces BNPGA and SDNPGA, and in Section IV we discuss the improved niched Pareto genetic algorithm (INPGA). The implementation of our simulation experiments is discussed in Section V. Finally, Section VI concludes this paper.
II. AN OVERVIEW OF SIMPLE GENETIC ALGORITHM
In this section, we review the operation of the SGA. Genetic algorithms are probabilistic search algorithms characterized by the fact that a number N of potential solutions (the population) is maintained simultaneously. The population is modified according to the natural evolutionary process: after initialization, selection and recombination are executed in a loop until some termination criterion is reached. Each run of the loop is called a generation.
The selection operator is intended to improve the average quality of the population by giving individuals of higher quality a higher probability of being copied into the next generation. The quality of an individual is measured by a fitness function. Recombination changes the genetic material in the population, either by crossover or by mutation, in order to obtain new points in the search space.
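To make this loop concrete, the following Python sketch shows a minimal SGA (our illustration, not the algorithm of [1] verbatim); the bit-string encoding, parameter defaults, and all function names are our assumptions.

import random

def sga(fitness, n_bits, pop_size=100, generations=100,
        p_cross=0.75, p_mut=0.01, tournament_k=2):
    # initialization: random bit-string individuals
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):              # one loop pass = one generation
        def select():                         # tournament selection
            return max(random.sample(pop, tournament_k), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = select()[:], select()[:]
            if random.random() < p_cross:     # one-point crossover
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):              # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mut:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# example: maximize the number of ones in a 20-bit string
best = sga(fitness=sum, n_bits=20)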

Keywords: Multi-objective rule; Niched Pareto genetic algorithm; Data mining; Clustering; Zero niche count

I. INTRODUCTION
Classification rule mining is one of the most important tasks in fuzzy logic systems, and genetic algorithms (GAs) have inspired many research efforts in optimization as well as rule mining [1,3,15]. Traditional rule mining methods are usually accurate but brittle. Genetic algorithms, on the other hand, provide a robust and efficient approach to exploring large search spaces. One of the GAs, the simple genetic algorithm (SGA) introduced by J. H. Holland (1975) [1], with further extensions in [4,5], is well suited to rule generation satisfying a single objective. However, practical rule generation is naturally posed as a multi-objective problem with two criteria: confidence factor and comprehensibility [2,6]. Many multi-objective GAs (MOGAs) [7,8] have been proposed. The simple GA normally handles problems with such criteria by converting them into a single-objective problem. However, this approach is unsatisfactory due to the nature of the optimality conditions for multiple objectives. In the presence of multiple and conflicting objectives, the resulting optimization problem gives rise to a set of optimal solutions instead of just one, because no single solution can be best for all of the conflicting objectives at once.

A. Genetic representations


Each individual in the population represents a candidate rule R of the form "if A then C". The antecedent of this rule is formed by a conjunction of up to n attribute conditions. Each condition or consequent is a fuzzy assignment for one attribute. For instance, if attribute x1 has a fuzzy set containing three partitions {high, typical, low} represented by the numerical set {1, 2, 3}, then an assignment of 2 in that cell represents "if x1 is typical". It is also possible not to use one of the feature elements in a rule; this situation is handled by assigning a value of 0.
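As an illustration of this encoding (a sketch under our own naming, with a one-attribute partition table assumed), the following Python snippet decodes such an integer genome into a readable rule:

PARTITIONS = {"x1": ["high", "typical", "low"]}   # assumed attribute table

def decode(genome, attributes, consequent):
    """Turn an integer vector into a readable 'if A then C' rule."""
    conds = []
    for attr, gene in zip(attributes, genome):
        if gene != 0:                  # 0 = attribute not used in this rule
            conds.append(f"{attr} is {PARTITIONS[attr][gene - 1]}")
    return "if " + " and ".join(conds) + f" then {consequent}"

print(decode([2], ["x1"], "class = P"))  # -> if x1 is typical then class = P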
B. Rule form
We propose to use INPGA to discover high-level prediction rules of the form:

* Supported by the Science Foundation of Yunnan Education Committee under Grant No. 08Y0264 and the Youth Natural Science Foundation of Yunnan University of Nationalities under Grant No. 09QN10.

IF some conditions hold on the values of a set of predicting attributes
THEN predict a value for the goal attribute.


In other words, the value of a special attribute, called the goal attribute, is predicted from the values given for the other attributes, called the predicting attributes.

Because we wanted more domination pressure, a sampling scheme is implemented. Two candidates for selection are picked at random from the population. A comparison set of individuals is also picked randomly from the population. Each of the candidates is then compared against each individual in the comparison set. If one candidate is dominated by the comparison set and the other is not, the latter is selected for reproduction. If neither or both are dominated by the comparison set, then we must use sharing to choose a winner, as explained later.
We have referred to domination; the domination between two solutions is defined as follows (see [10,11]):
Definition 1. A solution x(1) is said to dominate another solution x(2) if both of the following conditions are true:
1) the solution x(1) is not worse than x(2) with respect to all of the objectives;
2) the solution x(1) is strictly better than x(2) in at least one objective.
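Definition 1 and the tournament above translate directly into code. The following Python sketch (ours; objective vectors are assumed to be maximized) returns the tournament winner, or None when sharing must break the tie:

import random

def dominates(a, b):
    """Definition 1 for two objective vectors a and b (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and \
           any(x > y for x, y in zip(a, b))

def dominated_by_set(cand, comparison_objs):
    return any(dominates(other, cand) for other in comparison_objs)

def pareto_tournament(pop, objectives, t_dom):
    """One Pareto domination tournament (the sharing tie-break is omitted)."""
    c1, c2 = random.sample(pop, 2)
    cmp_objs = [objectives(s) for s in random.sample(pop, t_dom)]
    d1 = dominated_by_set(objectives(c1), cmp_objs)
    d2 = dominated_by_set(objectives(c2), cmp_objs)
    if d1 and not d2:
        return c2
    if d2 and not d1:
        return c1
    return None   # both or neither dominated: fall through to sharing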
2) Sharing on the non-dominated frontier
Fitness sharing was introduced by Goldberg and Richardson [12]; it calls for the degradation of an individual's objective fitness f_i by a niche count m_i

C. Fitness function
As discussed in Section I, the discovered rules should have a high confidence factor and high comprehensibility. In this subsection, we discuss how these multiple criteria can be incorporated into a single-objective fitness function [6].
1) Comprehensibility metric
There are various ways to quantitatively measure rule comprehensibility. The standard way is to count the number of rules and the number of conditions in these rules. If a rule can have at most A_C conditions, the comprehensibility Comp(R) of a rule R can be defined as:
Comp(R) = 1 − N_C(R) / A_C    (1)
where N_C(R) is the number of conditions in the rule R.
2) Confidence factor
The antecedent part of the rule is a conjunction of conditions. A very simple way to measure the confidence factor Con(R) of a rule R is:
Con(R) = SUP(A ∧ C) / SUP(A)    (2)
where SUP(A) is the number of examples satisfying all the conditions in the antecedent A, and SUP(A ∧ C) is the number of examples that satisfy both A and the consequent C.
calculated for that individual. This degradation is obtained by simply dividing the objective fitness by the niche count to find the shared fitness f_i / m_i. The niche count m_i is an estimate of how crowded the neighborhood (niche) of individual i is. It is calculated over all individuals in the current population as m_i = Σ_{j ∈ Pop} Sh[d[i, j]], where d[i, j] is the distance between individuals i and j and Sh[d] is the sharing function.

The fitness function is computed as the arithmetic weighted mean of the comprehensibility and the confidence factor:
f(x) = (w1 × Comp(R) + w2 × Con(R)) / (w1 + w2)    (3)
where w1 and w2 are user-defined weights.
Typically, the triangular sharing function Sh[d] = 1 − d / σ_share is used, where σ_share is the niche radius.
Individuals within a distance σ_share of each other degrade each other's fitness, since they are in the same niche, but convergence of the full population is avoided.
When the candidate solutions are either both dominated or both non-dominated, it is likely that they are in the same equivalence class. The best-fit candidate is the one that has the fewest individuals in its niche, and thus the smallest niche count. We call this type of sharing equivalence class sharing.

III. NICHED PARETO GENETIC ALGORITHM

A. The basic niched Pareto GA (BNPGA)
BNPGA, SDNPGA, and our INPGA all focus on the selection operator of the genetic algorithm. The most widely used selection technique is tournament selection. However, tournament selection assumes that we want a single answer, and after several generations the population will converge to a uniform one. To avoid convergence and maintain multiple Pareto-optimal solutions, tournament selection is altered in two ways. First, a Pareto domination tournament is introduced. Second, when a tournament is non-dominant, sharing is implemented to determine the winner [9].
1) Pareto domination tournaments
The binary relation of domination leads naturally to a binary tournament in which two randomly selected individuals are compared. It was soon found, however, that this produced insufficient domination pressure: there were too many dominated individuals in later generations. It seemed that a sample size of two was too small to estimate an individual's true domination ranking.

Figure 1. Equivalence class sharing

Figure 1 illustrates how this form of sharing should work between two non-dominated individuals. Here, we are maximizing along the x-axis and minimizing along the y-axis. In this case, the two candidates are in the Pareto-optimal subset (the dashed region) of the union of the comparison set and the candidates.

B. The improved niched Pareto genetic algorithm
For problem 1, we combine the two methods. When the disparity between the two niche counts is very large, equivalence class sharing is adopted; otherwise, the standard deviation is adopted. This not only selects candidates accurately but also saves selection time.
For problem 2, we suppose that the samples actually reflect the data distribution of the original population. To preserve diversity, we handle the two candidates candidate_1 and candidate_2, with niche radii σ_share1 and σ_share2, as described below.

From a Pareto point of view, neither candidate is preferred. But if we want to maintain useful diversity, it is apparent that we should choose the candidate that has the smaller niche count; in this case, it is candidate 2.
B. The standard deviation niched Pareto GA (SDNPGA)
When the candidates are either both dominated or both non-dominated, the literature [6,13] also considers a measure that can maintain useful diversity in the Pareto set. The following approach, called the standard deviation niched Pareto genetic algorithm (SDNPGA), is suitable for achieving both goals.

Figure 2. The larger standard deviation

For the two candidates: if there is no sample in the niche of radius σ_share1, candidate_1 is selected, and likewise candidate_2 is selected if its niche of radius σ_share2 is empty. If there is no sample in either niche, candidate_1 and candidate_2 are both selected.
So we select candidate_2 in Figure 3. Note the premise that the samples (the comparison set) actually reflect the data distribution of the original population; this is a challenge for the random sampling method used in BNPGA and SDNPGA. In this paper, we therefore propose a new, clustering-based sampling method, for which we selected the k-means clustering method [14].
The clustering-based method first executes k-means clustering on the population. After obtaining several clusters, it samples according to the sizes of the clusters, so isolated points are not sampled. When the selection operator is executed, a niche count of zero indicates that the candidate lies in the area of an isolated point, not in any cluster. To preserve diversity, we select such a candidate. This is why we handle the zero niche count in that way.

Figure 3. The situation of zero niche count

The steps of SDNPGA are as follows:
1) Find the centers of gravity μ1 and μ2 of the samples in the two niches:
μ1 = (1/|S1|) Σ_{x_i ∈ S1} x_i and μ2 = (1/|S2|) Σ_{x_j ∈ S2} x_j,
where S1 and S2 are the sets of samples within radius σ_share1 of candidate_1 and within radius σ_share2 of candidate_2, respectively.
2) Calculate the standard deviation of each niche:
σ1 = sqrt(Σ_{x_i ∈ S1} (x_i − μ1)²) and σ2 = sqrt(Σ_{x_j ∈ S2} (x_j − μ2)²).
3) The candidate having the larger standard deviation (SD) is chosen.
Figure 2 illustrates how diversity is maintained in the SDNPGA method; here we choose candidate 2.
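A Python sketch of this computation, assuming each sample is a numeric vector and using our own helper names:

import math

def niche_samples(candidate, comparison_set, dist, sigma_share):
    """The samples of the comparison set falling inside one niche."""
    return [s for s in comparison_set if dist(candidate, s) <= sigma_share]

def niche_std(samples):
    """Center of gravity, then standard deviation, of one niche.
    Returns 0 for an empty niche."""
    if not samples:
        return 0.0
    dims = len(samples[0])
    mu = [sum(s[k] for s in samples) / len(samples) for k in range(dims)]
    sq = sum(sum((s[k] - mu[k]) ** 2 for k in range(dims)) for s in samples)
    return math.sqrt(sq)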

Sampling algorithm:
1. Execute k-means clustering on the population, obtaining K clusters.
2. Compute the number of individuals in each cluster, Sum(j), j = 1..K.
3. Sample individuals according to the size of each cluster; the number of samples taken from cluster i is p(i) = round(Sum(i)/Nc × Tdom), i = 1..K, where Nc is the population size and Tdom is the total number of samples.
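The sampling algorithm can be sketched in Python as follows; we use scikit-learn's KMeans for step 1, and the function and parameter names are our assumptions:

import numpy as np
from sklearn.cluster import KMeans

def clustering_based_sample(population, K, t_dom,
                            rng=np.random.default_rng()):
    """Sample t_dom comparison individuals proportionally to cluster sizes,
    so isolated points (tiny clusters) contribute little or nothing."""
    X = np.asarray(population, dtype=float)
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(X)
    sample_idx = []
    for j in range(K):
        members = np.flatnonzero(labels == j)
        p_j = round(len(members) / len(X) * t_dom)   # p(i) from step 3
        if p_j > 0:
            sample_idx.extend(rng.choice(members,
                                         size=min(p_j, len(members)),
                                         replace=False))
    return [population[i] for i in sample_idx]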

IV. OUR PROPOSED NICHED PARETO GENETIC ALGORITHM (INPGA)
Our proposed method originates from the problems described in the following subsection.

A. Proposing problems

1) When the candidates are either both dominated or both non-dominated, BNPGA adopts equivalence class sharing: it computes the number of samples in each niche and, considering diversity, chooses the candidate that has the smaller niche count. But when the two niche counts are more or less the same, the effectiveness of this method is questionable. SDNPGA adopts the standard deviation instead, but when the disparity between the two niche counts is very large, it is unnecessary to compute the standard deviation, which wastes time. What method can solve both problems?
2) Both BNPGA and SDNPGA are efficient, but they do not consider the situation in which there is no sample in one niche or in both niches, which we call a zero niche count. We illustrate it in Figure 3.

The selection algorithm is as follows:
Selection algorithm:
1. The clustering-based sampling obtains the comparison set S.
2. For any two candidates and the comparison set S: if one candidate is dominated and the other is not, the non-dominated candidate is selected and the algorithm terminates.
3. If the two candidates (candidate_1 and candidate_2) are either both dominated or both non-dominated, compute the numbers of samples in the two niches, count1 and count2.
4. If count1 = 0, candidate_1 is selected, and if count2 = 0, candidate_2 is selected; the algorithm terminates.
5. If count1 − count2 > delta or count2 − count1 > delta, select candidate_2 or candidate_1, respectively; the algorithm terminates.
6. If abs(count1 − count2) < delta, compute the standard deviations of the two niches, sd1 and sd2.
7. If sd1 > sd2, candidate_1 is selected; otherwise, candidate_2 is selected. The algorithm terminates.


Delta is the threshold that evaluates the disparity between the two niche counts and decides whether equivalence class sharing or the standard deviation is adopted.
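Putting steps 2-7 together, the following Python sketch (ours, not the authors' MATLAB code) implements the selection operator; dominated_by_set, niche_samples, and niche_std are the helpers sketched earlier, and for brevity the both-counts-zero case returns only candidate_1, whereas the paper keeps both candidates.

def inpga_select(c1, c2, comparison_set, objectives, dist,
                 sigma_share, delta):
    obj_set = [objectives(s) for s in comparison_set]
    d1 = dominated_by_set(objectives(c1), obj_set)   # step 2
    d2 = dominated_by_set(objectives(c2), obj_set)
    if d1 != d2:
        return c2 if d1 else c1
    n1 = niche_samples(c1, comparison_set, dist, sigma_share)
    n2 = niche_samples(c2, comparison_set, dist, sigma_share)
    count1, count2 = len(n1), len(n2)                # step 3
    if count1 == 0:                                  # step 4: zero niche count
        return c1
    if count2 == 0:
        return c2
    if count1 - count2 > delta:                      # step 5: large disparity,
        return c2                                    # equivalence class sharing
    if count2 - count1 > delta:
        return c1
    sd1, sd2 = niche_std(n1), niche_std(n2)          # steps 6-7: standard
    return c1 if sd1 > sd2 else c2                   # deviation tie-break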


V. EXPERIMENTAL RESULTS

We mainly compare the execution time and the rules generated by BNPGA, SDNPGA, and our INPGA. The experiments were performed using the nursery dataset obtained from the UCI machine learning repository. That dataset is categorical; we also ran the experiments on a continuous dataset (the iris dataset).

A. Description of the datasets
The nursery dataset has 12,960 records and nine attributes, all of them categorical. The ninth attribute is treated as the class attribute. The iris dataset has 150 instances and 5 attributes; the last one is the class attribute, and all of them are continuous except the last one.

Table 1. Parameters used for the three methods
Dataset | P | Pc | Pm | Tdom | σ_share | delta
Nursery | 500 | 0.75 | 0.002 | 50 | 20 | 7
iris | 200 | 0.8 | 0.01 | 15 | 7 | 4
P, population size; Pc, probability of crossover; Pm, probability of mutation; Tdom, tournament size; σ_share, niche radius. Delta is used only in INPGA.

Tables 2-4 show the rules generated by BNPGA, SDNPGA, and INPGA, respectively, from the nursery dataset.
Table 2. Rules generated by BNPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
P | If (parents=usual) ^ (housing=less_conv) ^ (social=problematic) then (class=P) | 0.7780 | 0.5
P | If (parents=great_pret) ^ (social=slightly_prob) ^ (health=recommended) then (class=P) | 0.7867 | 0.625
NR | If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR) | 0.79 | 0.751
NR | If (parents=pretentious) ^ (children=3) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.75 |
NR | If (parents=great_pret) ^ (children=2) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | |
VR | If (housing=less_conv) ^ (finance=inconv) ^ (social=slightly_prob) ^ (health=recommended) then (class=VR) | |
R | If (has_nurs=proper) ^ (finance=convenient) ^ (health=recommended) then (class=R) | |
SP | If (housing=convenient) ^ (finance=convenient) ^ (children=2) ^ (social=slightly_prob) then (class=SP) | |

Table 3. Rules generated by SDNPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
NR | If (parents=usual) ^ (housing=less_conv) ^ (health=not_recom) then (class=NR) | 0.674 | 0.5
NR | If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.7867 | 0.625
NR | If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | 0.79 | 0.751
R | If (has_nurs=proper) ^ (finance=convenient) then (class=R) | 0.75 | 0.812
SP | If (housing=convenient) ^ (finance=convenient) ^ (social=slightly_prob) then (class=SP) | 0.625 |

The dataset above is categorical; we also ran the experiments on the continuous iris dataset. In order to obtain categorical data, we first pre-process the dataset. Each attribute is fuzzified, and the number of fuzzy partitions for each attribute is pre-determined. We fuzzify each attribute except the last one into 4 categories: More_sma, Small, Big, More_big. The last attribute is treated as the class attribute. Thus the continuous data is changed to categorical. The rules discovered by BNPGA, SDNPGA, and INPGA from this dataset are shown in Tables 5-7.
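For instance, a simple quantile-based fuzzification into the four categories can be sketched as follows; the paper pre-determines its partitions, so these cut points are purely our assumption:

import numpy as np

LABELS = ["More_sma", "Small", "Big", "More_big"]

def fuzzify_column(values):
    q = np.quantile(values, [0.25, 0.5, 0.75])   # three assumed cut points
    return [LABELS[int(np.searchsorted(q, v))] for v in values]

print(fuzzify_column([4.3, 5.1, 5.8, 6.4, 7.9]))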


Table 4. Rules generated by INPGA from the nursery dataset
Class | Mined rule | Confidence factor | Comprehensibility
NR | If (parents=usual) ^ (housing=less_conv) ^ (social=slightly_prob) ^ (health=not_recom) then (class=NR) | 0.634 | 0.625
NR | If (parents=pretentious) ^ (housing=convenient) ^ (health=not_recom) then (class=NR) | 0.8114 |
NR | If (parents=great_pret) ^ (housing=critical) ^ (health=not_recom) then (class=NR) | 0.634 |
R | If (has_nurs=proper) ^ (finance=convenient) then (class=R) | |

B. Rules discovered by BNPGA, SDNPGA and INPGA
The experiments were performed using MATLAB 7.0 on a Windows XP server. The data-specific parameters and the parameters encountered during rule discovery are listed in Table 1.


Table 5. Rules generated by BNPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.7877 | 0.25
2 | If (sl=more_sma) ^ (sw=more_big) ^ (pl=more_big) then (class=Iris-versicolor) | 0.7641 | 0.25
3 | If (sl=More_big) ^ (sw=small) ^ (pl=more_sma) then (class=Iris-virginica) | 0.783 | 0.25

Table 6. Rules generated by SDNPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.8 | 0.25

We omitted classes P, VR, and SP in Table 3, classes P and VR in Table 4, classes 2 and 3 in Table 6, and class 2 in Table 7, because those rules are the same as in Table 2 and Table 5, respectively. From Tables 2-7 it can be observed that the rules discovered by BNPGA have the lowest confidence factor and comprehensibility, and that the comparative performance of INPGA has an edge over SDNPGA and BNPGA.




VI. CONCLUSION

In this article we have introduced SGA, BNPGA, and SDNPGA for classification rule generation, and we have discussed INPGA. Theoretical analysis and experiments confirm the efficiency of our INPGA. We are now concentrating on careful selection of attributes in a preprocessing step [16,17], in order to reduce the number of attributes. Since there are still few applications of INPGA in data mining tasks [18], more practical applications to various domains of data mining, and more studies, are needed to validate its robustness and scalability.

Table 7. Rules generated by INPGA from the iris dataset
Class# | Mined rule | Confidence factor | Comprehensibility
1 | If (sl=more_sma) ^ (sw=big) ^ (pl=more_sma) then (class=Iris-setosa) | 0.82 | 0.25
3 | If (sl=More_big) ^ (pl=more_sma) then (class=Iris-virginica) | 1.0 | 0.5

C. Execution time
In this section, we mainly evaluate the time efficiency of the three methods. BNPGA should be the fastest, because it adopts random sampling and does not compute the standard deviations of niches (but, as the previous section shows, the rules discovered by BNPGA do not compare well with those of the other two methods). The time efficiency of INPGA and SDNPGA is worth analyzing.
1) Theoretical analysis
In INPGA, clustering-based sampling costs time, but the algorithm does not compute the standard deviation of all niches, which saves time. In detail: the time complexity of clustering is O(nkdg), where n is the population size, k is the number of clusters, d is the dimensionality, and g is the number of iterations (when n is very large, k, d, and g can be regarded as constants). In SDNPGA, the time complexity of random sampling is O(n). Both methods compute standard deviations; the difference is that INPGA does not compute the standard deviation of all niches, and the discrepancy is O(count·d·m), where count is the number of niches whose standard deviation is not computed, d is the dimensionality, and m is the number of samples in a niche. When count is very small it can be regarded as a constant; otherwise it cannot be ignored. In a word, SDNPGA should spend more time than INPGA; the evaluation is in the next subsection.
2) Experimental evaluation
Each algorithm has 100 individuals in the population and was run for 100 generations. The parameter values are shown in Table 1. We ran each algorithm five times, recorded the execution times, and report the averages in the tables. Tables 8 and 9 show the execution times of the three methods on the iris and nursery datasets, respectively.

REFERENCES
[1] J. H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor, MI, 1975.
[2] A. A. Freitas, On rule interestingness measures, Knowledge-Based Systems 12 (1999) 309-315.
[3] C. M. Fonseca, P. J. Fleming, An overview of evolutionary algorithms in multiobjective optimization, Evolutionary Computation 3 (1) (1995) 1-16.
[4] L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[5] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin, 1994.
[6] S. Dehuri, R. Mall, Predictive and comprehensible rule discovery using a multi-objective genetic algorithm, Knowledge-Based Systems 19 (2006) 413-421.
[7] E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation 3 (1999) 257-271.
[8] J. Horn, N. Nafpliotis, D. E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[9] J. Horn, N. Nafpliotis, D. E. Goldberg, A niched Pareto genetic algorithm for multi-objective optimization, in: Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, vol. 1, 1994, pp. 82-87.
[10] J. Branke, K. Deb, K. Miettinen, R. Słowiński (Eds.), Multiobjective Optimization: Interactive and Evolutionary Approaches, Springer-Verlag, Berlin, Heidelberg, 2008.
[11] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, Wiley-Interscience Series in Systems and Optimization, John Wiley & Sons, Chichester, 2001.
[12] D. E. Goldberg, J. Richardson, Genetic algorithms with sharing for multi-modal function optimization, in: Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 41-49.
[13] S. Dehuri, S. Patnaik, A. Ghosh, R. Mall, Application of elitist multi-objective genetic algorithm for classification rule generation, Applied Soft Computing 8 (1) (2008) 477-487.
[14] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281-297.
[15] E. Zhou, A. Khotanzad, Fuzzy classifier design using genetic algorithms, Pattern Recognition 40 (2007) 3401-3414.
[16] M. Zeleny, Multiple Criteria Decision Making, McGraw-Hill, New York, 1982.
[17] C. L. Hwang, K. Yoon, Multiple Attribute Decision Making: Methods and Applications, A State-of-the-Art Survey, Springer-Verlag, New York, 1981.
[18] S. Bhattacharyya, Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing, in: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), ACM, 2000, pp. 465-473.

Table 8. The execution time of the three methods on the iris dataset
Method | Execution time (s)
BNPGA | 6.5
SDNPGA | 11.452
INPGA | 6.75

Table 9. The execution time of the three methods on the nursery dataset
Method | Execution time (s)
BNPGA | 280.3417
SDNPGA | 343.25
INPGA | 286.3953

We can see that our INPGA has an apparent superiority over SDNPGA, and its execution time is more or less the same as BNPGA's, which is consistent with our theoretical analysis in the last subsection.

