MEHMET R. TOLUN
Department of Computer Engineering, Eastern Mediterranean University, Gazimagusa, Turkey
HAYRI SEVER
Department of Computer Science, Hacettepe University, Ankara, Turkey
MAHMUT ULUDAG
Information Security Research Institute, Gebze, Kocaeli, Turkey
SALEH M. ABU-SOUD
Department of Computer Science, Princess Sumaya University College for Technology, Amman, Jordan
In this paper we describe the ILA-2 rule induction algorithm, an improved version of the novel inductive learning algorithm ILA. We first outline the basic ILA algorithm and then present how it is improved using a new evaluation metric that handles uncertainty in the data. Through this soft-computing metric, users can reflect their preferences via a penalty factor that controls the performance of the algorithm. ILA-2 also has a faster pass criterion, not available in basic ILA, which reduces processing time without sacrificing much accuracy.
Address correspondence to Mehmet R. Tolun, Department of Computer Engineering, Eastern Mediterranean University, Gazimagusa, T.R.N.C. Turkey. E-mail: tolun@compenet.emu.edu.tr
Cybernetics and Systems: An International Journal, 30:609-628, 1999
Copyright © 1999 Taylor & Francis
0196-9722/99 $12.00 + .00
We experimentally show that the performance of ILA-2 is comparable
to that of well-known inductive learning algorithms, namely, CN2, OC1,
ID3, and C4.5.
known as incomplete data to point out the fact that some relevant features are missing to extract nonconflicting class descriptions. In real-world problems, this often constitutes the greatest source of error, because data are usually organized and collected around the needs of organizational activities, which causes incomplete data from the knowledge discovery point of view. Under such circumstances, the knowledge discovery model should have the capability of providing approximate decisions with some confidence level.
The second modification is a greedy rule generation bias that reduces learning time at the cost of an increased number of generated rules. This feature is discussed in the next section. ILA-2 is an extension of ILA with respect to the modifications stated above. We have empirically compared ILA-2 with ILA using real-world data sets. The results show that ILA-2 is better than ILA in terms of accuracy in classifying unseen instances, size of the classifiers, and learning time. Some well-known inductive learning algorithms are also compared to our own algorithm: ID3 (Quinlan, 1983), C4.5 and C4.5rules (Quinlan, 1993), OC1 (Murthy et al., 1994), and CN2 (Clark & Niblett, 1989). Test results with unseen examples also show that ILA-2 is comparable to both the CN2 and C4.5 algorithms.
The organization of this paper is as follows. In the following section, we briefly introduce the ILA algorithm; the execution of the algorithm for an example task is also presented. In the next section, modifications to the ILA algorithm are described, followed by a section on the time complexity analysis of ILA-2. Finally, ILA-2 is empirically compared with five well-known induction algorithms over 19 different domains.
THE INDUCTIVE LEARNING ALGORITHM
The inductive learning algorithm works in an iterative fashion. In each iteration the algorithm searches for a rule that covers a large number of training examples of a single class. Having found a rule, ILA first removes those examples from further consideration by marking them, and then appends the rule to the end of its rule set. In other words, the algorithm works on a rules-per-class basis. For each class, rules are induced to separate examples in that class from the examples in the other classes. This produces an ordered list of rules rather than a decision tree. The details of the ILA algorithm are given in Figure 1. A good description is a conjunction of attribute-value pairs such that it covers some positive examples and none of the negative examples for a given class. The goodness measure assesses the extent of goodness by returning the good description that covers the maximum number of positive examples. The inductive learning algorithm constructs production rules in a general-to-specific way, i.e., starting off with the most general rule possible and producing more specific rules whenever it is deemed necessary.
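The covering scheme described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the data layout (a list of dicts), the attribute names, and the tie-breaking order are assumptions of this sketch, and the goodness measure simply counts the positive examples covered by a candidate description that covers no negatives, as in basic ILA.

```python
from itertools import combinations

def ila(examples, attributes, target):
    """Simplified ILA-style covering: for each class, repeatedly assert the
    'good description' covering the most still-unmarked positive examples."""
    rules = []
    for cls in sorted({ex[target] for ex in examples}):
        neg = [ex for ex in examples if ex[target] != cls]
        unmarked = [ex for ex in examples if ex[target] == cls]
        size = 1                                  # most general: one attribute
        while unmarked and size <= len(attributes):
            best, best_cov = None, 0
            for combo in combinations(attributes, size):
                for ex in unmarked:               # candidate descriptions
                    desc = {a: ex[a] for a in combo}
                    def covers(e, d=desc):
                        return all(e[a] == v for a, v in d.items())
                    if any(covers(n) for n in neg):
                        continue                  # covers a negative: not good
                    cov = sum(covers(p) for p in unmarked)
                    if cov > best_cov:            # goodness = positives covered
                        best, best_cov = desc, cov
            if best is None:
                size += 1                         # specialize: longer conjunctions
                continue
            rules.append((best, cls))             # assert rule, mark examples
            unmarked = [p for p in unmarked
                        if not all(p[a] == v for a, v in best.items())]
    return rules

# The seven-object training set from the worked example below:
data = [
    {"size": "medium", "color": "blue",  "shape": "brick",  "class": "yes"},
    {"size": "small",  "color": "red",   "shape": "wedge",  "class": "no"},
    {"size": "small",  "color": "red",   "shape": "sphere", "class": "yes"},
    {"size": "large",  "color": "red",   "shape": "wedge",  "class": "no"},
    {"size": "large",  "color": "green", "shape": "pillar", "class": "yes"},
    {"size": "large",  "color": "red",   "shape": "pillar", "class": "no"},
    {"size": "large",  "color": "green", "shape": "sphere", "class": "yes"},
]
rules = ila(data, ["size", "color", "shape"], "class")
```

On this toy data the sketch recovers, among others, the rules IF shape = wedge THEN class is no and IF color = green THEN class is yes, matching the rules derived in the worked example.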
Size     Color   Shape    Class
medium   blue    brick    yes
small    red     wedge    no
small    red     sphere   yes
large    red     wedge    no
large    green   pillar   yes
large    red     pillar   no
large    green   sphere   yes
Description      True Positive   False Negative
size = medium         1               0
color = blue          1               0
shape = brick         1               0
size = small          1               1
color = red           1               3
shape = sphere        2               0
size = large          2               2
color = green         2               0
shape = pillar        1               1
Description      True Positive   False Negative
size = medium         1               0
color = blue          1               0
shape = brick         1               0
size = large          1               2
color = green         1               0
shape = pillar        1               1
Description      True Positive   False Negative
size = large          1               2
color = green         1               0
shape = pillar        1               1
This time only the second description satisfies the ILA quality criterion and is used for the generation of the following rule:

Rule 3: IF color = green THEN class is yes.

Since example 4 is covered by this rule, it is marked as classified. All examples of the current class (yes) are now marked as classified. The algorithm continues with the next class (no), generating the following rules:

Rule 4: IF shape = wedge THEN class is no.
Rule 5: IF color = red AND size = large THEN class is no.

The algorithm stops when all of the examples in the training set are marked as classified, i.e., all the examples are covered by the current rule set.
EXTENSIONS TO ILA
Two main proble ms of ILA are overfitting and long learning time. The
overfitting proble m is due to the bias that ILA tries to generate a
consistent classifier on training data. However training data, most of the
time includes noisy examples causing overfitting in the generate d classifiers. We have developed a nove l heuristic function that preve nts this
bias in the case of noisy example s. We also propose d anothe r modification to make ILA faster, which considers the possibility of generating
more than one rule after the iteration steps.
The ILA algorithm and its exte nsions have been implemented by
using the source code of C4.5 programs. Therefore , in addition to
extensions state d above, the algorithm has also been enhanced by some
feature s of C4.5, such as rule sifting and default class selection s Quinlan, 1993 .. During the classification process, the ILA system first
617
extracts a set of rules using the ILA algorithm. In the next step, the set
of extracted rules for the classes are ordered to minimize false positive
errors and then a default class is chosen. The default class is found as
the one with most instance s not covere d by any rule. Ties are resolve d
in favor of more frequent classes. Once the class orde r and the default
class have been establishe d, the rule set is subject to a postpruning
process. If there are one or more rules whose omission would actually
reduce the number of classification errors in training cases, the first
such rule is discarded and the set is checked again. This last step allows
a final global scrutiny of the rule set as a whole for the conte xt it will be
used.
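Classification with an ordered rule list and a default class, as described above, amounts to first-match evaluation. The sketch below illustrates the idea; the rule representation (a dict of conditions paired with a class label) is an assumption of this sketch, not C4.5's or ILA's internal format.

```python
def classify(example, ordered_rules, default_class):
    """First-match classification: the first rule whose conditions all hold
    decides the label; if no rule fires, fall back to the default class."""
    for conditions, cls in ordered_rules:
        if all(example.get(a) == v for a, v in conditions.items()):
            return cls
    return default_class

# Illustrative rule list in the order produced for the worked example:
rules = [({"shape": "wedge"}, "no"),
         ({"color": "red", "size": "large"}, "no"),
         ({"color": "green"}, "yes")]
label = classify({"size": "small", "color": "red", "shape": "wedge"},
                 rules, "yes")            # first rule fires: "no"
```

An example not matched by any rule, such as the medium blue brick, receives the default class instead.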
The ILA-2 algorithm can also handle continuous feature discretization using the entropy-based algorithm of Fayyad and Irani (1993). This algorithm uses a recursive entropy minimization heuristic for discretization and couples this with a minimum description length criterion (Rissanen, 1986) to control the number of intervals produced over the continuous space. In the original paper by Fayyad and Irani, this method was applied locally at each node during tree generation. The method was found to be quite promising as a global discretization method (Ting, 1994). We have used the implementation of Fayyad and Irani's discretization algorithm as provided within the MLC++ library (Kohavi et al., 1996).
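One level of the entropy-minimization heuristic can be sketched as follows. This is a partial illustration only: the full Fayyad-Irani method recurses on each side of the chosen boundary and stops splitting via the MDL criterion, which is omitted here, and the function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy of a multiset of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Choose the boundary minimizing the weighted class entropy of the
    binary partition it induces (one level of the recursive heuristic)."""
    pairs = sorted(zip(values, labels))
    best, best_e = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                       # no boundary between equal values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        e = (len(left) * entropy(left)
             + len(right) * entropy(right)) / len(pairs)
        if e < best_e:                     # lower entropy = purer partition
            best_e, best = e, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best
```

For instance, with values 1, 2, 3, 10, 11, 12 whose labels change between 3 and 10, the chosen cut point is the midpoint 6.5, which yields a zero-entropy partition.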
The Novel Evaluation Function
In general, an evaluation function's score for a description should increase in proportion to both the number of positive instances covered, denoted by TP, and the number of negative instances not covered, denoted by TN. In order to normalize the score, a simple metric takes into account the total number of positive instances, P, and negative instances, N, and is given in Langley (1996) as

(TP + TN) / (P + N),

where the resulting value ranges between 0 (when no positives and all negatives are covered) and 1 (when all positives and no negatives are covered). This ratio may be used to measure the overall classification accuracy of a description on the training data.
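As a small check of the metric above, consider the description color = green on the seven-object training set of the worked example (P = 4 positives, N = 3 negatives): it covers TP = 2 positives and no negatives, so TN = 3. The function name below is ours, not the paper's.

```python
def simple_score(tp, tn, p, n):
    """(TP + TN) / (P + N): 0 when no positives and all negatives are
    covered, 1 when all positives and no negatives are covered."""
    return (tp + tn) / (p + n)

score = simple_score(tp=2, tn=3, p=4, n=3)   # 5/7, about 0.714
```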
Table 5. Results for splice data set with different values of penalty factors

Penalty   Number     Average number   Total number    Accuracy on     Accuracy on
factor    of rules   of conditions    of conditions   training data   test data
1            13           2.0               26            82.4%          73.4%
2            31           2.3               71            88.7%          81.6%
3            38           2.3               86            94.7%          87.6%
4            50           2.5              125            96.1%          87.2%
5            53           2.4              128            97.9%          86.9%
7            63           2.5              158            98.7%          85.4%
10           66           2.5              167            99.6%          88.5%
30           87           2.6              228           100.0%          71.7%
ILA          91           2.6              240           100.0%          67.9%
Figure 3. Accuracy values on training and test data for the splice data set.
generation. Usually the second approach tends to decrease the processing time. On the other hand, it might result in redundant rules and an increase in the size of the output rule set.

The above idea was implemented in the ILA system, and the option that activates this feature is referred to as FastILA. For example, in the case of the promoter data set, FastILA reduced the processing time from 17 seconds to 10 seconds. Also, the number of final rules decreased by one and the total number of conditions by two. The experiments show that if the size of the classification rules is not extremely important and less processing time is desirable, then the FastILA option is the more suitable choice.

As seen in Table 6, ILA-2 (or ILA with the fast pass criterion) generates a higher number of initial rules than ILA, about 5.5 times more on average for the evaluation data sets. This is because the faster pass criterion permits more than one rule to be asserted at once. However, after the rule generation step is finished, the sifting process removes all the unnecessary rules.
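The fast pass criterion can be pictured as a one-line change to the covering loop: instead of asserting only the single best description per iteration, every candidate meeting the pass threshold is asserted at once. The candidate representation below (dicts carrying a "tp" count) is an assumption of this sketch, not the ILA system's data structure.

```python
def fast_pass(candidates, threshold):
    """Greedy variant: keep every candidate whose positive coverage meets
    the threshold, not just the single best. Fewer iterations run, at the
    cost of possibly redundant rules, which sifting removes later."""
    return [c for c in candidates if c["tp"] >= threshold]

candidates = [{"desc": "color = green",  "tp": 2},
              {"desc": "shape = sphere", "tp": 2},
              {"desc": "size = medium",  "tp": 1}]
asserted = fast_pass(candidates, threshold=2)   # two rules in one pass
```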
THE TIME COMPLEXITY ANALYSIS

In the time complexity analysis of ILA, only the upper bounds for the critical components of the algorithm are provided because the overall
Table 6. Effect of fast-ILA option in terms of four different parameters

                Number of        Number of       Estimated          Time
                initial rules    final rules     accuracy (%)       (seconds)
Training set    ILA    ILA-2     ILA    ILA-2    ILA      ILA-2     ILA    ILA-2
Lenses            6       9        6      5      50       62.5         1      1
Monk1            23      30       21     22     100       94.4         1      1
Monk2            61     101       48     48      78.5     81.3         3      2
Monk3            23      37       23     23      88.2     86.3         1      1
Mushroom         24      55       16     15     100      100        1476   1105
Parity5+5        35      78       15     14      50.8     50.8         9      5
Tic-tac-toe      26      31       26     26      98.1     98.1        90     52
Vote             33     185       27     22      96.3     94.8        31     25
Zoo               9      55        9      9      91.2     85.3         1      0
Splice           93     825       91     76      67.9     97.7      1569    745
Coding          141    1073      108    112      68.7    100        1037    345
Promoter         14     226       14     13     100      100          17     10
Totals          488    2703      404    383      85.25    90.58     4236   2290
// O(e · S)
// O((e/c) · Na)
Evaluation of descriptions     // O(S)
Marking covered instances      // O(e/c)
Therefore, the overall time complexity is given by

O(c · Na · ((e/c) · Na + e · S + S + e/c))
= O(e · Na² + c · Na · S · (e + 1) + e · Na).

This may be simplified by replacing e + 1 with e, and the complexity becomes

O(e · Na² + c · Na · e · S + e · Na)
= O(e · Na · (Na + c · S + 1)).

As the constant term is comparatively small,

O(e · Na · (Na + c · S)).

Usually c · S is much larger than Na; e.g., in the experiments we selected S as 1500 while the maximum Na was only 60. In addition, c is also comparatively smaller than S. Therefore, we may simplify the complexity to

O(e · Na · S).

Thus the time complexity of ILA is linear in the number of attributes and the number of examples. The size of the hypothesis space, S, also linearly affects the processing time.
EVALUATION OF ILA-2

For the evaluation of ILA-2, we have mainly used two parameters: the classifier size and accuracy. The classifier size is the total number of conditions of the rules in the classifier. For decision-tree algorithms, the classifier size refers to the number of leaf nodes in the decision tree, i.e., the number of regions into which the data are divided by the tree. Accuracy is the estimated accuracy on test data. We have used the hold-out method to estimate the future prediction accuracy on unseen data.
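The hold-out estimate described above is straightforward to express: the classifier never sees the reserved test set, and accuracy is the fraction of test examples it labels correctly. The function name and the trivial classifier below are illustrative, not part of the ILA system.

```python
def holdout_accuracy(classifier, test_set, target):
    """Hold-out estimate: fraction of reserved test examples labeled
    correctly. `classifier` is any callable mapping example -> label."""
    correct = sum(classifier(ex) == ex[target] for ex in test_set)
    return correct / len(test_set)

# Illustration with a trivial constant classifier on a toy test set:
test_set = [{"f": 1, "class": "yes"}, {"f": 2, "class": "yes"},
            {"f": 3, "class": "no"},  {"f": 4, "class": "yes"}]
acc = holdout_accuracy(lambda ex: "yes", test_set, "class")   # 0.75
```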
We have used 19 different training sets from the UCI repository (Merz & Murphy, 1997). Table 7 summarizes the characteristics of these data sets. In order to test the algorithms' ability to classify unseen examples, a simple practice is to reserve a portion of the training data as a separate test set that is not used in building the classifiers. In the experiments we have employed the test sets associated with these training sets from the UCI repository to estimate the accuracy of the classifiers.

In selecting the PF as 1 and 5, we have considered the two ends and the middle of the PF spectrum. At the higher end we observe that ILA-2 performs like the basic ILA. For this reason we have not
Table 7. The characteristic features of the tested data sets

Domain        Number of    Number of examples   Number of      Number of examples
name          attributes   in training data     class values   in test data
Lenses             4              16                 3                  8
Monk1              6             124                 2                432
Monk2              6             169                 2                432
Monk3              6             122                 2                432
Mushroom          22            5416                 2               2708
Parity5+5         10             100                 2               1024
Tic-tac-toe        9             638                 2                320
Vote              16             300                 2                135
Zoo               16              67                 7                 34
Splice            60             700                 3               3170
Coding            15             600                 2               5000
Promoter          57             106                 2                 40
Australia         14             460                 2                230
Crx               15             465                 2                187
Breast            10             466                 2                233
Cleve             13             202                 2                 99
Diabetes           8             512                 2                256
Heart             13             180                 2                 90
Iris              10             100                 3                 50
Table 8. Accuracy (%) on test data

Domain        ILA-2   ILA-2   ILA     ID3     C4.5     OC1     C4.5    CN2
name          PF=1    PF=5                    pruned           rules
Lenses         62.5    50      50      62.5    62.5     37.5    62.5    62.5
Monk1         100     100     100      81.0    75.7     91.2    93.5    98.6
Monk2          59.7    66.7    78.5    69.9    65.0     96.3    66.2    75.4
Monk3         100      87.7    88.2    91.7    97.2     94.2    96.3    90.7
Mushroom       98.2   100     100     100     100       99.9    99.7   100
Parity5+5      50.0    51.1    51.2    50.8    50.0     52.4    50.0    53.0
Tic-tac-toe    84.1    98.1    98.1    80.9    82.2     85.6    98.1    98.4
Vote           97.0    96.3    94.8    94.1    97.0     96.3    95.6    95.6
Zoo            88.2    91.2    91.2    97.1    85.3     73.5    85.3    82.4
Splice         73.4    86.9    67.9    89.0    90.4     91.2    92.7    84.5
Coding         70.0    70.7    68.7    65.7    63.2     65.9    64.0   100
Promoter       97.5    97.5   100     100      95.0     87.5    97.5   100
Australia      83.0    76.5    82.6    81.3    87.0     84.8    88.3    82.2
Crx            80.2    78.1    75.4    72.5    83.0     78.5    84.5    80.0
Breast         95.7    96.1    95.3    94.4    95.7     95.7    94.4    97.0
Cleve          70.3    76.2    76.2    64.4    77.2     79.2    82.2    68.3
Diabetes       71.5    73.8    65.6    62.5    69.1     70.3    73.4    70.7
Heart          60.0    82.2    84.4    75.6    83.3     78.9    84.4    77.8
Iris           96.0    94.0    96.0    94.0    92.0     96.0    92.0    94.0
test sets, and they have similar accuracies in the other two domains. ILA-2 also performed marginally better than C4.5 among the 19 test sets, producing higher accuracies in 10 of the domain test sets.

Table 9 shows the sizes of the output classifiers generated by the same algorithms for the same data sets. The results in the table show that ILA-2, i.e., ILA with the novel evaluation function, is comparable to the C4.5 algorithms in terms of generated classifier size. When the penalty factor is set to 1, ILA-2 usually produced the smallest classifiers for the evaluation sets.

With regard to the results in Table 9, it may be worth pointing out that ILA-2 solves the overfitting problem of basic (certain) ILA in a fashion similar to the way C4.5 solves the overfitting problem of ID3 (Quinlan, 1986). The sizes of the classifiers generated by the corresponding classification methods show this relationship clearly.
Table 9. Size of the classifiers generated by the algorithms

Domain name   ILA-2,  ILA-2,   ILA    ID3    C4.5     C4.5
              PF=1    PF=5                   pruned   rules
Lenses           9      13       13      9      7        8
Monk1           14      37       58     92     18       23
Monk2            9     115      188    176     31       35
Monk3            5      48       63     42     12       25
Mushroom         9      13       22     29     30       11
Parity5+5        2      67       81    107     23       17
Tic-tac-toe     15      88       88    304     85       66
Vote            18      35       69     67      7        8
Zoo             11      16       17     21     19       14
Splice          26     128      240    201     81       88
Coding          64     256      319    429    109       68
Promoter         7      18       27     41     25       12
Australia       13      69      116    130     30       30
Crx             12      39      111    129     58       32
Breast           5      16       38     37     19       20
Cleve            8      55       64     74     27       20
Diabetes         6      22       33    165     27       19
Heart            1      25       19     57     33       26
Iris             5      16        4      9      7        5
Totals         238    1079     1570   2119    648      527
CONCLUSION

We introduced an extended version of ILA, namely ILA-2, which is a supervised rule induction algorithm. ILA-2 has additional features that are not available in basic ILA. A faster pass criterion that reduces processing time by employing a greedy rule generation strategy is introduced. This feature, called FastILA, is useful for situations where reduced processing time is more important than the size of the resulting classifier.

The main contribution of our work is the evaluation metric utilized in ILA-2 for the evaluation of descriptions. In other words, users can reflect their preferences via a PF to tune (or control) the performance of the ILA-2 system with respect to the nature of the domain at hand. This provides a valuable advantage over most current inductive learning algorithms.

Finally, using a number of machine learning and real-world data sets, we showed that the performance of ILA-2 is comparable to that of the well-known inductive learning algorithms CN2, OC1, ID3, and C4.5.

As further work, we plan to embed different feature subset selection (FSS) approaches into the system as a preprocessing step in order to yield better performance. With FSS, the search space requirements and the processing time will probably be reduced owing to the elimination of irrelevant attribute-value combinations at the very beginning of the rule extraction process.
ACKNOWLEDGMENTS

This research is partly supported by the State Planning Organization of the Turkish Republic under research grant 97-K-12330. All of the data sets were obtained from the University of California-Irvine's repository of machine learning databases and domain theories, managed by Patrick M. Murphy. We acknowledge Ross Quinlan and Peter Clark for the implementations of C4.5 and CN2, and also Ron Kohavi, as we have used the MLC++ library to execute the OC1, CN2, and ID3 algorithms.
REFERENCES

Clark, P., and T. Niblett. 1989. The CN2 induction algorithm. Machine Learning 3:261-283.