
Ranking Discovered Rules from Data Mining

by MADM Method


Peyman Gholami*, Mohamad Sepehri Rad, Azade Bazle, Nezam Faghih
Abstract: Data mining techniques, extracting patterns from large databases have become widespread in business. Using these
techniques, various rules may be obtained and only a small number of these rules may be selected for implementation due, at
least in part, to limitations of budget and resources. Evaluating and ranking the interestingness or usefulness of association
rules is important in data mining. This paper proposes the entropy method, an applicable and useful weighting technique from multiple
attribute decision making (MADM). Building on these weights, a popular method, the technique for order preference by similarity to
ideal solution (TOPSIS), is proposed for prioritizing association rules under multiple criteria. As an advantage,
the proposed method is computationally more efficient than previous works. Using an example of market basket analysis,
the applicability of our method for ranking association rules with multiple criteria is illustrated.
Index Terms: data mining; association rule; MADM; entropy; ranking; TOPSIS.



1 INTRODUCTION
With the rapid growth of databases in many modern
enterprises, data mining has become an increasingly im-
portant approach for data analysis. In recent years, the
field of data mining has seen an explosion of interest from
both academia and industry (Olafson, Li, & Wu, 2008).
The increasing volume of data, the growing awareness of
the inadequacy of the human brain to process data, and
the increasing affordability of machine learning are the
reasons for the growing popularity of data mining
(Marakas, 2004).
Data mining (DM) is the process of automatic discovery
of high-level knowledge by obtaining information from
real data. Discovering association rules is one of the
several DM techniques described in the literature [1].
Association rules are used to represent and identify
dependencies between items in a database [2]. They are
expressions of the form X → Y, where X and Y are sets
of items and X ∩ Y = ∅. Such a rule means that if all the
items in X exist in a transaction, then all the items in Y
are also in the transaction with high probability, where X
and Y have no item in common [3,4]. Many previous
studies focused on databases with binary values; however,
the data in real-world applications usually consist of
quantitative values. Designing DM algorithms able to deal
with various types of data presents a challenge to workers
in this research field. One of the main objectives of data
mining is to produce rules that are interesting from the
user's point of view. This user is not assumed to be a data
mining expert, but rather an expert in the field being
mined (Lenca, Meyer, Vaillant, & Lallich, 2008).
The problem of discovering association rules has re-
ceived considerable research attention, and several fast
algorithms for mining association rules have been devel-
oped (Srikant, Vu, & Agrawal, 1997). Using these tech-
niques, various rules may be obtained, and only a small
number of them may be selected for implementation due,
at least in part, to limitations of budget and resources
(Chen, 2007). According to Liu, Hsu, Chen, and Ma
(2000), the interestingness issue has long been identified
as an important problem in data mining: it refers to
finding rules that are interesting or useful to the user, not
just any possible rule. Indeed, there are situations that
make it necessary to prioritize the rules, so that the more
valuable ones can be selected and concentrated on, given
the number of qualified rules (Tan & Kumar, 2000) and
limited business resources (Choi, Ahn, & Kim, 2005).
According to Chen (2007), selecting the more valuable
rules for implementation increases the possibility of
success in data mining.
For example, in market basket analysis, marketing ana-
lysts are interested both in understanding which products
are usually bought together by customers and in how
beneficial cross-selling promotions are to sellers. The
former allows sellers to provide appropriate products by
considering customers' preferences, and the latter allows
sellers to gain increased profits. Customers' preferences
can be measured based on the support and confidence of
association rules. On the other hand, seller profits can be
assessed using domain-related measures such as sale
profit and cross-selling profit associated with the associa-
tion rules (Chen, 2007).

*Department of Industrial Engineering, Arak Branch, Islamic Azad Uni-
versity, Arak, Iran.
Department of Industrial Management, Shiraz University, Shiraz, Iran.

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 11, NOVEMBER 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING
WWW.JOURNALOFCOMPUTING.ORG 64

In previous studies dealing with the discovery of sub-
jectively interesting association rules, most approaches
require manual input or interaction by asking users to
explicitly distinguish between interesting and uninterest-
ing rules (Chen, 2007). Srikant et al. (1997) presented
three integrated algorithms for mining association rules
with item constraints. Moreover, Lakshmanan et al. (1998)
extended the approach of Srikant et al. to consider much
more complicated constraints, including domain, class,
and SQL-style aggregate constraints. Liu et al. (2000)
presented an Interestingness Analysis System (IAS) to
help the user identify interesting association rules. In
their method, they consider two main subjective interes-
tingness measures, unexpectedness and actionability.
Choi et al. (2005), using the analytic hierarchy process
(AHP), presented a method for association rule prioritiza-
tion that considers business values comprising objective
metrics and managers' subjective judgments. They be-
lieved that their method creates synergy with decision
analysis techniques for solving problems in the domain of
data mining. Nevertheless, this method requires a large
amount of human interaction to obtain the weights of
criteria by aggregating the opinions of various managers.
Multiple attribute decision making (MADM) is de-
voted to the problem of choosing, from a set of alterna-
tives, the most desirable alternative, that is, the one with
the highest degree of satisfaction with regard to its
attributes [5-11]. Since MADM has found great acceptance
in the areas of operational research, economics and man-
agement science, the discipline has been one of the fastest
growing areas during the last several decades. Especially
in recent years, with the significant increase in computer
usage, applying MADM methods has become considera-
bly easier for users, since most of the methods involve
complex mathematics.

2. Association rule
Association rule mining, introduced by Agrawal, Imie-
linski, and Swami (1993), has been widely used from tra-
ditional business applications such as cross-marketing,
attached mailing, catalog design, loss-leader analysis,
store layout, and customer segmentation to e-business
applications such as the renewal of web pages and web
personalization (Choi et al., 2005).
Given a set of transactions, where each transaction is a
set of literals (called items), an association rule is an ex-
pression of the form X → Y, where X and Y are sets of
items. The intuitive meaning of such a rule is that transac-
tions of the database which contain X tend to contain Y as
well. An example of an association rule is: "40% of trans-
actions that contain bread also contain milk; 3% of all
transactions contain both these items." Here 40% is called
the confidence of the rule, and 3% the support of the rule.
It should be noted that associations may include any
number of items on either side of the rule. An efficient
algorithm is required that restricts the search space and
checks only a subset of all association rules, yet does not
miss important rules (Chen, 2007).
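To make the definitions concrete, support and confidence can be computed directly from a transaction list. The following is a minimal sketch (the tiny dataset and function names are illustrative, not from the paper):

```python
# Support of an itemset = fraction of transactions containing it;
# confidence of X -> Y = support(X union Y) / support(X).
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, X, Y):
    return support(transactions, set(X) | set(Y)) / support(transactions, X)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Here support({bread, milk}) = 2/4 = 0.5, and
# confidence(bread -> milk) = 0.5 / 0.75 = 2/3.
```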
Many algorithms can be used to discover association
rules from data to extract useful patterns. The Apriori
algorithm is one of the most widely used and famous
techniques for finding association rules (Agrawal &
Srikant, 1994; Agrawal et al., 1993). Apriori operates in
two phases. In the first phase, all item sets with minimum
support (frequent item sets) are generated. This phase
utilizes the downward closure property of support: if an
item set of size k is a frequent item set, then all of its
subsets of size (k - 1) must also be frequent item sets.
Using this property, candidate item sets of size k are
generated from the set of frequent item sets of size (k - 1)
by imposing the constraint that all subsets of size (k - 1) of
any candidate item set must be present in the set of fre-
quent item sets of size (k - 1). The second phase of the
algorithm generates rules from the set of all frequent item
sets.
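The first phase described above can be sketched as follows (an illustrative implementation, not the paper's code; transactions are assumed to be frozensets of items and min_support a fraction):

```python
from itertools import combinations

# Apriori phase 1: generate all frequent itemsets, using the
# downward-closure property of support to prune candidates.
def apriori_frequent_itemsets(transactions, min_support):
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if is_frequent(s)}   # frequent 1-itemsets
    frequent = set(level)
    k = 2
    while level:
        # Join step: candidate k-itemsets from frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if is_frequent(c)}
        frequent |= level
        k += 1
    return frequent
```

With min_support = 0.5 on four small baskets, the pairs {a, b}, {a, c}, {b, c} survive but the triple {a, b, c} is pruned by the support check.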
Association rule mining is a popular technique for
market basket analysis, which typically aims at finding
buying patterns for supermarket, mail-order and other
customers. By mining association rules, marketing ana-
lysts try to find sets of products that are frequently
bought together, so that certain other items can be in-
ferred from a shopping cart containing particular items.
Association rules can often be used to design marketing
promotions, for example, by appropriately arranging
products on a supermarket shelf and by directly suggest-
ing to customers items that may be of interest (Chen,
2007).
3. Entropy Weighting Method
Shannon and Weaver (1947) proposed the entropy
concept and this concept has been highlighted by Zeleny
(1982) for deciding the objective weights of attributes.
Entropy is a measure of uncertainty in the information
formulated using probability theory. It indicates that a
broad distribution represents more uncertainty than does
a sharply peaked one. To determine weights by the en-
tropy measure, the normalized decision matrix R_ij is
considered [12]:

r_ij = x_ij / ( Σ_{j=1}^{J} x_ij^2 )^{1/2},    j = 1, …, J;  i = 1, …, n        (1)

where J and n denote the number of alternatives and the
number of criteria, respectively. For alternative A_j, the
performance measure of the ith criterion C_i is represented
by x_ij.
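The vector normalization of Eq. (1) can be written as a small helper (a minimal illustrative sketch, not the authors' code):

```python
import math

# Vector (Euclidean) normalization of a decision matrix, as in Eq. (1):
# each entry is divided by the Euclidean norm of its criterion column.
# Rows are alternatives (association rules); columns are criteria.
def normalize(matrix):
    n_crit = len(matrix[0])
    norms = [math.sqrt(sum(row[i] ** 2 for row in matrix)) for i in range(n_crit)]
    return [[row[i] / norms[i] for i in range(n_crit)] for row in matrix]

# For example, normalize([[3, 4], [0, 3]]) gives [[1.0, 0.8], [0.0, 0.6]].
```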
The amount of decision information contained in (1)
and associated with each attribute can be measured by
the entropy value e_j as:

e_j = -k Σ_{i=1}^{N} R_ij ln R_ij        (2)

2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
http://sites.google.com/site/journalofcomputing/

where k = 1/ln N is a constant that guarantees 0 ≤ e_j ≤ 1.
The degree of divergence (d_j) of the average information
contained by each attribute can be calculated as:

d_j = 1 - e_j        (3)

The more divergent the performance ratings R_ij (for i =
1, 2, …, N) for the attribute B_j, the higher its corresponding
d_j, and the more important the attribute B_j for the deci-
sion-making problem under consideration (Zeleny, 1982).
The objective weight for each attribute B_j (for j = 1, 2,
…, M) is thus given by:

w_j = d_j / Σ_{k=1}^{M} d_k        (4)
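Equations (2)-(4) can be sketched compactly as follows (an illustrative implementation, not the authors' code; as an assumption, each criterion column is first normalized to sum to one, a common choice for the entropy method):

```python
import math

# Entropy weighting: column entropies e_j (Eq. (2)), divergences
# d_j = 1 - e_j (Eq. (3)), and normalized weights (Eq. (4)).
# Rows are alternatives; columns are criteria; all values positive.
def entropy_weights(matrix):
    n_alt = len(matrix)
    n_crit = len(matrix[0])
    k = 1.0 / math.log(n_alt)                  # k = 1 / ln N
    divergences = []
    for j in range(n_crit):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]           # column normalized to sum to 1
        e_j = -k * sum(x * math.log(x) for x in p if x > 0)   # Eq. (2)
        divergences.append(1.0 - e_j)          # Eq. (3)
    s = sum(divergences)
    return [d / s for d in divergences]        # Eq. (4)
```

A column with identical values has entropy 1 and therefore receives (almost) zero weight, which matches the intuition that a non-discriminating criterion carries no decision information.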

Technique for Order Preference by Similarity to Ideal Solution (TOPSIS)

In multiple attribute decision making, each alternative is
evaluated based on a number of different criteria. One
popular solution method for a multiple criteria problem is
the technique for order preference by similarity to ideal
solution (TOPSIS).

General considerations

The following characteristics of the TOPSIS method
make it an appropriate approach which has good poten-
tial for solving our problem:

- An unlimited range of association rules can be included.
- In the context of association rule selection, the effect of
each attribute cannot be considered alone and must
always be seen as a trade-off with respect to other
attributes. In light of this, the TOPSIS model seems a
suitable method for multi-criteria association rule selec-
tion problems as it allows explicit trade-offs and interac-
tions among attributes. More precisely, changes in one
attribute can be compensated for in a direct or opposite
manner by other attributes.
- The output can be a preferential ranking of the alterna-
tives (candidate association rules) with a numerical
value that provides a better understanding of differences
and similarities between alternatives, whereas other
MADM techniques (such as the ELECTRE methods
[13-15]) only determine the rank of each association rule.
- Pair-wise comparisons, required by methods such as the
Analytical Hierarchy Process (AHP) [16,17], are avoided.
This is particularly useful when dealing with a large
number of alternatives and criteria.
- It can include a set of weighting coefficients for different
attributes.
- It is relatively simple and fast, with a systematic
procedure.

Characteristics of the TOPSIS method

Hwang and Yoon [18] proposed the technique for or-
der preference by similarity to ideal solution (TOPSIS)
method to rank alternatives over multiple criteria. It finds
the best alternative by minimizing the distance to the
ideal solution and maximizing the distance to the nadir,
or negative-ideal, solution [19].

A number of extensions and variations of TOPSIS
have been developed over the years (e.g., [20,21]). The
following TOPSIS procedure, adopted from [22] and [19],
is used in the empirical study:

Step 1: calculate the normalized decision matrix. The
normalized value r_ij is calculated from (1).

Step 2: develop a set of weights w_i for each criterion
and calculate the weighted normalized decision matrix.
The weighted normalized value v_ij is calculated as

v_ij = w_i r_ij,    j = 1, …, J;  i = 1, …, n        (5)

where w_i is the weight of the ith criterion, and

Σ_{i=1}^{n} w_i = 1        (6)

Step 3: find the ideal solution, which is calculated as

A+ = {v_1+, …, v_n+} = { (max_j v_ij | i ∈ I′), (min_j v_ij | i ∈ I″) }        (7)

where I′ is associated with benefit criteria and I″ is asso-
ciated with cost criteria.

Step 4: find the negative-ideal solution, which is calcu-
lated as

A- = {v_1-, …, v_n-} = { (min_j v_ij | i ∈ I′), (max_j v_ij | i ∈ I″) }        (8)

Step 5: calculate the separation measures, using the n-
dimensional Euclidean distance. The separation of each
alternative from the ideal solution is calculated as

D_j+ = ( Σ_{i=1}^{n} (v_ij - v_i+)^2 )^{1/2},    j = 1, …, J        (9)

The separation of each alternative from the negative-
ideal solution is calculated as

D_j- = ( Σ_{i=1}^{n} (v_ij - v_i-)^2 )^{1/2},    j = 1, …, J        (10)

Step 6: calculate the ratio R_j+ that measures the rela-
tive closeness to the ideal solution:

R_j+ = D_j- / (D_j+ + D_j-),    j = 1, …, J        (11)

Step 7: rank the alternatives by maximizing the ratio R_j+.
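The seven steps can be sketched in one function (a minimal illustration, not the authors' code; as a simplifying assumption all criteria are treated as benefit criteria, so cost criteria would swap max and min in Steps 3-4):

```python
import math

# TOPSIS relative-closeness scores for a decision matrix
# (rows = alternatives, columns = benefit criteria).
def topsis(matrix, weights):
    n_crit = len(matrix[0])
    # Step 1: vector-normalize each column (Eq. (1)).
    norms = [math.sqrt(sum(row[i] ** 2 for row in matrix)) for i in range(n_crit)]
    r = [[row[i] / norms[i] for i in range(n_crit)] for row in matrix]
    # Step 2: weight the normalized matrix (Eq. (5)).
    v = [[weights[i] * row[i] for i in range(n_crit)] for row in r]
    # Steps 3-4: ideal and negative-ideal solutions (Eqs. (7)-(8)).
    ideal = [max(row[i] for row in v) for i in range(n_crit)]
    nadir = [min(row[i] for row in v) for i in range(n_crit)]
    # Step 5: Euclidean separations from each (Eqs. (9)-(10)).
    d_plus = [math.sqrt(sum((row[i] - ideal[i]) ** 2 for i in range(n_crit)))
              for row in v]
    d_minus = [math.sqrt(sum((row[i] - nadir[i]) ** 2 for i in range(n_crit)))
               for row in v]
    # Step 6: relative closeness (Eq. (11)); Step 7 ranks by descending R+.
    return [dm / (dp + dm) for dp, dm in zip(d_plus, d_minus)]
```

For a dominated pair of alternatives such as [[1, 1], [2, 2]], the dominating row obtains closeness 1 and the dominated row closeness 0, as expected.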

4. Proposed Method
In the evaluation of association rules from data mining,
attributes such as support, confidence, itemset value and
cross-selling profit are considered as criteria in MADM.
The proposed method, which is based on a simple idea, is
described as follows:

Input: association rules from each dataset.
Output: full ranking of association rules.

Step 1: identify the association rules in each dataset.

Step 2: weight the criteria of the association rules by the
entropy method.

Step 3: multiply the weights obtained in Step 2 into the
corresponding columns of the criteria.

Step 4: rank the association rules by the TOPSIS method.
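To illustrate Steps 1-4 end to end, the sketch below applies entropy weighting followed by TOPSIS to the first three rules of Table I (an illustrative re-implementation, not the authors' code; as assumptions, columns are sum-normalized for the entropy step and all four criteria are treated as benefit criteria):

```python
import math

# Criteria columns: support, confidence, itemset value, cross-selling profit
# (rows are the first three association rules of Table I).
rules = [
    [3.87, 40.09, 337.0, 25.66],
    [1.42, 18.17, 501.0, 11.63],
    [2.83, 17.64, 345.0, 11.29],
]
n, m = len(rules), len(rules[0])
col = lambda M, j: [row[j] for row in M]

# Step 2: entropy weights (Eqs. (2)-(4)); columns normalized to sum to one.
k = 1.0 / math.log(n)
d = []
for j in range(m):
    s = sum(col(rules, j))
    p = [x / s for x in col(rules, j)]
    d.append(1.0 + k * sum(x * math.log(x) for x in p))   # d_j = 1 - e_j
w = [x / sum(d) for x in d]

# Steps 3-4: TOPSIS on the weighted, vector-normalized matrix.
norm = [math.sqrt(sum(x * x for x in col(rules, j))) for j in range(m)]
v = [[w[j] * rules[i][j] / norm[j] for j in range(m)] for i in range(n)]
best = [max(col(v, j)) for j in range(m)]
worst = [min(col(v, j)) for j in range(m)]
dp = [math.sqrt(sum((v[i][j] - best[j]) ** 2 for j in range(m))) for i in range(n)]
dm = [math.sqrt(sum((v[i][j] - worst[j]) ** 2 for j in range(m))) for i in range(n)]
closeness = [dm[i] / (dp[i] + dm[i]) for i in range(n)]
ranking = sorted(range(n), key=lambda i: -closeness[i])   # best rule first
```

In this small sample, rule 1, which dominates on three of the four criteria, comes out first.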

TABLE I. DATA OF ASSOCIATION RULES.
Association rule number | Support (%) | Confidence (%) | Itemset Value | Cross-selling profit
1 3.87 40.09 337.00 25.66
2 1.42 18.17 501.00 11.63
3 2.83 17.64 345.00 11.29
4 2.34 30.83 163.00 19.73
5 2.63 23.90 325.00 15.30
6 1.19 55.65 436.00 35.61
7 1.19 47.42 598.00 30.35
8 1.19 15.70 436.00 52.91
9 1.19 10.82 598.00 36.45
10 1.19 12.32 436.00 20.08
11 1.19 12.32 598.00 40.04
12 3.87 38.08 337.00 103.97
13 1.18 15.09 710.00 41.19
14 2.44 15.22 554.00 41.56
15 2.14 28.21 372.00 77.02
16 2.51 22.81 534.00 62.26
17 1.19 50.92 436.00 139.02
18 1.19 45.25 598.00 123.52
19 1.19 11.70 436.00 43.54
20 1.19 11.70 598.00 62.50
21 1.42 13.99 501.00 61.16
22 1.18 12.23 710.00 53.45
23 1.50 13.64 698.00 59.59
24 2.83 27.82 345.00 78.17
25 2.44 25.27 554.00 71.00
26 1.25 15.97 718.00 44.87
27 1.22 34.98 339.00 98.04
28 1.30 35.12 435.00 98.68
29 1.42 33.81 534.00 95.01
30 1.91 25.26 380.00 70.97
31 1.43 37.14 618.00 104.35
32 2.38 21.63 542.00 60.78
33 1.18 30.24 366.00 84.98
34 1.23 29.36 626.00 82.51
35 1.58 22.65 354.00 63.64
36 2.34 22.99 163.00 22.76
37 2.14 22.14 372.00 21.92
38 1.91 11.94 380.00 11.82
39 2.03 18.42 360.00 18.23
40 1.19 30.73 436.00 30.43
41 2.63 25.87 325.00 67.52
42 2.51 25.98 534.00 67.81
43 1.50 19.16 698.00 50.02
44 2.38 14.85 542.00 38.75
45 2.03 26.73 360.00 69.78
46 1.19 30.73 598.00 80.22

5. Experimental Study
To show the applicability of the proposed method, an
example of market basket data is adopted from Chen
(2007). Association rules are first discovered by the
Apriori algorithm, in which minimum support and mini-
mum confidence are set to 1.0% and 10.0%, respectively.

TABLE II. THE ENTROPY WEIGHT OF EACH CRITERION.
Criterion: Support (%) | Confidence (%) | Itemset Value | Cross-selling profit
Entropy weight: 0.1958 | 0.2594 | 0.1220 | 0.4228


TABLE III. RANKING ASSOCIATION RULES BY TOPSIS.
Association rule number | D+ | D- | R+ | Rank
1 0.1128 0.0592 0.3441 22
2 0.1391 0.0164 0.1055 44
3 0.1362 0.0271 0.1660 41
4 0.1244 0.0338 0.2138 36
5 0.1299 0.0305 0.1904 38
6 0.1083 0.0676 0.3844 19
7 0.1132 0.0566 0.3333 23
8 0.1084 0.0421 0.2796 29

9 0.1241 0.0291 0.1898 39
10 0.1365 0.0133 0.0885 45
11 0.1202 0.0321 0.2107 37
12 0.0441 0.1056 0.7052 3
13 0.1174 0.0356 0.2329 33
14 0.1122 0.0381 0.2534 31
15 0.0768 0.0700 0.4771 12
16 0.0899 0.0574 0.3895 18
17 0.0416 0.1362 0.7658 1
18 0.0452 0.1199 0.7264 2
19 0.1184 0.0328 0.2170 35
20 0.1042 0.0521 0.3332 24
21 0.1023 0.0502 0.3291 26
22 0.1102 0.0455 0.2921 28
23 0.1029 0.0510 0.3315 25
24 0.0735 0.0735 0.4999 9
25 0.0814 0.0656 0.4463 13
26 0.1136 0.0390 0.2555 30
27 0.0644 0.0908 0.5850 6
28 0.0625 0.0917 0.5946 5
29 0.0642 0.0883 0.5792 7
30 0.0845 0.0627 0.4258 17
31 0.0559 0.0988 0.6388 4
32 0.0924 0.0551 0.3736 20
33 0.0759 0.0767 0.5027 8
34 0.0768 0.0756 0.4961 10
35 0.0937 0.0541 0.3660 21
36 0.1252 0.0266 0.1753 40
37 0.1260 0.0248 0.1647 42
38 0.1411 0.0135 0.0874 46
39 0.1315 0.0192 0.1273 43
40 0.1182 0.0349 0.2278 34
41 0.0841 0.0625 0.4264 16
42 0.0832 0.0634 0.4323 14
43 0.1061 0.0441 0.2935 27
44 0.1149 0.0353 0.2350 32
45 0.0839 0.0626 0.4271 15
46 0.0778 0.0740 0.4875 11

Forty-six rules were thus identified; they are presented in
Table I. The aim is to rank these association rules com-
pletely. The rules involve four criteria: support, confi-
dence, itemset value and cross-selling profit. The weight
of each criterion was calculated with the entropy method,
and the results are presented in Table II. Using the
TOPSIS method introduced in the previous section, the
forty-six association rules were ranked, and the results
are summarized in Table III.
6. Conclusion
The popularity of data mining is growing at a lightning-
fast pace. Using these techniques, various rules may be
obtained, and only a small number of them may be
selected for implementation due, at least in part, to limita-
tions of budget and resources. In this paper, we devel-
oped a TOPSIS-based method which ranks the alterna-
tives by considering four criteria: support, confidence,
itemset value and cross-selling profit. This method is
applicable for ranking all association rules. In comparison
to previous works, our method is computationally effi-
cient and also ranks all association rules. In future work,
we intend to apply other multiple attribute decision
making methods to similar problems.

References

[1] J. Han, M. Kamber, Data Mining: Concepts and Techniques,
second ed., Morgan Kaufmann, San Francisco, 2006.
[2] C. Zhang, S. Zhang, Association Rule Mining: Models and
Algorithms Series, Lecture Notes in Computer Science, Lecture
Notes in Artificial Intelligence, Vol. 2307, Springer, Berlin, 2002.
[3] R. Agrawal, T. Imielinski, A. Swami, Mining association rules
between sets of items in large databases, in: SIGMOD, Washington,
DC, USA, 1993, pp. 207-216.
[4] R. Agrawal, R. Srikant, Fast algorithms for mining association
rules, in: International Conference on Very Large Data Bases,
Santiago de Chile, Chile, 1994, pp. 487-499.
[5] R. Bellman, L.A. Zadeh, Decision making in a fuzzy environment,
Management Science 17 (4) (1970) B141-B164.
[6] C.L. Hwang, K. Yoon, Multiple Attribute Decision Making:
Methods and Applications, Springer-Verlag, Berlin, 1981.
[7] S.J. Chen, C.L. Hwang, Fuzzy Multiple Attribute Decision Making:
Methods and Applications, Springer-Verlag, Berlin, 1992.
[8] Z.S. Xu, Uncertain Multiple Attribute Decision Making: Methods
and Applications, Tsinghua University Press, Beijing, 2004.
[9] Y.M. Wang, C. Parkan, Multiple attribute decision making based on
fuzzy preference information on alternatives: ranking and weighting,
Fuzzy Sets and Systems 153 (2005) 331-346.
[10] Z.B. Wu, Y.H. Chen, The maximizing deviation method for group
multiple attribute decision making under linguistic environment,
Fuzzy Sets and Systems 158 (2007) 1608-1617.
[11] Z.S. Xu, A method for multiple attribute decision making with
incomplete weight information in linguistic setting, Knowledge-
Based Systems 20 (2007) 719-725.
[12] R. Venkata Rao, Decision Making in the Manufacturing
Environment, Springer Series in Advanced Manufacturing,
2006, pp. 34-35.
[13] B. Roy, Multicriteria Methodology for Decision Aiding, volume 12
of Nonconvex Optimization and its Applications, Kluwer Academic
Publishers, Dordrecht, 1996.
[14] B. Roy, The outranking approach and the foundations of ELECTRE
methods, Theory Decis. 31 (1991) 49-73.
[15] B. Roy, Aide multicritère à la décision: méthodes et cas, Paris,
Economica (1993).
[16] T.L. Saaty, Decision making for leaders, the analytical hierarchy
process for decision in a complex world, Lifetime (1990).
[17] T.L. Saaty, Fundamentals of Decision Making and Priority Theory
with the Analytic Hierarchy Process, RWS Publications, University
of Pittsburgh, 2000.
[18] Hwang CL, Yoon K. Multiple attribute decision making methods and
applications. Berlin Heidelberg: Springer; 1981.
[19] Olson DL. Comparison of weights in TOPSIS models.
Mathematical and Computer Modelling 2004;40(7-8):721-727.
[20] Chu TC. Facility location selection using fuzzy TOPSIS under group
decisions. International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems 2002;10(6):687-701.
[21] Abo-Sinna MA, Amer AH. Extensions of TOPSIS for multi-
objective large-scale nonlinear programming problems. Applied
Mathematics and Computation 2005;162:243-256.
[22] Opricovic S, Tzeng GH. Compromise solution by MCDM methods:
a comparative analysis of VIKOR and TOPSIS. European Journal of
Operational Research 2004;156(2):445-455.


