Abstract
With the advance of information technology and the popularization of computers, collecting information has become easier, faster and more convenient than before. Over time, databases accumulate huge amounts of hidden information, so uncovering and mining that information correctly and efficiently has become an important issue, and data mining has become one of the main solutions. Among data mining technologies, association rule mining is one of the most widely used: it extracts the frequent itemsets from a large database and then derives the knowledge implicit behind them. The Apriori algorithm is one of the most frequently used algorithms for this task. Although Apriori can successfully derive association rules from a database, it has two major defects. First, it produces large numbers of candidate itemsets while extracting the frequent itemsets. Second, it repeatedly scans the whole database, which leads to inefficient performance. Many studies have tried to improve the performance of Apriori, but they do not escape its framework and therefore achieve only small improvements. In this paper we propose ICI (Incremental Combination Itemsets), which departs from the Apriori framework and needs to scan the whole database only once while extracting the frequent itemsets. The ICI algorithm therefore greatly reduces I/O time, extracts the frequent itemsets from a large database rapidly, and makes data mining more efficient than before.
Keywords: Data Mining, Association Rule, Apriori, Frequent Itemsets
2.1 Association Rules
Data mining covers several families of techniques: (1) association rules [1,2,8]; (2) time sequence analysis [4,10]; (3) classification rules; (4) clustering rules [6,9]; (5) sequence pattern analysis [3]. This paper focuses on association rules, first proposed by Agrawal et al. [1].

Let I = {i1, i2, ..., im} be the set of all items and let D be a database of transactions. Each transaction T is an itemset such that T ⊆ I, identified by a unique TID. An association rule is an implication of the form X → Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅. Its support S is the fraction of transactions in D that contain X ∪ Y, and its confidence C is the fraction of the transactions containing X that also contain Y; a confidence of 80%, for instance, means that 80% of the transactions containing X also contain Y. Given a minimum support threshold (minsup), an itemset whose support is at least minsup is a frequent itemset, and a frequent itemset containing k items is a frequent k-itemset. For example, BCE is a 3-itemset with B, C, E ∈ I, and BC is one of its 2-subsets.
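As a concrete illustration of these definitions, the following minimal sketch computes support and confidence over a toy database (the data and function names are my own, invented for illustration):

```python
def support(db, itemset):
    """Fraction of transactions in D that contain every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in db if s <= set(t)) / len(db)

def confidence(db, x, y):
    """support(X ∪ Y) / support(X): how often Y occurs in transactions containing X."""
    return support(db, set(x) | set(y)) / support(db, x)

# Toy database D over items I = {B, C, E}; each transaction T is a set of items.
D = [{"B", "C", "E"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]
print(support(D, {"B", "C"}))       # (B, C) appears in 2 of 4 transactions -> 0.5
print(confidence(D, {"B"}, {"C"}))  # of the 3 transactions with B, 2 also contain C
```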
Among the algorithms for mining association rules, Apriori [1,2] is the best known; later algorithms such as DHP [5], AprioriTid [3], AprioriHybrid [3] and QDT [11,12] build on it. In this paper we propose the ICI (Incremental Combination Itemsets) algorithm, which departs from the Apriori framework altogether.

2.2 The Apriori Algorithm
The Apriori algorithm extracts the frequent itemsets level by level (k > 1):
(1) Join step: the candidate k-itemsets (Ck) are generated by joining the frequent (k-1)-itemsets (Lk-1).
(2) Scan step: the database D is scanned to count the support of every candidate in Ck; the candidates whose support reaches minsup form the frequent k-itemsets (Lk).
(3) Steps (1) and (2) are repeated with k increased by one until no further frequent k-itemsets can be found.
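The three steps above can be sketched as follows (a simplified level-wise implementation; the helper names and toy data are my own, not from the paper):

```python
from itertools import combinations

def apriori(db, minsup):
    """Level-wise search: L1 -> C2 -> L2 -> ... until Lk is empty."""
    n = len(db)
    db = [frozenset(t) for t in db]
    items = {i for t in db for i in t}

    def freq(cands):
        """Scan step: one pass over D, keep candidates meeting minsup."""
        return {c for c in cands
                if sum(1 for t in db if c <= t) / n >= minsup}

    L = freq({frozenset([i]) for i in items})   # L1: frequent 1-itemsets
    result = set(L)
    k = 2
    while L:
        # Join step: unite (k-1)-itemsets whose union has exactly k items.
        C = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        C = {c for c in C
             if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = freq(C)                             # one full database scan per level
        result |= L
        k += 1
    return result

D = [{"B", "C", "E"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]
print(sorted(map(sorted, apriori(D, 0.5))))
```

With minsup = 0.5 on this toy database, all six itemsets B, C, E, BC, BE and CE are frequent, while BCE (support 0.25) is pruned.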
2.3 Defects of the Apriori Algorithm
(1) Huge numbers of candidate itemsets. Joining k frequent 1-itemsets produces (k-1) + (k-2) + ... + 1 = k*(k-1)/2 candidate 2-itemsets, most of which may turn out not to be frequent. For example, 1000 frequent 1-itemsets already generate 1000*999/2 = 499,500 candidate 2-itemsets.
(2) Repeated database scans. Every level requires one scan of the whole database to count the candidate supports, so deriving the frequent n-itemsets requires n full scans, and the resulting I/O cost degrades performance.
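The quadratic blow-up described in (1) can be checked directly (a small arithmetic sketch):

```python
from math import comb

def candidate_pairs(k):
    """Candidate 2-itemsets from joining k frequent 1-itemsets:
    (k-1) + (k-2) + ... + 1 = k*(k-1)/2."""
    return k * (k - 1) // 2

print(candidate_pairs(1000))                 # 499500 candidates from 1000 items
assert candidate_pairs(1000) == comb(1000, 2)
```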
3.1 The ICI Algorithm
Figure 1: Flowchart of the ICI algorithm (Step-1 through Step-9, from Start to End; during a single scan of the database, the combination itemsets of each of the N transactions are accumulated into the MAP).
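Since the flowchart itself did not survive extraction, the following is only my sketch of the single-scan idea the paper describes: each transaction's combination itemsets are counted into the MAP in one pass, and the frequent itemsets are filtered out at the end (function and variable names are assumptions):

```python
from collections import Counter
from itertools import combinations

def ici(db, minsup):
    """One scan of D: count every non-empty combination itemset of each transaction."""
    n = len(db)
    MAP = Counter()                              # itemset -> occurrence count
    for t in db:                                 # the single pass over the database
        items = sorted(t)                        # X = len(items)
        for x in range(1, len(items) + 1):
            for c in combinations(items, x):
                MAP[frozenset(c)] += 1
    # After the scan, filter the MAP by the minimum support threshold.
    return {s for s, cnt in MAP.items() if cnt / n >= minsup}

db = [{"P001", "P003", "P004"}, {"P002", "P003", "P005"}]
print(ici(db, 1.0))   # only (P003) occurs in both transactions
```

Note the trade-off: a transaction with X items contributes 2**X - 1 combination itemsets to the MAP, so memory is exchanged for the single database scan.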
3.2 Example
Table 1 shows the MAP before any transaction is processed, and Table 2 shows the MAP after transaction T001 is processed. When transaction T001, which contains the three items P001, P003 and P004 (X = 3), is read, its seven combination itemsets
(P001, P003, P004), (P001, P003), (P001, P004), (P003, P004), (P001), (P003), (P004)
are generated, and the count of each of them in the MAP is incremented. In general, a transaction with X items yields the following combination itemsets:

X = 1: A    -> A
X = 2: AB   -> AB, A, B
X = 3: ABC  -> ABC, AB, AC, BC, A, B, C
X = 4: ABCD -> ABCD, ABC, ABD, ACD, BCD, AB, AC, AD, BC, BD, CD, A, B, C, D
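The combination rows listed above can be reproduced with a short enumeration (an illustrative sketch; the function name is my own):

```python
from itertools import combinations

def combination_itemsets(transaction):
    """All non-empty subsets of a transaction, largest first."""
    items = sorted(transaction)
    return [''.join(c)
            for x in range(len(items), 0, -1)
            for c in combinations(items, x)]

print(combination_itemsets("ABC"))
# A transaction with X items yields 2**X - 1 combination itemsets
# (1, 3, 7 and 15 for X = 1, 2, 3 and 4).
```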
When a later transaction is read, its combination itemsets are generated in the same way: itemsets that already exist in the MAP simply have their counts incremented (for example, after a second transaction containing P003 and P005, the count of (P003, P005) becomes 2), while itemsets not yet in the MAP are inserted with count 1.

The combination itemsets of a transaction are themselves built incrementally, one item at a time. Starting from a single item A (X = 1), the only combination is A. When a second item B is read (X = 2), the combinations become the existing one (A), the existing one extended with B (AB), and B itself: AB, A, B. When a third item C is read (X = 3), the existing combinations A, B and AB are kept, and C, AC, BC and ABC are added, giving ABC, AB, AC, BC, A, B, C.

3.4.2 Example
Consider the transaction (P120, P220, P500); for brevity, write P120 as A, P220 as B and P500 as C. With X = 3, the seven combination itemsets
(P120, P220, P500), (P120, P220), (P120, P500), (P220, P500), (P120), (P220), (P500)
are generated and accumulated into the MAP. Figures 6-8 in the original illustrate this incremental build-up: first A alone, then AB, A, B, and finally ABC, AB, AC, BC, A, B, C.
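The incremental build-up (A, then AB, A, B, then ABC, AB, AC, BC, A, B, C) can be written compactly (a sketch with single-letter item names; the function name is my own):

```python
def incremental_combinations(items):
    """Extend the combination list one item at a time:
    new list = old combinations
             + each old combination extended with the new item
             + the new item by itself."""
    combs = []
    for item in items:
        combs = combs + [c + item for c in combs] + [item]
    return combs

print(incremental_combinations("ABC"))
# ['A', 'AB', 'B', 'AC', 'ABC', 'BC', 'C']
```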
3.5 Experiment and Analysis
(1) Environment: CPU: Pentium 1.7 GHz; memory: 512 MB. The experiments use a synthetic dataset with the following parameters:

Name              [L]  [T]  [N]   [I]  [D]
L10T7N500I4D10K   10   7    500   4    10K

(2) Results: On the L10T7N500I4D10K dataset, Figures 9-11 compare the execution times of the ICI, QDT and Apriori algorithms under different minimum supports.
Figure 9: Execution times of ICI, QDT and Apriori at minimum supports of 1%, 0.75%, 0.50%, 0.25% and 0.10%.
References
[1] Agrawal, R., Imielinski, T. and Swami, A. (1993), "Mining Association Rules Between Sets of Items in Large Databases," In proc. of